2018-01-20 19:25:32

by KarimAllah Ahmed

Subject: [RFC 00/10] Speculation Control feature support

Start using the newly-added microcode features for speculation control on both
Intel and AMD CPUs to protect against Spectre v2.

This patch series covers interrupts, system calls, context switching between
processes, and context switching between VMs. It also exposes Indirect Branch
Prediction Barrier MSR, aka IBPB MSR, to KVM guests.
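
At the MSR level the new controls boil down to plain WRMSRs (rough sketch,
not the exact code added later in the series):

	wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);	/* IBPB: flush indirect branch predictions */
	wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);	/* IBRS=1: restrict indirect branch speculation */
	wrmsrl(MSR_IA32_SPEC_CTRL, 0);			/* IBRS=0: unrestrict again */

The patches wrap these in ALTERNATIVEs keyed on the new feature bits so they
turn into NOPs on CPUs without the microcode support.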

TODO:

- Introduce a microcode blacklist to disable the feature for broken microcodes.
- Restrict/Unrestrict the speculation (by toggling IBRS) around VMExit and
VMEnter for KVM and expose IBRS to guests.

Ashok Raj (1):
x86/kvm: Add IBPB support

David Woodhouse (1):
x86/speculation: Add basic IBRS support infrastructure

KarimAllah Ahmed (1):
x86: Simplify spectre_v2 command line parsing

Thomas Gleixner (4):
x86/speculation: Add basic support for IBPB
x86/speculation: Use Indirect Branch Prediction Barrier in context
switch
x86/speculation: Add inlines to control Indirect Branch Speculation
x86/idle: Control Indirect Branch Speculation in idle

Tim Chen (3):
x86/mm: Only flush indirect branches when switching into non dumpable
process
x86/enter: Create macros to restrict/unrestrict Indirect Branch
Speculation
x86/enter: Use IBRS on syscall and interrupts

Documentation/admin-guide/kernel-parameters.txt | 1 +
arch/x86/entry/calling.h | 73 ++++++++++
arch/x86/entry/entry_64.S | 35 ++++-
arch/x86/entry/entry_64_compat.S | 21 ++-
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/mwait.h | 14 ++
arch/x86/include/asm/nospec-branch.h | 54 ++++++-
arch/x86/kernel/cpu/bugs.c | 183 +++++++++++++++---------
arch/x86/kernel/process.c | 14 ++
arch/x86/kvm/svm.c | 14 ++
arch/x86/kvm/vmx.c | 4 +
arch/x86/mm/tlb.c | 21 ++-
12 files changed, 359 insertions(+), 77 deletions(-)


Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Asit Mallick <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Janakarajan Natarajan <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jun Nakajima <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

--
2.7.4



2018-01-20 19:25:32

by KarimAllah Ahmed

Subject: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

From: Tim Chen <[email protected]>

Flush indirect branches when switching into a process that marked
itself non dumpable. This protects high value processes like gpg
better, without having too high performance overhead.
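
For reference, and not part of this patch: a process opts into this protection
simply by clearing its dumpable flag, e.g. (illustrative userspace snippet):

	#include <sys/prctl.h>

	/* also disables core dumps and ptrace attach by unprivileged processes */
	prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);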

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
---
arch/x86/mm/tlb.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 304de7d..f64e80c 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -225,8 +225,19 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
* Avoid user/user BTB poisoning by flushing the branch predictor
* when switching between processes. This stops one process from
* doing Spectre-v2 attacks on another.
+ *
+ * As an optimization: Flush indirect branches only when
+ * switching into processes that disable dumping.
+ *
+ * This will not flush when switching into kernel threads.
+ * But it would flush when switching into idle and back
+ *
+ * It might be useful to have a one-off cache here
+ * to also not flush the idle case, but we would need some
+ * kind of stable sequence number to remember the previous mm.
*/
- indirect_branch_prediction_barrier();
+ if (tsk && tsk->mm && get_dumpable(tsk->mm) != SUID_DUMP_USER)
+ indirect_branch_prediction_barrier();

if (IS_ENABLED(CONFIG_VMAP_STACK)) {
/*
--
2.7.4


2018-01-20 19:25:32

by KarimAllah Ahmed

Subject: [RFC 01/10] x86/speculation: Add basic support for IBPB

From: Thomas Gleixner <[email protected]>

Expose indirect_branch_prediction_barrier() for use in subsequent patches.

[karahmed: remove the special-casing of skylake for using IBPB (wtf?),
switch to using ALTERNATIVES instead of static_cpu_has]
[dwmw2: set up ax/cx/dx in the asm too so it gets NOP'd out]

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/nospec-branch.h | 16 ++++++++++++++++
arch/x86/kernel/cpu/bugs.c | 7 +++++++
3 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 624d978..8ec9588 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -207,6 +207,7 @@
#define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */

+#define X86_FEATURE_IBPB ( 7*32+16) /* Using Indirect Branch Prediction Barrier */
#define X86_FEATURE_AMD_PRED_CMD ( 7*32+17) /* Prediction Command MSR (AMD) */
#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4ad4108..c333c95 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -218,5 +218,21 @@ static inline void vmexit_fill_RSB(void)
#endif
}

+static inline void indirect_branch_prediction_barrier(void)
+{
+ unsigned long ax, cx, dx;
+
+ asm volatile(ALTERNATIVE("",
+ "movl %[msr], %%ecx\n\t"
+ "movl %[val], %%eax\n\t"
+ "movl $0, %%edx\n\t"
+ "wrmsr",
+ X86_FEATURE_IBPB)
+ : "=a" (ax), "=c" (cx), "=d" (dx)
+ : [msr] "i" (MSR_IA32_PRED_CMD),
+ [val] "i" (PRED_CMD_IBPB)
+ : "memory");
+}
+
#endif /* __ASSEMBLY__ */
#endif /* __NOSPEC_BRANCH_H__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 390b3dc..96548ff 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -249,6 +249,13 @@ static void __init spectre_v2_select_mitigation(void)
setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
pr_info("Filling RSB on context switch\n");
}
+
+ /* Initialize Indirect Branch Prediction Barrier if supported */
+ if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ||
+ boot_cpu_has(X86_FEATURE_AMD_PRED_CMD)) {
+ setup_force_cpu_cap(X86_FEATURE_IBPB);
+ pr_info("Enabling Indirect Branch Prediction Barrier\n");
+ }
}

#undef pr_fmt
--
2.7.4


2018-01-20 19:25:54

by KarimAllah Ahmed

Subject: [RFC 03/10] x86/speculation: Use Indirect Branch Prediction Barrier in context switch

From: Thomas Gleixner <[email protected]>

[peterz: comment]

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/mm/tlb.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a156195..304de7d 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
#include <linux/interrupt.h>
#include <linux/export.h>
#include <linux/cpu.h>
+#include <linux/debugfs.h>

#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
+#include <asm/nospec-branch.h>
#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>
-#include <linux/debugfs.h>

/*
* TLB flushing, formerly SMP-only
@@ -220,6 +221,13 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
u16 new_asid;
bool need_flush;

+ /*
+ * Avoid user/user BTB poisoning by flushing the branch predictor
+ * when switching between processes. This stops one process from
+ * doing Spectre-v2 attacks on another.
+ */
+ indirect_branch_prediction_barrier();
+
if (IS_ENABLED(CONFIG_VMAP_STACK)) {
/*
* If our current stack is in vmalloc space and isn't
--
2.7.4


2018-01-20 19:26:10

by KarimAllah Ahmed

Subject: [RFC 07/10] x86: Simplify spectre_v2 command line parsing

Signed-off-by: KarimAllah Ahmed <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 106 +++++++++++++++++++++++++--------------------
1 file changed, 58 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1d5e12f..349c7f4 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -99,13 +99,13 @@ static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE;
static void __init spec2_print_if_insecure(const char *reason)
{
if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
- pr_info("%s\n", reason);
+ pr_info("%s selected on command line.\n", reason);
}

static void __init spec2_print_if_secure(const char *reason)
{
if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
- pr_info("%s\n", reason);
+ pr_info("%s selected on command line.\n", reason);
}

static inline bool retp_compiler(void)
@@ -120,61 +120,71 @@ static inline bool match_option(const char *arg, int arglen, const char *opt)
return len == arglen && !strncmp(arg, opt, len);
}

+static struct {
+ char *option;
+ enum spectre_v2_mitigation_cmd cmd;
+ bool secure;
+} mitigation_options[] = {
+ { "off", SPECTRE_V2_CMD_NONE, false },
+ { "on", SPECTRE_V2_CMD_FORCE, true },
+ { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false },
+ { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false },
+ { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false },
+ { "ibrs", SPECTRE_V2_CMD_IBRS, false },
+ { "auto", SPECTRE_V2_CMD_AUTO, false },
+};
+
+static const int mitigation_options_count = sizeof(mitigation_options) /
+ sizeof(mitigation_options[0]);
+
static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
{
char arg[20];
- int ret;
+ int ret, i;
+ enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO;
+
+ if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
+ return SPECTRE_V2_CMD_NONE;

ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
sizeof(arg));
- if (ret > 0) {
- if (match_option(arg, ret, "off")) {
- goto disable;
- } else if (match_option(arg, ret, "on")) {
- spec2_print_if_secure("force enabled on command line.");
- return SPECTRE_V2_CMD_FORCE;
- } else if (match_option(arg, ret, "retpoline")) {
- if (!IS_ENABLED(CONFIG_RETPOLINE)) {
- pr_err("retpoline selected but not compiled in. Switching to AUTO select\n");
- return SPECTRE_V2_CMD_AUTO;
- }
- spec2_print_if_insecure("retpoline selected on command line.");
- return SPECTRE_V2_CMD_RETPOLINE;
- } else if (match_option(arg, ret, "retpoline,amd")) {
- if (!IS_ENABLED(CONFIG_RETPOLINE)) {
- pr_err("retpoline,amd selected but not compiled in. Switching to AUTO select\n");
- return SPECTRE_V2_CMD_AUTO;
- }
- if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
- pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
- return SPECTRE_V2_CMD_AUTO;
- }
- spec2_print_if_insecure("AMD retpoline selected on command line.");
- return SPECTRE_V2_CMD_RETPOLINE_AMD;
- } else if (match_option(arg, ret, "retpoline,generic")) {
- if (!IS_ENABLED(CONFIG_RETPOLINE)) {
- pr_err("retpoline,generic selected but not compiled in. Switching to AUTO select\n");
- return SPECTRE_V2_CMD_AUTO;
- }
- spec2_print_if_insecure("generic retpoline selected on command line.");
- return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
- } else if (match_option(arg, ret, "ibrs")) {
- if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
- pr_err("IBRS selected but no CPU support. Switching to AUTO select\n");
- return SPECTRE_V2_CMD_AUTO;
- }
- spec2_print_if_insecure("IBRS seleted on command line.");
- return SPECTRE_V2_CMD_IBRS;
- } else if (match_option(arg, ret, "auto")) {
- return SPECTRE_V2_CMD_AUTO;
- }
+ if (ret < 0)
+ return SPECTRE_V2_CMD_AUTO;
+
+ for (i = 0; i < mitigation_options_count; i++) {
+ if (!match_option(arg, ret, mitigation_options[i].option))
+ continue;
+ cmd = mitigation_options[i].cmd;
+ break;
}

- if (!cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
+ if (i >= mitigation_options_count) {
+ pr_err("unknown option (%s). Switching to AUTO select\n",
+ mitigation_options[i].option);
return SPECTRE_V2_CMD_AUTO;
-disable:
- spec2_print_if_insecure("disabled on command line.");
- return SPECTRE_V2_CMD_NONE;
+ }
+
+ if ((cmd == SPECTRE_V2_CMD_RETPOLINE ||
+ cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
+ cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
+ !IS_ENABLED(CONFIG_RETPOLINE)) {
+ pr_err("%s selected but not compiled in. Switching to AUTO select\n",
+ mitigation_options[i].option);
+ return SPECTRE_V2_CMD_AUTO;
+ }
+
+ if (cmd == SPECTRE_V2_CMD_RETPOLINE_AMD &&
+ boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
+ pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
+ return SPECTRE_V2_CMD_AUTO;
+ }
+
+ if (mitigation_options[i].secure)
+ spec2_print_if_secure(mitigation_options[i].option);
+ else
+ spec2_print_if_insecure(mitigation_options[i].option);
+
+ return cmd;
}

/* Check for Skylake-like CPUs (for RSB and IBRS handling) */
--
2.7.4


2018-01-20 19:26:48

by KarimAllah Ahmed

Subject: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

From: David Woodhouse <[email protected]>

Not functional yet; just add the handling for it in the Spectre v2
mitigation selection, and the X86_FEATURE_IBRS flag which will control
the code to be added in later patches.

Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS
mode will want that too.

For now we are auto-selecting IBRS on Skylake. We will probably end up
changing that but for now let's default to the safest option.

XX: Do we want a microcode blacklist?

[karahmed: simplify the switch block and get rid of all the magic]

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 1 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/nospec-branch.h | 2 -
arch/x86/kernel/cpu/bugs.c | 108 +++++++++++++++---------
4 files changed, 68 insertions(+), 44 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8122b5f..e597650 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3932,6 +3932,7 @@
retpoline - replace indirect branches
retpoline,generic - google's original retpoline
retpoline,amd - AMD-specific minimal thunk
+ ibrs - Intel: Indirect Branch Restricted Speculation

Not specifying this option is equivalent to
spectre_v2=auto.
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 8ec9588..ae86ad9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
#define X86_FEATURE_AMD_PRED_CMD ( 7*32+17) /* Prediction Command MSR (AMD) */
#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */
+#define X86_FEATURE_IBRS ( 7*32+21) /* Use IBRS for Spectre v2 safety */

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index c333c95..8759449 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -205,7 +205,6 @@ extern char __indirect_thunk_end[];
*/
static inline void vmexit_fill_RSB(void)
{
-#ifdef CONFIG_RETPOLINE
unsigned long loops;

asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
@@ -215,7 +214,6 @@ static inline void vmexit_fill_RSB(void)
"910:"
: "=r" (loops), ASM_CALL_CONSTRAINT
: : "memory" );
-#endif
}

static inline void indirect_branch_prediction_barrier(void)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 96548ff..1d5e12f 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -79,6 +79,7 @@ enum spectre_v2_mitigation_cmd {
SPECTRE_V2_CMD_RETPOLINE,
SPECTRE_V2_CMD_RETPOLINE_GENERIC,
SPECTRE_V2_CMD_RETPOLINE_AMD,
+ SPECTRE_V2_CMD_IBRS,
};

static const char *spectre_v2_strings[] = {
@@ -87,6 +88,7 @@ static const char *spectre_v2_strings[] = {
[SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline",
[SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline",
[SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline",
+ [SPECTRE_V2_IBRS] = "Mitigation: Indirect Branch Restricted Speculation",
};

#undef pr_fmt
@@ -132,9 +134,17 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
spec2_print_if_secure("force enabled on command line.");
return SPECTRE_V2_CMD_FORCE;
} else if (match_option(arg, ret, "retpoline")) {
+ if (!IS_ENABLED(CONFIG_RETPOLINE)) {
+ pr_err("retpoline selected but not compiled in. Switching to AUTO select\n");
+ return SPECTRE_V2_CMD_AUTO;
+ }
spec2_print_if_insecure("retpoline selected on command line.");
return SPECTRE_V2_CMD_RETPOLINE;
} else if (match_option(arg, ret, "retpoline,amd")) {
+ if (!IS_ENABLED(CONFIG_RETPOLINE)) {
+ pr_err("retpoline,amd selected but not compiled in. Switching to AUTO select\n");
+ return SPECTRE_V2_CMD_AUTO;
+ }
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
return SPECTRE_V2_CMD_AUTO;
@@ -142,8 +152,19 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
spec2_print_if_insecure("AMD retpoline selected on command line.");
return SPECTRE_V2_CMD_RETPOLINE_AMD;
} else if (match_option(arg, ret, "retpoline,generic")) {
+ if (!IS_ENABLED(CONFIG_RETPOLINE)) {
+ pr_err("retpoline,generic selected but not compiled in. Switching to AUTO select\n");
+ return SPECTRE_V2_CMD_AUTO;
+ }
spec2_print_if_insecure("generic retpoline selected on command line.");
return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
+ } else if (match_option(arg, ret, "ibrs")) {
+ if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
+ pr_err("IBRS selected but no CPU support. Switching to AUTO select\n");
+ return SPECTRE_V2_CMD_AUTO;
+ }
+ spec2_print_if_insecure("IBRS seleted on command line.");
+ return SPECTRE_V2_CMD_IBRS;
} else if (match_option(arg, ret, "auto")) {
return SPECTRE_V2_CMD_AUTO;
}
@@ -156,7 +177,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
return SPECTRE_V2_CMD_NONE;
}

-/* Check for Skylake-like CPUs (for RSB handling) */
+/* Check for Skylake-like CPUs (for RSB and IBRS handling) */
static bool __init is_skylake_era(void)
{
if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
@@ -178,55 +199,58 @@ static void __init spectre_v2_select_mitigation(void)
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;

- /*
- * If the CPU is not affected and the command line mode is NONE or AUTO
- * then nothing to do.
- */
- if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2) &&
- (cmd == SPECTRE_V2_CMD_NONE || cmd == SPECTRE_V2_CMD_AUTO))
- return;
-
switch (cmd) {
case SPECTRE_V2_CMD_NONE:
+ if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+ pr_err("kernel not compiled with retpoline; no mitigation available!");
return;
-
- case SPECTRE_V2_CMD_FORCE:
- /* FALLTRHU */
- case SPECTRE_V2_CMD_AUTO:
- goto retpoline_auto;
-
- case SPECTRE_V2_CMD_RETPOLINE_AMD:
- if (IS_ENABLED(CONFIG_RETPOLINE))
- goto retpoline_amd;
- break;
- case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
- if (IS_ENABLED(CONFIG_RETPOLINE))
- goto retpoline_generic;
+ case SPECTRE_V2_CMD_IBRS:
+ mode = SPECTRE_V2_IBRS;
+ setup_force_cpu_cap(X86_FEATURE_IBRS);
break;
+ case SPECTRE_V2_CMD_AUTO:
+ if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+ return;
+ /* Fall through */
+ case SPECTRE_V2_CMD_FORCE:
+ /*
+ * If we have IBRS support, and either Skylake or !RETPOLINE,
+ * then that's what we do.
+ */
+ if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
+ (is_skylake_era() || !retp_compiler())) {
+ mode = SPECTRE_V2_IBRS;
+ setup_force_cpu_cap(X86_FEATURE_IBRS);
+ break;
+ }
+ /* Fall through */
case SPECTRE_V2_CMD_RETPOLINE:
- if (IS_ENABLED(CONFIG_RETPOLINE))
- goto retpoline_auto;
- break;
- }
- pr_err("kernel not compiled with retpoline; no mitigation available!");
- return;
+ case SPECTRE_V2_CMD_RETPOLINE_AMD:
+ if (IS_ENABLED(CONFIG_RETPOLINE) &&
+ boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+ if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
+ mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
+ SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;
+ }

-retpoline_auto:
- if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
- retpoline_amd:
- if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
pr_err("LFENCE not serializing. Switching to generic retpoline\n");
- goto retpoline_generic;
}
- mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
- SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
- setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
- setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
- } else {
- retpoline_generic:
- mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
- SPECTRE_V2_RETPOLINE_MINIMAL;
- setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ /* Fall through */
+ case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
+ if (IS_ENABLED(CONFIG_RETPOLINE)) {
+ mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
+ SPECTRE_V2_RETPOLINE_MINIMAL;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;
+ }
+ /* Fall through */
+ default:
+ if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+ pr_err("kernel not compiled with retpoline; no mitigation available!");
+ return;
}

spectre_v2_enabled = mode;
--
2.7.4


2018-01-20 19:26:55

by KarimAllah Ahmed

Subject: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

From: Tim Chen <[email protected]>

Create macros to control Indirect Branch Speculation.

Name them so they reflect what they are actually doing.
The macros are used to restrict and unrestrict the indirect branch speculation.
They do not *disable* (or *enable*) indirect branch speculation. A trip back to
user-space after *restricting* speculation would still affect the BTB.
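
As a rough sketch of how the pair is meant to bracket kernel execution (the
actual call sites come in the next patch):

	/* kernel entry from user space */
	RESTRICT_IB_SPEC			/* SPEC_CTRL.IBRS = 1 */
	...
	/* just before returning to user space */
	UNRESTRICT_IB_SPEC			/* SPEC_CTRL.IBRS = 0 */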

Quoting from a commit by Tim Chen:

"""
If IBRS is set, near returns and near indirect jumps/calls will not allow
their predicted target address to be controlled by code that executed in a
less privileged prediction mode *BEFORE* the IBRS mode was last written with
a value of 1 or on another logical processor so long as all Return Stack
Buffer (RSB) entries from the previous less privileged prediction mode are
overwritten.

Thus a near indirect jump/call/return may be affected by code in a less
privileged prediction mode that executed *AFTER* IBRS mode was last written
with a value of 1.
"""

[ tglx: Changed macro names and rewrote changelog ]
[ karahmed: changed macro names *again* and rewrote changelog ]

Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arjan Van De Ven <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Ashok Raj <[email protected]>
Link: https://lkml.kernel.org/r/3aab341725ee6a9aafd3141387453b45d788d61a.1515542293.git.tim.c.chen@linux.intel.com
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/entry/calling.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 3f48f69..5aafb51 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -6,6 +6,8 @@
#include <asm/percpu.h>
#include <asm/asm-offsets.h>
#include <asm/processor-flags.h>
+#include <asm/msr-index.h>
+#include <asm/cpufeatures.h>

/*

@@ -349,3 +351,74 @@ For 32-bit we have the following conventions - kernel is built with
.Lafter_call_\@:
#endif
.endm
+
+/*
+ * IBRS related macros
+ */
+.macro PUSH_MSR_REGS
+ pushq %rax
+ pushq %rcx
+ pushq %rdx
+.endm
+
+.macro POP_MSR_REGS
+ popq %rdx
+ popq %rcx
+ popq %rax
+.endm
+
+.macro WRMSR_ASM msr_nr:req edx_val:req eax_val:req
+ movl \msr_nr, %ecx
+ movl \edx_val, %edx
+ movl \eax_val, %eax
+ wrmsr
+.endm
+
+.macro RESTRICT_IB_SPEC
+ ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+ PUSH_MSR_REGS
+ WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
+ POP_MSR_REGS
+.Lskip_\@:
+.endm
+
+.macro UNRESTRICT_IB_SPEC
+ ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+ PUSH_MSR_REGS
+ WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
+ POP_MSR_REGS
+.Lskip_\@:
+.endm
+
+.macro RESTRICT_IB_SPEC_CLOBBER
+ ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+ WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
+.Lskip_\@:
+.endm
+
+.macro UNRESTRICT_IB_SPEC_CLOBBER
+ ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+ WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
+.Lskip_\@:
+.endm
+
+.macro RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg:req
+ ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+ movl $MSR_IA32_SPEC_CTRL, %ecx
+ rdmsr
+ movl %eax, \save_reg
+ movl $0, %edx
+ movl $SPEC_CTRL_IBRS, %eax
+ wrmsr
+.Lskip_\@:
+.endm
+
+.macro RESTORE_IB_SPEC_CLOBBER save_reg:req
+ ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
+ /* Set IBRS to the value saved in the save_reg */
+ movl $MSR_IA32_SPEC_CTRL, %ecx
+ movl $0, %edx
+ movl \save_reg, %eax
+ wrmsr
+.Lskip_\@:
+.endm
--
2.7.4


2018-01-20 19:27:30

by KarimAllah Ahmed

Subject: [RFC 08/10] x86/idle: Control Indirect Branch Speculation in idle

From: Thomas Gleixner <[email protected]>

Indirect Branch Speculation (IBS) is controlled per physical core. If one
thread disables it then it's disabled for the core. If a thread enters idle
it makes sense to reenable IBS so the sibling thread can run with full
speculation enabled in user space.

This only makes sense in mwait_idle_with_hints() because mwait_idle() can
serve an interrupt immediately before speculation can be restricted again. SKL,
which requires IBRS, should use mwait_idle_with_hints(), so this is a non-issue
and at worst a missed optimization.

Originally-by: Tim Chen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/include/asm/mwait.h | 14 ++++++++++++++
arch/x86/kernel/process.c | 14 ++++++++++++++
2 files changed, 28 insertions(+)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 39a2fb2..f173072 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -6,6 +6,7 @@
#include <linux/sched/idle.h>

#include <asm/cpufeature.h>
+#include <asm/nospec-branch.h>

#define MWAIT_SUBSTATE_MASK 0xf
#define MWAIT_CSTATE_MASK 0xf
@@ -106,7 +107,20 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
mb();
}

+ /*
+ * Indirect Branch Speculation (IBS) is controlled per
+ * physical core. If one thread disables it, then it's
+ * disabled on all threads of the core. The kernel disables
+ * it on entry from user space. Reenable it on the thread
+ * which goes idle so the other thread has a chance to run
+ * with full speculation enabled in userspace.
+ */
+ unrestrict_branch_speculation();
__monitor((void *)&current_thread_info()->flags, 0, 0);
+ /*
+ * Restrict IBS again to protect kernel execution.
+ */
+ restrict_branch_speculation();
if (!need_resched())
__mwait(eax, ecx);
}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3cb2486..f941c5d 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -461,6 +461,20 @@ static __cpuidle void mwait_idle(void)
mb(); /* quirk */
}

+ /*
+ * Indirect Branch Speculation (IBS) is controlled per
+ * physical core. If one thread disables it, then it's
+ * disabled on all threads of the core. The kernel disables
+ * it on entry from user space. For __sti_mwait() it's
+ * wrong to reenable it because an interrupt can be served
+ * before speculation can be stopped again.
+ *
+ * To plug that hole the interrupt entry code would need to
+ * save current state and restore. Not worth the trouble as
+ * SKL should not use mwait_idle(). It should use
+ * mwait_idle_with_hints() which can do speculation control
+ * safely.
+ */
__monitor((void *)&current_thread_info()->flags, 0, 0);
if (!need_resched())
__sti_mwait(0, 0);
--
2.7.4


2018-01-20 19:27:43

by KarimAllah Ahmed

Subject: [RFC 10/10] x86/enter: Use IBRS on syscall and interrupts

From: Tim Chen <[email protected]>

Stop Indirect Branch Speculation on every user space to kernel space
transition and reenable it when returning to user space.

The NMI interrupt save/restore of IBRS state was based on Andrea
Arcangeli's implementation. Here's an explanation by Dave Hansen on why we
save IBRS state for NMI.

The normal interrupt code uses the 'error_entry' path which uses the
Code Segment (CS) of the instruction that was interrupted to tell
whether it interrupted the kernel or userspace and thus has to switch
IBRS, or leave it alone.

The NMI code is different. It uses 'paranoid_entry' because it can
interrupt the kernel while it is running with a userspace IBRS (and %GS
and CR3) value, but has a kernel CS. If we used the same approach as
the normal interrupt code, we might do the following:

SYSENTER_entry
<-------------- NMI HERE
IBRS=1
do_something()
IBRS=0
SYSRET

The NMI code might notice that we are running in the kernel and decide
that it is OK to skip the IBRS=1. This would leave it running
unprotected with IBRS=0, which is bad.

However, if we unconditionally set IBRS=1, in the NMI, we might get the
following case:

SYSENTER_entry
IBRS=1
do_something()
IBRS=0
<-------------- NMI HERE (set IBRS=1)
SYSRET

and we would return to userspace with IBRS=1. Userspace would run
slowly until we entered and exited the kernel again.

Instead of those two approaches, we chose a third one where we simply
save the IBRS value in a scratch register (%r13) and then restore that
value, verbatim.
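
In terms of the macros added in the previous patch, this third approach looks
roughly like this (sketch; the real call sites are in the diff below):

	/* NMI/paranoid entry */
	RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg=%r13d	/* save old SPEC_CTRL, set IBRS=1 */
	...
	/* paranoid/NMI exit */
	RESTORE_IB_SPEC_CLOBBER save_reg=%r13d			/* write back the saved value */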

[karahmed: use the new SPEC_CTRL_IBRS defines]

Co-developed-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arjan Van De Ven <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Ashok Raj <[email protected]>
Link: https://lkml.kernel.org/r/d5e4c03ec290c61dfbe5a769f7287817283fa6b7.1515542293.git.tim.c.chen@linux.intel.com
---
arch/x86/entry/entry_64.S | 35 ++++++++++++++++++++++++++++++++++-
arch/x86/entry/entry_64_compat.S | 21 +++++++++++++++++++--
2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 63f4320..b3d90cf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -171,6 +171,8 @@ ENTRY(entry_SYSCALL_64_trampoline)

/* Load the top of the task stack into RSP */
movq CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp
+ /* Restrict indirect branch speculation */
+ RESTRICT_IB_SPEC

/* Start building the simulated IRET frame. */
pushq $__USER_DS /* pt_regs->ss */
@@ -214,6 +216,8 @@ ENTRY(entry_SYSCALL_64)
*/
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+ /* Restrict Indirect Branch Speculation */
+ RESTRICT_IB_SPEC

TRACE_IRQS_OFF

@@ -409,6 +413,8 @@ syscall_return_via_sysret:
pushq RSP-RDI(%rdi) /* RSP */
pushq (%rdi) /* RDI */

+ /* Unrestrict Indirect Branch Speculation */
+ UNRESTRICT_IB_SPEC
/*
* We are on the trampoline stack. All regs except RDI are live.
* We can do future final exit work right here.
@@ -757,11 +763,12 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
/* Push user RDI on the trampoline stack. */
pushq (%rdi)

+ /* Unrestrict Indirect Branch Speculation */
+ UNRESTRICT_IB_SPEC
/*
* We are on the trampoline stack. All regs except RDI are live.
* We can do future final exit work right here.
*/
-
SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi

/* Restore RDI. */
@@ -849,6 +856,13 @@ native_irq_return_ldt:
SWAPGS /* to kernel GS */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi /* to kernel CR3 */

+ /*
+ * There is no point in disabling Indirect Branch Speculation
+ * here as this is going to return to user space immediately
+ * after fixing ESPFIX stack. There is no vulnerable code
+ * to protect so spare two MSR writes.
+ */
+
movq PER_CPU_VAR(espfix_waddr), %rdi
movq %rax, (0*8)(%rdi) /* user RAX */
movq (1*8)(%rsp), %rax /* user RIP */
@@ -982,6 +996,8 @@ ENTRY(switch_to_thread_stack)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+ /* Restrict Indirect Branch Speculation */
+ RESTRICT_IB_SPEC
UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI

pushq 7*8(%rdi) /* regs->ss */
@@ -1282,6 +1298,8 @@ ENTRY(paranoid_entry)

1:
SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
+ /* Restrict Indirect Branch speculation */
+ RESTRICT_IB_SPEC_SAVE_AND_CLOBBER save_reg=%r13d

ret
END(paranoid_entry)
@@ -1305,6 +1323,8 @@ ENTRY(paranoid_exit)
testl %ebx, %ebx /* swapgs needed? */
jnz .Lparanoid_exit_no_swapgs
TRACE_IRQS_IRETQ
+ /* Restore Indirect Branch Speculation to the previous state */
+ RESTORE_IB_SPEC_CLOBBER save_reg=%r13d
RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
SWAPGS_UNSAFE_STACK
jmp .Lparanoid_exit_restore
@@ -1335,6 +1355,8 @@ ENTRY(error_entry)
SWAPGS
/* We have user CR3. Change to kernel CR3. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ /* Restrict Indirect Branch Speculation */
+ RESTRICT_IB_SPEC_CLOBBER

.Lerror_entry_from_usermode_after_swapgs:
/* Put us onto the real thread stack. */
@@ -1382,6 +1404,8 @@ ENTRY(error_entry)
*/
SWAPGS
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ /* Restrict Indirect Branch Speculation */
+ RESTRICT_IB_SPEC_CLOBBER
jmp .Lerror_entry_done

.Lbstep_iret:
@@ -1396,6 +1420,8 @@ ENTRY(error_entry)
*/
SWAPGS
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ /* Restrict Indirect Branch Speculation */
+ RESTRICT_IB_SPEC

/*
* Pretend that the exception came from user mode: set up pt_regs
@@ -1497,6 +1523,10 @@ ENTRY(nmi)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+
+ /* Restrict Indirect Branch Speculation */
+ RESTRICT_IB_SPEC
+
UNWIND_HINT_IRET_REGS base=%rdx offset=8
pushq 5*8(%rdx) /* pt_regs->ss */
pushq 4*8(%rdx) /* pt_regs->rsp */
@@ -1747,6 +1777,9 @@ end_repeat_nmi:
movq $-1, %rsi
call do_nmi

+ /* Restore Indirect Branch speculation to the previous state */
+ RESTORE_IB_SPEC_CLOBBER save_reg=%r13d
+
RESTORE_CR3 scratch_reg=%r15 save_reg=%r14

testl %ebx, %ebx /* swapgs needed? */
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 98d5358..5b45d93 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -54,6 +54,8 @@ ENTRY(entry_SYSENTER_compat)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp

movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+ /* Restrict Indirect Branch Speculation */
+ RESTRICT_IB_SPEC

/*
* User tracing code (ptrace or signal handlers) might assume that
@@ -224,12 +226,18 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
pushq $0 /* pt_regs->r14 = 0 */
pushq $0 /* pt_regs->r15 = 0 */

- /*
- * User mode is traced as though IRQs are on, and SYSENTER
+ /* Restrict Indirect Branch Speculation. All registers are saved already */
+ RESTRICT_IB_SPEC_CLOBBER
+
+ /* User mode is traced as though IRQs are on, and SYSENTER
* turned them off.
*/
TRACE_IRQS_OFF

+ /*
+ * We just saved %rdi so it is safe to clobber. It is not
+ * preserved during the C calls inside TRACE_IRQS_OFF anyway.
+ */
movq %rsp, %rdi
call do_fast_syscall_32
/* XEN PV guests always use IRET path */
@@ -239,6 +247,15 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
/* Opportunistic SYSRET */
sysret32_from_system_call:
TRACE_IRQS_ON /* User mode traces as IRQs on. */
+
+ /*
+ * Unrestrict Indirect Branch Speculation. This is safe to do here
+ * because there are no indirect branches between here and the
+ * return to userspace (sysretl).
+ * Clobber of %rax, %rcx, %rdx is OK before register restoring.
+ */
+ UNRESTRICT_IB_SPEC_CLOBBER
+
movq RBX(%rsp), %rbx /* pt_regs->rbx */
movq RBP(%rsp), %rbp /* pt_regs->rbp */
movq EFLAGS(%rsp), %r11 /* pt_regs->flags (in r11) */
--
2.7.4


2018-01-20 19:29:40

by KarimAllah Ahmed

Subject: [RFC 02/10] x86/kvm: Add IBPB support

From: Ashok Raj <[email protected]>

Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor
barriers on switching between VMs to avoid inter-VM Spectre-v2 attacks.

[peterz: rebase and changelog rewrite]
[dwmw2: fixes]
[karahmed: - vmx: expose PRED_CMD whenever it is available
- svm: only pass through IBPB if it is available]

Cc: Asit Mallick <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Arjan Van De Ven <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jun Nakajima <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Greg KH <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
---
arch/x86/kvm/svm.c | 14 ++++++++++++++
arch/x86/kvm/vmx.c | 4 ++++
2 files changed, 18 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2744b973..cfdb9ab 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -529,6 +529,7 @@ struct svm_cpu_data {
struct kvm_ldttss_desc *tss_desc;

struct page *save_area;
+ struct vmcb *current_vmcb;
};

static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)

set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
}
+
+ if (boot_cpu_has(X86_FEATURE_AMD_PRED_CMD))
+ set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1);
}

static void add_msr_offset(u32 offset)
@@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
+ /*
+ * The vmcb page can be recycled, causing a false negative in
+ * svm_vcpu_load(). So do a full IBPB now.
+ */
+ indirect_branch_prediction_barrier();
}

static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
int i;

if (unlikely(cpu != vcpu->cpu)) {
@@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (static_cpu_has(X86_FEATURE_RDTSCP))
wrmsrl(MSR_TSC_AUX, svm->tsc_aux);

+ if (sd->current_vmcb != svm->vmcb) {
+ sd->current_vmcb = svm->vmcb;
+ indirect_branch_prediction_barrier();
+ }
avic_vcpu_load(vcpu, cpu);
}

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d1e25db..3b64de2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2279,6 +2279,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
vmcs_load(vmx->loaded_vmcs->vmcs);
+ indirect_branch_prediction_barrier();
}

if (!already_loaded) {
@@ -6791,6 +6792,9 @@ static __init int hardware_setup(void)
kvm_tsc_scaling_ratio_frac_bits = 48;
}

+ if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+ vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false);
+
vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
--
2.7.4


2018-01-20 19:29:58

by KarimAllah Ahmed

Subject: [RFC 06/10] x86/speculation: Add inlines to control Indirect Branch Speculation

From: Thomas Gleixner <[email protected]>

XX: I am utterly unconvinced that having "friendly, self-explanatory"
names for the IBRS-frobbing inlines is useful. There be dragons
here for anyone who isn't intimately familiar with what's going
on, and it's almost better to just call it IBRS, put a reference
to the spec, and have a clear "you must be →this← tall to ride."

[karahmed: switch to using ALTERNATIVES instead of static_cpu_has]
[dwmw2: wrmsr args inside the ALTERNATIVE again, bikeshed naming]

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 8759449..5be3443 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -232,5 +232,41 @@ static inline void indirect_branch_prediction_barrier(void)
: "memory");
}

+/*
+ * This also performs a barrier, and setting it again when it was already
+ * set is NOT a no-op.
+ */
+static inline void restrict_branch_speculation(void)
+{
+ unsigned long ax, cx, dx;
+
+ asm volatile(ALTERNATIVE("",
+ "movl %[msr], %%ecx\n\t"
+ "movl %[val], %%eax\n\t"
+ "movl $0, %%edx\n\t"
+ "wrmsr",
+ X86_FEATURE_IBRS)
+ : "=a" (ax), "=c" (cx), "=d" (dx)
+ : [msr] "i" (MSR_IA32_SPEC_CTRL),
+ [val] "i" (SPEC_CTRL_IBRS)
+ : "memory");
+}
+
+static inline void unrestrict_branch_speculation(void)
+{
+ unsigned long ax, cx, dx;
+
+ asm volatile(ALTERNATIVE("",
+ "movl %[msr], %%ecx\n\t"
+ "movl %[val], %%eax\n\t"
+ "movl $0, %%edx\n\t"
+ "wrmsr",
+ X86_FEATURE_IBRS)
+ : "=a" (ax), "=c" (cx), "=d" (dx)
+ : [msr] "i" (MSR_IA32_SPEC_CTRL),
+ [val] "i" (0)
+ : "memory");
+}
+
#endif /* __ASSEMBLY__ */
#endif /* __NOSPEC_BRANCH_H__ */
--
2.7.4


2018-01-20 20:20:45

by Woodhouse, David

Subject: Re: [RFC 02/10] x86/kvm: Add IBPB support

On Sat, 2018-01-20 at 20:22 +0100, KarimAllah Ahmed wrote:
>
> @@ -6791,6 +6792,9 @@ static __init int hardware_setup(void)
>                 kvm_tsc_scaling_ratio_frac_bits = 48;
>         }
>  
> +       if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> +               vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false);
> +

I've updated that to allow X86_FEATURE_AMD_PRED_CMD too, since some
hypervisors may expose *only* that MSR to guests even on Intel
hardware. PRED_CMD is a lot easier to expose as it doesn't need
storage, live migration support, and all that crap.
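
Roughly, the updated check would be (sketch only; see the tree below for the
real thing):

	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) ||
	    boot_cpu_has(X86_FEATURE_AMD_PRED_CMD))
		vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false);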

Our shared tree at
http://git.infradead.org/linux-retpoline.git/shortlog/refs/heads/ibpb
updated accordingly.



2018-01-20 20:31:52

by Liran Alon

Subject: Re: [RFC 02/10] x86/kvm: Add IBPB support


----- [email protected] wrote:

> From: Ashok Raj <[email protected]>
>
> Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor
> barriers on switching between VMs to avoid inter-VM Spectre-v2
> attacks.
>
> [peterz: rebase and changelog rewrite]
> [dwmw2: fixes]
> [karahmed: - vmx: expose PRED_CMD whenever it is available
> - svm: only pass through IBPB if it is available]
>
> Cc: Asit Mallick <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Arjan Van De Ven <[email protected]>
> Cc: Tim Chen <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Jun Nakajima <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Greg KH <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Signed-off-by: Ashok Raj <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Link:
> http://lkml.kernel.org/r/[email protected]
>
> Signed-off-by: David Woodhouse <[email protected]>
> Signed-off-by: KarimAllah Ahmed <[email protected]>
> ---
> arch/x86/kvm/svm.c | 14 ++++++++++++++
> arch/x86/kvm/vmx.c | 4 ++++
> 2 files changed, 18 insertions(+)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 2744b973..cfdb9ab 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -529,6 +529,7 @@ struct svm_cpu_data {
> struct kvm_ldttss_desc *tss_desc;
>
> struct page *save_area;
> + struct vmcb *current_vmcb;
> };
>
> static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
>
> set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
> }
> +
> + if (boot_cpu_has(X86_FEATURE_AMD_PRED_CMD))
> + set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1);
> }
>
> static void add_msr_offset(u32 offset)
> @@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu
> *vcpu)
> __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
> kvm_vcpu_uninit(vcpu);
> kmem_cache_free(kvm_vcpu_cache, svm);
> + /*
> + * The vmcb page can be recycled, causing a false negative in
> + * svm_vcpu_load(). So do a full IBPB now.
> + */
> + indirect_branch_prediction_barrier();
> }
>
> static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> + struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
> int i;
>
> if (unlikely(cpu != vcpu->cpu)) {
> @@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu
> *vcpu, int cpu)
> if (static_cpu_has(X86_FEATURE_RDTSCP))
> wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> + if (sd->current_vmcb != svm->vmcb) {
> + sd->current_vmcb = svm->vmcb;
> + indirect_branch_prediction_barrier();
> + }
> avic_vcpu_load(vcpu, cpu);
> }
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d1e25db..3b64de2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2279,6 +2279,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu,
> int cpu)
> if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
> per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
> vmcs_load(vmx->loaded_vmcs->vmcs);
> + indirect_branch_prediction_barrier();
> }
>
> if (!already_loaded) {
> @@ -6791,6 +6792,9 @@ static __init int hardware_setup(void)
> kvm_tsc_scaling_ratio_frac_bits = 48;
> }
>
> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false);
> +
> vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
> vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
> vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
> --
> 2.7.4

Isn't it cleaner to check for "boot_cpu_has(X86_FEATURE_IBPB)" both in svm_vcpu_init_msrpm() and hardware_setup()?

-Liran

2018-01-20 20:38:22

by Woodhouse, David

Subject: Re: [RFC 02/10] x86/kvm: Add IBPB support

On Sat, 2018-01-20 at 12:28 -0800, Liran Alon wrote:
> Isn't it cleaner to check for "boot_cpu_has(X86_FEATURE_IBPB)" both
> in svm_vcpu_init_msrpm() and hardware_setup()?

Strictly speaking that's a different check. That's checking if we're
*using* IBPB, not if it exists.

Now that's probably OK here, since we need it for retpoline *and* IBRS-
based mitigations. And we *might* argue that 'nospectre_v2' on the host
kernel command line should indeed stop us exposing the features to
guests. Maybe.

But next comes IBRS support, and we definitely *won't* want to make
exposing that to guests conditional on X86_FEATURE_IBRS, because in the
retpoline case that won't be set and we probably *will* still want to
expose it to guests based merely on the fact that it exists.

So I think Karim has it right here (modulo the change I already made).

If we want a separate control for "don't expose these to guests", we
should do that explicitly.



2018-01-20 21:11:47

by Woodhouse, David

Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

On Sat, 2018-01-20 at 20:22 +0100, KarimAllah Ahmed wrote:
> From: Tim Chen <[email protected]>

I think this is probably From: Andi now rather than From: Tim?

We do need the series this far in order to have a full retpoline-based
mitigation, and I'd like to see that go in sooner rather than later.
There's a little more discussion to be had about the IBRS parts which
come later in the series (and the final one or two which weren't posted
yet).

I think this is the one patch of the "we want this now" IBPB set that
we expect serious debate on, which is why it's still a separate
"optimisation" patch on top of the previous one which just does IBPB
unconditionally.


> Flush indirect branches when switching into a process that marked
> itself non dumpable.  This protects high value processes like gpg
> better, without having too high performance overhead.
>
> Signed-off-by: Andi Kleen <[email protected]>
> Signed-off-by: David Woodhouse <[email protected]>
> Signed-off-by: KarimAllah Ahmed <[email protected]>
> ---
>  arch/x86/mm/tlb.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 304de7d..f64e80c 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -225,8 +225,19 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>    * Avoid user/user BTB poisoning by flushing the branch predictor
>    * when switching between processes. This stops one process from
>    * doing Spectre-v2 attacks on another.
> +  *
> +  * As an optimization: Flush indirect branches only when
> +  * switching into processes that disable dumping.
> +  *
> +  * This will not flush when switching into kernel threads.
> +  * But it would flush when switching into idle and back
> +  *
> +  * It might be useful to have a one-off cache here
> +  * to also not flush the idle case, but we would need some
> +  * kind of stable sequence number to remember the previous mm.
>    */
> - indirect_branch_prediction_barrier();
> + if (tsk && tsk->mm && get_dumpable(tsk->mm) != SUID_DUMP_USER)
> + indirect_branch_prediction_barrier();
>  
>   if (IS_ENABLED(CONFIG_VMAP_STACK)) {
>   /*



2018-01-21 11:23:51

by Peter Zijlstra

Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
> From: Tim Chen <[email protected]>
>
> Flush indirect branches when switching into a process that marked
> itself non dumpable. This protects high value processes like gpg
> better, without having too high performance overhead.

So if I understand it right, this is only needed if the 'other'
executable itself is susceptible to spectre. If, say, someone audited gpg
for spectre-v1 and built it with retpoline, it would be safe to not
issue the IBPB, right?

So would it make sense to provide an ELF flag / personality thing such
that userspace can indicate it's spectre-safe?
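
Something like the below on the consumer side in switch_mm_irqs_off(), where
MMF_SPEC_V2_SAFE is a made-up flag purely to illustrate the idea:

	if (tsk && tsk->mm &&
	    get_dumpable(tsk->mm) != SUID_DUMP_USER &&
	    !test_bit(MMF_SPEC_V2_SAFE, &tsk->mm->flags))	/* MMF_SPEC_V2_SAFE does not exist yet */
		indirect_branch_prediction_barrier();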

I realize that this is all future work, because so far auditing for v1
is a lot of pain (we need better tools), but would it be something that
makes sense in the longer term?

2018-01-21 12:05:19

by David Woodhouse

Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process


> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <[email protected]>
>>
>> Flush indirect branches when switching into a process that marked
>> itself non dumpable. This protects high value processes like gpg
>> better, without having too high performance overhead.
>
> So if I understand it right, this is only needed if the 'other'
> executable itself is susceptible to spectre. If say someone audited gpg
> for spectre-v1 and build it with retpoline, it would be safe to not
> issue the IBPB, right?


Spectre V2 not v1. V1 is separate.
For V2 retpoline is enough... as long as all the libraries have it too.

> So would it make sense to provide an ELF flag / personality thing such
> that userspace can indicate its spectre-safe?

Yes, Arjan and I were pondering that yesterday; it probably does make
sense. Also for allowing a return to userspace after vmexit, if the VMM
process itself is so marked.

> I realize that this is all future work, because so far auditing for v1
> is a lot of pain (we need better tools), but would it be something that
> makes sense in the longer term?

It's *only* retpoline so it isn't actually that much. Although I'm wary of
Cc'ing HJ on such thoughts because he seems to never sleep and always
respond promptly with "OK I did that... " :)

If we did systematically do this in userspace we'd probably want to do
external thunks there too, and a flag in the auxvec to tell it not to
bother (for IBRS_ALL etc.).

--
dwmw2


2018-01-21 13:53:38

by Konrad Rzeszutek Wilk

Subject: Re: [RFC 10/10] x86/enter: Use IBRS on syscall and interrupts

On Sat, Jan 20, 2018 at 08:23:01PM +0100, KarimAllah Ahmed wrote:
> From: Tim Chen <[email protected]>
>
> Stop Indirect Branch Speculation on every user space to kernel space
> transition and reenable it when returning to user space.

How about interrupts?

That is, should the .macro interrupt path have the same treatment?

2018-01-21 14:04:09

by Konrad Rzeszutek Wilk

Subject: Re: [RFC 00/10] Speculation Control feature support

On Sat, Jan 20, 2018 at 08:22:51PM +0100, KarimAllah Ahmed wrote:
> Start using the newly-added microcode features for speculation control on both
> Intel and AMD CPUs to protect against Spectre v2.

Thank you posting these.
>
> This patch series covers interrupts, system calls, context switching between
> processes, and context switching between VMs. It also exposes Indirect Branch
> Prediction Barrier MSR, aka IBPB MSR, to KVM guests.
>
> TODO:
>
> - Introduce a microcode blacklist to disable the feature for broken microcodes.
> - Restrict/Unrestrict the speculation (by toggling IBRS) around VMExit and
> VMEnter for KVM and expose IBRS to guests.
>

Depends on what we expose to the guest. That is, if the guest is not supposed to have this exposed
(say, the CPUID bit 27 is not exposed), then trap on the MSR (and give a #GP)?
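
Roughly, in the MSR write handler (sketch only; guest_cpuid_has() is the
existing KVM helper for checking guest-visible CPUID bits):

	case MSR_IA32_SPEC_CTRL:
		if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
			return 1;	/* injects #GP into the guest */
		...
		break;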

Mihai (CC-ed) is working on this; when ready he can post a patch against this tree?

> Ashok Raj (1):
> x86/kvm: Add IBPB support
>
> David Woodhouse (1):
> x86/speculation: Add basic IBRS support infrastructure
>
> KarimAllah Ahmed (1):
> x86: Simplify spectre_v2 command line parsing
>
> Thomas Gleixner (4):
> x86/speculation: Add basic support for IBPB
> x86/speculation: Use Indirect Branch Prediction Barrier in context
> switch
> x86/speculation: Add inlines to control Indirect Branch Speculation
> x86/idle: Control Indirect Branch Speculation in idle
>
> Tim Chen (3):
> x86/mm: Only flush indirect branches when switching into non dumpable
> process
> x86/enter: Create macros to restrict/unrestrict Indirect Branch
> Speculation
> x86/enter: Use IBRS on syscall and interrupts
>
> Documentation/admin-guide/kernel-parameters.txt | 1 +
> arch/x86/entry/calling.h | 73 ++++++++++
> arch/x86/entry/entry_64.S | 35 ++++-
> arch/x86/entry/entry_64_compat.S | 21 ++-
> arch/x86/include/asm/cpufeatures.h | 2 +
> arch/x86/include/asm/mwait.h | 14 ++
> arch/x86/include/asm/nospec-branch.h | 54 ++++++-
> arch/x86/kernel/cpu/bugs.c | 183 +++++++++++++++---------
> arch/x86/kernel/process.c | 14 ++
> arch/x86/kvm/svm.c | 14 ++
> arch/x86/kvm/vmx.c | 4 +
> arch/x86/mm/tlb.c | 21 ++-
> 12 files changed, 359 insertions(+), 77 deletions(-)
>
>
> Cc: Andi Kleen <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Arjan van de Ven <[email protected]>
> Cc: Ashok Raj <[email protected]>
> Cc: Asit Mallick <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Janakarajan Natarajan <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Jun Nakajima <[email protected]>
> Cc: Laura Abbott <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Tim Chen <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
> --
> 2.7.4
>

2018-01-21 14:09:04

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

On Sun, Jan 21, 2018 at 4:04 AM, David Woodhouse <[email protected]> wrote:
>
>> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
>>> From: Tim Chen <[email protected]>
>>>
>>> Flush indirect branches when switching into a process that marked
>>> itself non dumpable. This protects high value processes like gpg
>>> better, without having too high performance overhead.
>>
>> So if I understand it right, this is only needed if the 'other'
>> executable itself is susceptible to spectre. If say someone audited gpg
>> for spectre-v1 and built it with retpoline, it would be safe to not
>> issue the IBPB, right?
>
>
> Spectre V2 not v1. V1 is separate.
> For V2 retpoline is enough... as long as all the libraries have it too.
>
>> So would it make sense to provide an ELF flag / personality thing such
>> that userspace can indicate it's spectre-safe?
>
> Yes, Arjan and I were pondering that yesterday; it probably does make
> sense. Also for allowing a return to userspace after vmexit, if the VMM
> process itself is so marked.

Please take a look at how CET is handled in program property in
x86-64 psABI for CET:

https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-cet.pdf


--
H.J.

2018-01-21 14:33:50

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Sat, 20 Jan 2018, KarimAllah Ahmed wrote:
> From: David Woodhouse <[email protected]>
>
> Not functional yet; just add the handling for it in the Spectre v2
> mitigation selection, and the X86_FEATURE_IBRS flag which will control
> the code to be added in later patches.
>
> Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS
> mode will want that too.
>
> For now we are auto-selecting IBRS on Skylake. We will probably end up
> changing that but for now let's default to the safest option.
>
> XX: Do we want a microcode blacklist?

Oh yes, we want a microcode blacklist. Ideally we refuse to load the
affected microcode in the first place, and if it's already loaded then at
least avoid using the borked features.

PR texts promising that Intel is committed to transparency in this matter
are not sufficient. Intel, please provide the facts, i.e. a proper list of
microcodes and affected SKUs, ASAP.

Thanks,

tglx

2018-01-21 14:41:24

by KarimAllah Ahmed

[permalink] [raw]
Subject: Re: [RFC 10/10] x86/enter: Use IBRS on syscall and interrupts

On 01/21/2018 02:50 PM, Konrad Rzeszutek Wilk wrote:

> On Sat, Jan 20, 2018 at 08:23:01PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <[email protected]>
>>
>> Stop Indirect Branch Speculation on every user space to kernel space
>> transition and reenable it when returning to user space.
> How about interrupts?
>
> That is, should .macro interrupt have the same treatment?

RESTRICT_IB_SPEC is called in switch_to_thread_stack which is almost the
first thing called from ".macro interrupt".

>

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

2018-01-21 14:57:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Sun, Jan 21, 2018 at 03:31:28PM +0100, Thomas Gleixner wrote:
> Oh yes, we want a microcode blacklist. Ideally we refuse to load the
> affected microcode in the first place, and if it's already loaded then at
> least avoid using the borked features.
>
> PR texts promising that Intel is committed to transparency in this matter
> are not sufficient. Intel, please provide the facts, i.e. a proper list of
> microcodes and affected SKUs, ASAP.

If we have to do blacklisting, then we need to blacklist microcode
revisions and fixed ones should be incremented. I.e., we need a way to
*detect* the faulty microcode revision at load time.
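
[Editorial sketch, not part of the series: the kind of table-driven check Boris describes, keyed on the CPUID signature and the revision field of the microcode update header. The struct layout, table contents and helper name are illustrative assumptions only.]

#include <linux/kernel.h>
#include <linux/types.h>

struct bad_ucode {
	unsigned int sig;	/* CPUID signature the blob applies to */
	unsigned int rev;	/* broken microcode revision */
};

static const struct bad_ucode ucode_blacklist[] = {
	{ 0x000506e3, 0x00c2 },	/* placeholder entry, not an authoritative listing */
};

static bool ucode_is_blacklisted(unsigned int sig, unsigned int rev)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(ucode_blacklist); i++)
		if (ucode_blacklist[i].sig == sig && ucode_blacklist[i].rev == rev)
			return true;

	return false;
}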

Also, blacklisting microcode for early loading will become an ugly dance
so I'd like to avoid it if possible.

Thus, it would be much much easier if dracut/initrd creation thing
already filters those blacklisted blobs by looking at the revision in
the header. Which is much easier.

Yeah, something like that.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-01-21 15:27:41

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure


> On Sat, 20 Jan 2018, KarimAllah Ahmed wrote:
>> From: David Woodhouse <[email protected]>
>>
>> Not functional yet; just add the handling for it in the Spectre v2
>> mitigation selection, and the X86_FEATURE_IBRS flag which will control
>> the code to be added in later patches.
>>
>> Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS
>> mode will want that too.
>>
>> For now we are auto-selecting IBRS on Skylake. We will probably end up
>> changing that but for now let's default to the safest option.
>>
>> XX: Do we want a microcode blacklist?
>
> Oh yes, we want a microcode blacklist. Ideally we refuse to load the
> affected microcode in the first place, and if it's already loaded then at
> least avoid using the borked features.
>
> PR texts promising that Intel is committed to transparency in this matter
> are not sufficient. Intel, please provide the facts, i.e. a proper list of
> microcodes and affected SKUs, ASAP.

Perhaps we could start with the list already published by VMware at
https://kb.vmware.com/s/article/52345


--
dwmw2


2018-01-21 16:22:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process


* Peter Zijlstra <[email protected]> wrote:

> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
> > From: Tim Chen <[email protected]>
> >
> > Flush indirect branches when switching into a process that marked
> > itself non dumpable. This protects high value processes like gpg
> > better, without having too high performance overhead.
>
> So if I understand it right, this is only needed if the 'other'
> executable itself is susceptible to spectre. If say someone audited gpg
> for spectre-v1 and built it with retpoline, it would be safe to not
> issue the IBPB, right?
>
> So would it make sense to provide an ELF flag / personality thing such
> that userspace can indicate it's spectre-safe?
>
> I realize that this is all future work, because so far auditing for v1
> is a lot of pain (we need better tools), but would it be something that
> makes sense in the longer term?

So if it's only about the scheduler barrier, what cycle cost are we talking about
here?

Because putting something like this into an ELF flag raises the question of who is
allowed to set the flag - does a user-compiled binary count? If yes then it would
be a trivial thing for local exploits to set the flag and turn off the barrier.

Yes, we could make a distinction based on the owner of the file, we could use
security labels, etc. - but it gets somewhat awkward and fragile.

So unless we are talking about measurably high scheduler costs here, I'd prefer to
err on the side of caution (and simplicity) and issue the barrier unconditionally.

Thanks,

Ingo

2018-01-21 16:26:27

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

On 1/21/2018 8:21 AM, Ingo Molnar wrote:
>
>
> So if it's only about the scheduler barrier, what cycle cost are we talking about
> here?
>

in the order of 5000 to 10000 cycles.
(depends a bit on the cpu generation but this range is a reasonable approximation)



> Because putting something like this into an ELF flag raises the question of who is
> allowed to set the flag - does a user-compiled binary count? If yes then it would
> be a trivial thing for local exploits to set the flag and turn off the barrier.

the barrier is about who you go TO, e.g. the thing under attack.
as you say, depending on the thing that would be the evil one does not work.


2018-01-21 17:23:34

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC 10/10] x86/enter: Use IBRS on syscall and interrupts

On 01/21/2018 05:50 AM, Konrad Rzeszutek Wilk wrote:
> On Sat, Jan 20, 2018 at 08:23:01PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <[email protected]>
>>
>> Stop Indirect Branch Speculation on every user space to kernel space
>> transition and reenable it when returning to user space.
>
> How about interrupts?

This code covers all kernel entry/exit paths, including interrupts.
Despite its name, "error_entry" is used by the interrupt path.

> That is, should .macro interrupt have the same treatment?

It already does.


2018-01-21 19:15:06

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation



> On Jan 20, 2018, at 11:23 AM, KarimAllah Ahmed <[email protected]> wrote:
>
> From: Tim Chen <[email protected]>
>
> Create macros to control Indirect Branch Speculation.
>
> Name them so they reflect what they are actually doing.
> The macros are used to restrict and unrestrict the indirect branch speculation.
> They do not *disable* (or *enable*) indirect branch speculation. A trip back to
> user-space after *restricting* speculation would still affect the BTB.
>
> Quoting from a commit by Tim Chen:
>
> """
> If IBRS is set, near returns and near indirect jumps/calls will not allow
> their predicted target address to be controlled by code that executed in a
> less privileged prediction mode *BEFORE* the IBRS mode was last written with
> a value of 1 or on another logical processor so long as all Return Stack
> Buffer (RSB) entries from the previous less privileged prediction mode are
> overwritten.
>
> Thus a near indirect jump/call/return may be affected by code in a less
> privileged prediction mode that executed *AFTER* IBRS mode was last written
> with a value of 1.
> """
>
> [ tglx: Changed macro names and rewrote changelog ]
> [ karahmed: changed macro names *again* and rewrote changelog ]
>
> Signed-off-by: Tim Chen <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: KarimAllah Ahmed <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Greg KH <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Arjan Van De Ven <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: Ashok Raj <[email protected]>
> Link: https://lkml.kernel.org/r/3aab341725ee6a9aafd3141387453b45d788d61a.1515542293.git.tim.c.chen@linux.intel.com
> Signed-off-by: David Woodhouse <[email protected]>
> ---
> arch/x86/entry/calling.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 73 insertions(+)
>
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 3f48f69..5aafb51 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -6,6 +6,8 @@
> #include <asm/percpu.h>
> #include <asm/asm-offsets.h>
> #include <asm/processor-flags.h>
> +#include <asm/msr-index.h>
> +#include <asm/cpufeatures.h>
>
> /*
>
> @@ -349,3 +351,74 @@ For 32-bit we have the following conventions - kernel is built with
> .Lafter_call_\@:
> #endif
> .endm
> +
> +/*
> + * IBRS related macros
> + */
> +.macro PUSH_MSR_REGS
> + pushq %rax
> + pushq %rcx
> + pushq %rdx
> +.endm
> +
> +.macro POP_MSR_REGS
> + popq %rdx
> + popq %rcx
> + popq %rax
> +.endm
> +
> +.macro WRMSR_ASM msr_nr:req edx_val:req eax_val:req
> + movl \msr_nr, %ecx
> + movl \edx_val, %edx
> + movl \eax_val, %eax
> + wrmsr
> +.endm
> +
> +.macro RESTRICT_IB_SPEC
> + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> + PUSH_MSR_REGS
> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
> + POP_MSR_REGS
> +.Lskip_\@:
> +.endm
> +
> +.macro UNRESTRICT_IB_SPEC
> + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> + PUSH_MSR_REGS
> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0

I think you should be writing 2, not 0, since I'm reasonably confident that we want STIBP on. Can you explain why you're writing 0?

Also, holy cow, there are so many macros here.

And a meta question: why are there so many submitters of the same series?

2018-01-21 20:31:00

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Sun, 2018-01-21 at 11:34 -0800, Linus Torvalds wrote:
> All of this is pure garbage.
>
> Is Intel really planning on making this shit architectural? Has
> anybody talked to them and told them they are f*cking insane?
>
> Please, any Intel engineers here - talk to your managers. 

If the alternative was a two-decade product recall and giving everyone
free CPUs, I'm not sure it was entirely insane.

Certainly it's a nasty hack, but hey — the world was on fire and in the
end we didn't have to just turn the datacentres off and go back to goat
farming, so it's not all bad.

As a hack for existing CPUs, it's just about tolerable — as long as it
can die entirely by the next generation.

So the part I think is odd is the IBRS_ALL feature, where a future
CPU will advertise "I am able to be not broken" and then you have to
set the IBRS bit once at boot time to *ask* it not to be broken. That
part is weird, because it ought to have been treated like the RDCL_NO
bit — just "you don't have to worry any more, it got better".

https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

We do need the IBPB feature to complete the protection that retpoline
gives us — it's that or rebuild all of userspace with retpoline.

We'll also want to expose IBRS to VM guests, since Windows uses it.

I think we could probably live without the IBRS frobbing in our own
syscall/interrupt paths, as long as we're prepared to live with the
very hypothetical holes that still exist on Skylake. Because I like
IBRS more... no, let me rephrase... I hate IBRS less than I hate the
'deepstack' and other stuff that was being proposed to make Skylake
almost safe with retpoline.



2018-01-21 21:37:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Sun, Jan 21, 2018 at 12:28 PM, David Woodhouse <[email protected]> wrote:
> On Sun, 2018-01-21 at 11:34 -0800, Linus Torvalds wrote:
>> All of this is pure garbage.
>>
>> Is Intel really planning on making this shit architectural? Has
>> anybody talked to them and told them they are f*cking insane?
>>
>> Please, any Intel engineers here - talk to your managers.
>
> If the alternative was a two-decade product recall and giving everyone
> free CPUs, I'm not sure it was entirely insane.

You seem to have bought into the cool-aid. Please add a healthy dose
of critical thinking. Because this isn't the kind of cool-aid that
makes for a fun trip with pretty pictures. This is the kind that melts
your brain.

> Certainly it's a nasty hack, but hey — the world was on fire and in the
> end we didn't have to just turn the datacentres off and go back to goat
> farming, so it's not all bad.

It's not that it's a nasty hack. It's much worse than that.

> As a hack for existing CPUs, it's just about tolerable — as long as it
> can die entirely by the next generation.

That's part of the big problem here. The speculation control cpuid
stuff shows that Intel actually seems to plan on doing the right thing
for meltdown (the main question being _when_). Which is not a huge
surprise, since it should be easy to fix, and it's a really honking
big hole to drive through. Not doing the right thing for meltdown
would be completely unacceptable.

So the IBRS garbage implies that Intel is _not_ planning on doing the
right thing for the indirect branch speculation.

Honestly, that's completely unacceptable too.

> So the part I think is odd is the IBRS_ALL feature, where a future
> CPU will advertise "I am able to be not broken" and then you have to
> set the IBRS bit once at boot time to *ask* it not to be broken. That
> part is weird, because it ought to have been treated like the RDCL_NO
> bit — just "you don't have to worry any more, it got better".

It's not "weird" at all. It's very much part of the whole "this is
complete garbage" issue.

The whole IBRS_ALL feature to me very clearly says "Intel is not
serious about this, we'll have an ugly hack that will be so expensive
that we don't want to enable it by default, because that would look
bad in benchmarks".

So instead they try to push the garbage down to us. And they are doing
it entirely wrong, even from a technical standpoint.

I'm sure there is some lawyer there who says "we'll have to go through
motions to protect against a lawsuit". But legal reasons do not make
for good technology, or good patches that I should apply.

> We do need the IBPB feature to complete the protection that retpoline
> gives us — it's that or rebuild all of userspace with retpoline.

BULLSHIT.

Have you _looked_ at the patches you are talking about? You should
have - several of them bear your name.

The patches do things like add the garbage MSR writes to the kernel
entry/exit points. That's insane. That says "we're trying to protect
the kernel". We already have retpoline there, with less overhead.

So somebody isn't telling the truth here. Somebody is pushing complete
garbage for unclear reasons. Sorry for having to point that out.

If this was about flushing the BTB at actual context switches between
different users, I'd believe you. But that's not at all what the
patches do.

As it is, the patches are COMPLETE AND UTTER GARBAGE.

They do literally insane things. They do things that do not make
sense. That makes all your arguments questionable and suspicious. The
patches do things that are not sane.

WHAT THE F*CK IS GOING ON?

And that's actually ignoring the much _worse_ issue, namely that the
whole hardware interface is literally mis-designed by morons.

It's mis-designed for two major reasons:

- the "the interface implies Intel will never fix it" reason.

See the difference between IBRS_ALL and RDCL_NO. One implies Intel
will fix something. The other does not.

Do you really think that is acceptable?

- the "there is no performance indicator".

The whole point of having cpuid and flags from the
microarchitecture is that we can use those to make decisions.

But since we already know that the IBRS overhead is *huge* on
existing hardware, all those hardware capability bits are just
complete and utter garbage. Nobody sane will use them, since the cost
is too damn high. So you end up having to look at "which CPU stepping
is this" anyway.

I think we need something better than this garbage.

Linus

2018-01-21 22:01:09

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Sun, 2018-01-21 at 13:35 -0800, Linus Torvalds wrote:
> On Sun, Jan 21, 2018 at 12:28 PM, David Woodhouse wrote:
> > As a hack for existing CPUs, it's just about tolerable — as long as it
> > can die entirely by the next generation.
>
> That's part of the big problem here. The speculation control cpuid
> stuff shows that Intel actually seems to plan on doing the right thing
> for meltdown (the main question being _when_). Which is not a huge
> surprise, since it should be easy to fix, and it's a really honking
> big hole to drive through. Not doing the right thing for meltdown
> would be completely unacceptable.
>
> So the IBRS garbage implies that Intel is _not_ planning on doing the
> right thing for the indirect branch speculation.
>
> Honestly, that's completely unacceptable too.

Agreed. I've been saying that since I first saw the IBRS_ALL proposal.
There's *no* good reason for it to be opt-in. Just fix it!

> > So the part I think is odd is the IBRS_ALL feature, where a future
> > CPU will advertise "I am able to be not broken" and then you have to
> > set the IBRS bit once at boot time to *ask* it not to be broken. That
> > part is weird, because it ought to have been treated like the RDCL_NO
> > bit — just "you don't have to worry any more, it got better".
>
> It's not "weird" at all. It's very much part of the whole "this is
> complete garbage" issue.
>
> The whole IBRS_ALL feature to me very clearly says "Intel is not
> serious about this, we'll have an ugly hack that will be so expensive
> that we don't want to enable it by default, because that would look
> bad in benchmarks".
>
> So instead they try to push the garbage down to us. And they are doing
> it entirely wrong, even from a technical standpoint.

Right. The whole IBRS/IBPB thing as a nasty hack in the short term I
could live with, but it's the long-term implications of IBRS_ALL that
I'm unhappy about.

My understanding was that the IBRS_ALL performance was supposed to not
suck — to the extent that we'd just turn it on and then ALTERNATIVE out
the retpolines, and that would be the best option.

But if that's the case, why are they making it an option, and not just
doing the same as RDCL_NO does for "we fixed Meltdown"?

> > We do need the IBPB feature to complete the protection that retpoline
> > gives us — it's that or rebuild all of userspace with retpoline.
>
> BULLSHIT.
>
> Have you _looked_ at the patches you are talking about?  You should
> have - several of them bear your name.
>
> The patches do things like add the garbage MSR writes to the kernel
> entry/exit points. That's insane. That says "we're trying to protect
> the kernel".  We already have retpoline there, with less overhead.

You're looking at IBRS usage, not IBPB. They are different things.

Yes, the one you're looking at really *is* trying to protect the
kernel, and you're right that it's largely redundant with retpoline.
(Assuming we can live with the implications on Skylake, as I said.)

> If this was about flushing the BTB at actual context switches between
> different users, I'd believe you. But that's not at all what the
> patches do.

That's what the *IBPB* patches do. Those were deliberately put first in
the series (and in fact that's where I stopped, when I posted).



2018-01-21 22:24:20

by Woodhouse, David

[permalink] [raw]
Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

On Sun, 2018-01-21 at 17:21 +0100, Ingo Molnar wrote:
>
> Because putting something like this into an ELF flag raises the question of who is 
> allowed to set the flag - does a user-compiled binary count? If yes then it would 
> be a trivial thing for local exploits to set the flag and turn off the barrier.

You can only allow *yourself* to be exploited that way. The flag says,
"I'm OK, you don't need to protect me".



2018-01-21 22:28:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse <[email protected]> wrote:
>>
>> The patches do things like add the garbage MSR writes to the kernel
>> entry/exit points. That's insane. That says "we're trying to protect
>> the kernel". We already have retpoline there, with less overhead.
>
> You're looking at IBRS usage, not IBPB. They are different things.

Ehh. Odd intel naming detail.

If you look at this series, it very much does that kernel entry/exit
stuff. It was patch 10/10, iirc. In fact, the patch I was replying to
was explicitly setting that garbage up.

And I really don't want to see these garbage patches just mindlessly
sent around.

Linus

2018-01-22 09:53:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Sun, Jan 21, 2018 at 03:56:55PM +0100, Borislav Petkov wrote:
> Also, blacklisting microcode for early loading will become an ugly dance
> so I'd like to avoid it if possible.
>
> Thus, it would be much much easier if dracut/initrd creation thing
> already filters those blacklisted blobs by looking at the revision in
> the header. Which is much easier.

That wouldn't be enough; AFAIU there's people with this stuff already
flashed in their BIOS. So the kernel needs to deal with it one way or
another.

2018-01-22 10:20:48

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

On Sun, Jan 21, 2018 at 12:04:03PM -0000, David Woodhouse wrote:
> > So if I understand it right, this is only needed if the 'other'
> > executable itself is susceptible to spectre. If say someone audited gpg
> > for spectre-v1 and built it with retpoline, it would be safe to not
> > issue the IBPB, right?
>
>
> Spectre V2 not v1. V1 is separate.
> For V2 retpoline is enough... as long as all the libraries have it too.

Ah, easy then. So we need this toolchain bit, and then a simple rebuild
works and everything is happy again, well, except of course those people
running closed sores binaries, but meh.. :-)

> > I realize that this is all future work, because so far auditing for v1
> > is a lot of pain (we need better tools), but would it be something that
> > makes sense in the longer term?
>
> It's *only* retpoline so it isn't actually that much. Although I'm wary of
> Cc'ing HJ on such thoughts because he seems to never sleep and always
> respond promptly with "OK I did that... " :)
>
> If we did systematically do this in userspace we'd probably want to do
> external thunks there too, and a flag in the auxvec to tell it not to
> bother (for IBRS_ALL etc.).

Right, so if its v2/retpoline only, we really should do this asap and
then rebuild world on distros (or arch/gentoo people could read a book
or something).

2018-01-22 10:24:57

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

On Mon, 2018-01-22 at 11:19 +0100, Peter Zijlstra wrote:
> Right, so if its v2/retpoline only, we really should do this asap and
> then rebuild world on distros (or arch/gentoo people could read a book
> or something).

By the time we manage to rebuild all the distros, I *seriously* hope
that someone would be shipping a fixed CPU.

And not just the half-way-there IBRS_ALL bit that still requires the
IBPB flushing on context switches that's discussed in this patch, but
an *actual* fix so we can forget about it all and go drinking.



2018-01-22 12:07:14

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 22, 2018 at 10:51:53AM +0100, Peter Zijlstra wrote:
> That wouldn't be enough; AFAIU there's people with this stuff already
> flashed in their BIOS. So the kernel needs to deal with it one way or
> another.

Not a lot we can do there except maybe disable IBRS on those and users
can go and complain to their BIOS vendor to give them a downgrade or
they can downgrade themselves.

If we had free BIOS, this would've been a whole different story...

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-01-22 13:31:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 22, 2018 at 01:06:18PM +0100, Borislav Petkov wrote:
> On Mon, Jan 22, 2018 at 10:51:53AM +0100, Peter Zijlstra wrote:
> > That wouldn't be enough; AFAIU there's people with this stuff already
> > flashed in their BIOS. So the kernel needs to deal with it one way or
> > another.
>
> Not a lot we can do there except maybe disable IBRS on those and users
> can go and complain to their BIOS vendor to give them a downgrade or
> they can downgrade themselves.
>
> If we had free BIOS, this would've been a whole different story...

We kind of do, you can submit patches to UEFI, but I doubt that the
processor-specific portions are actually present in the Tianocore code
to be able to be patched.

What about LinuxBoot <https://linuxboot.org>, does it, too, take over too
late in the boot process to control this?

thanks,

greg k-h

2018-01-22 13:40:28

by Woodhouse, David

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, 2018-01-22 at 14:30 +0100, Greg Kroah-Hartman wrote:
> We kind of do, you can submit patches to UEFI, but I doubt that the
> processor-specific portions are actually present in the Tianocore code
> to be able to be patched.

This is just about which microcode your BIOS loads into the CPU before
booting the OS. It's not "processor-specific portions in the Tianocore
code"; more a data blob — just like when Linux updates microcode.

> What about LinuxBoot <https://linuxboot.org>, does it too take over too
> late in the boot process to control this?

Yes, I believe microcode updates are done in PEI which is before
LinuxBoot takes over.



2018-01-22 16:28:39

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Sun, 2018-01-21 at 14:27 -0800, Linus Torvalds wrote:
> On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse <[email protected]> wrote:
> >>
> >> The patches do things like add the garbage MSR writes to the kernel
> >> entry/exit points. That's insane. That says "we're trying to protect
> >> the kernel".  We already have retpoline there, with less overhead.
> >
> > You're looking at IBRS usage, not IBPB. They are different things.
>
> Ehh. Odd intel naming detail.
>
> If you look at this series, it very much does that kernel entry/exit
> stuff. It was patch 10/10, iirc. In fact, the patch I was replying to
> was explicitly setting that garbage up.
>
> And I really don't want to see these garbage patches just mindlessly
> sent around.

I think we've covered the technical part of this now, not that you like
it — not that any of us *like* it. But since the peanut gallery is
paying lots of attention it's probably worth explaining it a little
more for their benefit.

This is all about Spectre variant 2, where the CPU can be tricked into
mispredicting the target of an indirect branch. And I'm specifically
looking at what we can do on *current* hardware, where we're limited to
the hacks they can manage to add in the microcode.

The new microcode from Intel and AMD adds three new features.

One new feature (IBPB) is a complete barrier for branch prediction.
After frobbing this, no branch targets learned earlier are going to be
used. It's kind of expensive (order of magnitude ~4000 cycles).

The second (STIBP) protects a hyperthread sibling from following branch
predictions which were learned on another sibling. You *might* want
this when running unrelated processes in userspace, for example. Or
different VM guests running on HT siblings.

The third feature (IBRS) is more complicated. It's designed to be
set when you enter a more privileged execution mode (i.e. the kernel).
It prevents branch targets learned in a less-privileged execution mode,
BEFORE IT WAS MOST RECENTLY SET, from taking effect. But it's not just
a 'set-and-forget' feature, it also has barrier-like semantics and
needs to be set on *each* entry into the kernel (from userspace or a VM
guest). It's *also* expensive. And a vile hack, but for a while it was
the only option we had.
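
[Editorial aside for the same peanut-gallery readers: roughly what driving these controls looks like from C. The MSR names match the ones used in this series; the bit positions (IBRS = bit 0 and STIBP = bit 1 of SPEC_CTRL, IBPB = bit 0 of PRED_CMD) and the helper names are stated here as assumptions, not as the series' actual code.]

#include <asm/msr.h>

static inline void ibpb_barrier(void)
{
	/* PRED_CMD bit 0 = IBPB: flush previously learned branch targets. */
	wrmsrl(MSR_IA32_PRED_CMD, 0x1);
}

static inline void restrict_indirect_branch_speculation(void)
{
	/*
	 * SPEC_CTRL bit 0 = IBRS (bit 1 would be STIBP).  Unlike IBPB this
	 * is not one-shot, but it still has to be rewritten on each entry
	 * into the more privileged mode to act as a barrier.
	 */
	wrmsrl(MSR_IA32_SPEC_CTRL, 0x1);
}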

Even with IBRS, the CPU cannot tell the difference between different
userspace processes, and between different VM guests. So in addition to
IBRS to protect the kernel, we need the full IBPB barrier on context
switch and vmexit. And maybe STIBP while they're running.

Then along came Paul with the cunning plan of "oh, indirect branches
can be exploited? Screw it, let's not have any of *those* then", which
is retpoline. And it's a *lot* faster than frobbing IBRS on every entry
into the kernel. It's a massive performance win.

So now we *mostly* don't need IBRS. We build with retpoline, use IBPB
on context switches/vmexit (which is in the first part of this patch
series before IBRS is added), and we're safe. We even refactored the
patch series to put retpoline first.

But wait, why did I say "mostly"? Well, not everyone has a retpoline
compiler yet... but OK, screw them; they need to update.

Then there's Skylake, and that generation of CPU cores. For complicated
reasons they actually end up being vulnerable not just on indirect
branches, but also on a 'ret' in some circumstances (such as 16+ CALLs
in a deep chain).

The IBRS solution, ugly though it is, did address that. Retpoline
doesn't. There are patches being floated to detect and prevent deep
stacks, and deal with some of the other special cases that bite on SKL,
but those are icky too. And in fact IBRS performance isn't anywhere
near as bad on this generation of CPUs as it is on earlier CPUs
*anyway*, which makes it not quite so insane to *contemplate* using it
as Intel proposed.

That's why my initial idea, as implemented in this RFC patchset, was to
stick with IBRS on Skylake, and use retpoline everywhere else. I'll
give you "garbage patches", but they weren't being "just mindlessly
sent around". If we're going to drop IBRS support and accept the
caveats, then let's do it as a conscious decision having seen what it
would look like, not just drop it quietly because poor Davey is too
scared that Linus might shout at him again. :)

I have seen *hand-wavy* analyses of the Skylake thing that mean I'm not
actually lying awake at night fretting about it, but nothing concrete
that really says it's OK.

If you view retpoline as a performance optimisation, which is how it
first arrived, then it's rather unconventional to say "well, it only
opens a *little* bit of a security hole but it does go nice and fast so
let's do it".

But fine, I'm content with ditching the use of IBRS to protect the
kernel, and I'm not even surprised. There's a *reason* we put it last
in the series, as both the most contentious and most dispensable part.
I'd be *happier* with a coherent analysis showing Skylake is still OK,
but hey-ho, screw Skylake.

The early part of the series adds the new feature bits and detects when
it can turn KPTI off on non-Meltdown-vulnerable Intel CPUs, and also
supports the IBPB barrier that we need to make retpoline complete. That
much I think we definitely *do* want. There have been a bunch of us
working on this behind the scenes; one of us will probably post that
bit in the next day or so.

I think we also want to expose IBRS to VM guests, even if we don't use
it ourselves. Because Windows guests (and RHEL guests; yay!) do use it.

If we can be done with the shouty part, I'd actually quite like to have
a sensible discussion about when, if ever, we do IBPB on context switch
(ptraceability and dumpable have both been suggested) and when, if
ever, we set STIBP in userspace.





2018-01-22 18:31:52

by Tim Chen

[permalink] [raw]
Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

On 01/20/2018 01:06 PM, Woodhouse, David wrote:
> On Sat, 2018-01-20 at 20:22 +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <[email protected]>
>
> I think this is probably From: Andi now rather than From: Tim?

This change is from Andi.


>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index 304de7d..f64e80c 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -225,8 +225,19 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>> * Avoid user/user BTB poisoning by flushing the branch predictor
>> * when switching between processes. This stops one process from
>> * doing Spectre-v2 attacks on another.
>> + *
>> + * As an optimization: Flush indirect branches only when
>> + * switching into processes that disable dumping.
>> + *
>> + * This will not flush when switching into kernel threads.
>> + * But it would flush when switching into idle and back
>> + *
>> + * It might be useful to have a one-off cache here
>> + * to also not flush the idle case, but we would need some
>> + * kind of stable sequence number to remember the previous mm.
>> */
>> - indirect_branch_prediction_barrier();
>> + if (tsk && tsk->mm && get_dumpable(tsk->mm) != SUID_DUMP_USER)
>> + indirect_branch_prediction_barrier();

We could move this close to the cr3 write. The cr3 write provides a
barrier against unwanted speculation in the above if check.
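
[Editorial sketch of that suggestion, not part of the posted patch: the conditional from the quoted hunk moved directly in front of the CR3 write in switch_mm_irqs_off(); the name of the CR3-write helper is an assumption.]

	if (tsk && tsk->mm && get_dumpable(tsk->mm) != SUID_DUMP_USER)
		indirect_branch_prediction_barrier();

	/*
	 * MOV to CR3 is serializing, so placing the IBPB right before it
	 * means a mispredicted outcome of the check above cannot let
	 * speculation skip the barrier and run on into the next mm.
	 */
	load_new_mm_cr3(next->pgd, new_asid, true);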

Tim

2018-01-22 18:57:36

by Jim Mattson

[permalink] [raw]
Subject: Re: [RFC 02/10] x86/kvm: Add IBPB support

On Sat, Jan 20, 2018 at 11:22 AM, KarimAllah Ahmed <[email protected]> wrote:
> From: Ashok Raj <[email protected]>
>
> Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor
> barriers on switching between VMs to avoid inter-VM spectre-v2 attacks.
>
> [peterz: rebase and changelog rewrite]
> [dwmw2: fixes]
> [karahmed: - vmx: expose PRED_CMD whenever it is available
> - svm: only pass through IBPB if it is available]
>
> Cc: Asit Mallick <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Arjan Van De Ven <[email protected]>
> Cc: Tim Chen <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Jun Nakajima <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Greg KH <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Signed-off-by: Ashok Raj <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
>
> Signed-off-by: David Woodhouse <[email protected]>
> Signed-off-by: KarimAllah Ahmed <[email protected]>
> ---
> arch/x86/kvm/svm.c | 14 ++++++++++++++
> arch/x86/kvm/vmx.c | 4 ++++
> 2 files changed, 18 insertions(+)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 2744b973..cfdb9ab 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -529,6 +529,7 @@ struct svm_cpu_data {
> struct kvm_ldttss_desc *tss_desc;
>
> struct page *save_area;
> + struct vmcb *current_vmcb;
> };
>
> static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
>
> set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
> }
> +
> + if (boot_cpu_has(X86_FEATURE_AMD_PRED_CMD))
> + set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1);
> }
>
> static void add_msr_offset(u32 offset)
> @@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
> __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
> kvm_vcpu_uninit(vcpu);
> kmem_cache_free(kvm_vcpu_cache, svm);
> + /*
> + * The vmcb page can be recycled, causing a false negative in
> + * svm_vcpu_load(). So do a full IBPB now.
> + */
> + indirect_branch_prediction_barrier();
> }
>
> static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> + struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
> int i;
>
> if (unlikely(cpu != vcpu->cpu)) {
> @@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> if (static_cpu_has(X86_FEATURE_RDTSCP))
> wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> + if (sd->current_vmcb != svm->vmcb) {
> + sd->current_vmcb = svm->vmcb;
> + indirect_branch_prediction_barrier();
> + }
> avic_vcpu_load(vcpu, cpu);
> }
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d1e25db..3b64de2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2279,6 +2279,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
> per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
> vmcs_load(vmx->loaded_vmcs->vmcs);
> + indirect_branch_prediction_barrier();
> }
>
> if (!already_loaded) {
> @@ -6791,6 +6792,9 @@ static __init int hardware_setup(void)
> kvm_tsc_scaling_ratio_frac_bits = 48;
> }
>
> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))

I think the condition here should be:

if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))

__do_cpuid_ent should pass through X86_FEATURE_SPEC_CTRL from the
host, but userspace should be allowed to clear it.
(Userspace should not be allowed to set it if the host doesn't support it.)

> + vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false);
> +
> vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
> vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
> vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
> --
> 2.7.4
>

2018-01-22 19:31:57

by Jim Mattson

[permalink] [raw]
Subject: Re: [RFC 02/10] x86/kvm: Add IBPB support

Oh, but to do that properly, you need one of the per-vCPU bitmap
implementations that Paolo and I have independently posted.

On Mon, Jan 22, 2018 at 10:56 AM, Jim Mattson <[email protected]> wrote:
> On Sat, Jan 20, 2018 at 11:22 AM, KarimAllah Ahmed <[email protected]> wrote:
>> From: Ashok Raj <[email protected]>
>>
>> Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor
>> barriers on switching between VMs to avoid inter-VM spectre-v2 attacks.
>>
>> [peterz: rebase and changelog rewrite]
>> [dwmw2: fixes]
>> [karahmed: - vmx: expose PRED_CMD whenever it is available
>> - svm: only pass through IBPB if it is available]
>>
>> Cc: Asit Mallick <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Arjan Van De Ven <[email protected]>
>> Cc: Tim Chen <[email protected]>
>> Cc: Linus Torvalds <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Cc: Andi Kleen <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Jun Nakajima <[email protected]>
>> Cc: Andy Lutomirski <[email protected]>
>> Cc: Greg KH <[email protected]>
>> Cc: David Woodhouse <[email protected]>
>> Cc: Paolo Bonzini <[email protected]>
>> Signed-off-by: Ashok Raj <[email protected]>
>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>> Link: http://lkml.kernel.org/r/[email protected]
>>
>> Signed-off-by: David Woodhouse <[email protected]>
>> Signed-off-by: KarimAllah Ahmed <[email protected]>
>> ---
>> arch/x86/kvm/svm.c | 14 ++++++++++++++
>> arch/x86/kvm/vmx.c | 4 ++++
>> 2 files changed, 18 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 2744b973..cfdb9ab 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -529,6 +529,7 @@ struct svm_cpu_data {
>> struct kvm_ldttss_desc *tss_desc;
>>
>> struct page *save_area;
>> + struct vmcb *current_vmcb;
>> };
>>
>> static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
>> @@ -918,6 +919,9 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
>>
>> set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
>> }
>> +
>> + if (boot_cpu_has(X86_FEATURE_AMD_PRED_CMD))
>> + set_msr_interception(msrpm, MSR_IA32_PRED_CMD, 1, 1);
>> }
>>
>> static void add_msr_offset(u32 offset)
>> @@ -1706,11 +1710,17 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>> __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>> kvm_vcpu_uninit(vcpu);
>> kmem_cache_free(kvm_vcpu_cache, svm);
>> + /*
>> + * The vmcb page can be recycled, causing a false negative in
>> + * svm_vcpu_load(). So do a full IBPB now.
>> + */
>> + indirect_branch_prediction_barrier();
>> }
>>
>> static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> {
>> struct vcpu_svm *svm = to_svm(vcpu);
>> + struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>> int i;
>>
>> if (unlikely(cpu != vcpu->cpu)) {
>> @@ -1739,6 +1749,10 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> if (static_cpu_has(X86_FEATURE_RDTSCP))
>> wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>>
>> + if (sd->current_vmcb != svm->vmcb) {
>> + sd->current_vmcb = svm->vmcb;
>> + indirect_branch_prediction_barrier();
>> + }
>> avic_vcpu_load(vcpu, cpu);
>> }
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index d1e25db..3b64de2 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2279,6 +2279,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>> per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>> vmcs_load(vmx->loaded_vmcs->vmcs);
>> + indirect_branch_prediction_barrier();
>> }
>>
>> if (!already_loaded) {
>> @@ -6791,6 +6792,9 @@ static __init int hardware_setup(void)
>> kvm_tsc_scaling_ratio_frac_bits = 48;
>> }
>>
>> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
>
> I think the condition here should be:
>
> if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
>
> __do_cpuid_ent should pass through X86_FEATURE_SPEC_CTRL from the
> host, but userspace should be allowed to clear it.
> (Userspace should not be allowed to set it if the host doesn't support it.)
>
>> + vmx_disable_intercept_for_msr(MSR_IA32_PRED_CMD, false);
>> +
>> vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
>> vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
>> vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
>> --
>> 2.7.4
>>

2018-01-22 21:28:47

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 00/10] Speculation Control feature support

On Sun, 2018-01-21 at 09:02 -0500, Konrad Rzeszutek Wilk wrote:
>
>
> Depends on what we expose to the guest. That is, if the guest is not supposed to have this exposed
> (say the cpuid 27 bit is not exposed) then trap on the MSR (and give a #GP)?

I think for SPEC_CTRL we want to trap on the MSR anyway. Saving and
restoring it is *bizarrely* slow, apparently, even when it's zero.

I think we want to trap on the first access, and only then disable the
intercept and enable the save/restore. That way, sane guests that only
ever use retpoline and IBPB (which is write-only and doesn't need
saving) won't ever take the performance hit.

It's going to want this: https://patchwork.kernel.org/patch/10167667/
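
[Editorial sketch of that "trap on the first access" idea, not code from this series: the per-vCPU fields and the shape of the handler are assumptions; vmx_disable_intercept_for_msr() is the helper already used in the quoted vmx.c hunk.]

static int handle_spec_ctrl_write(struct kvm_vcpu *vcpu, u64 data)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	vmx->spec_ctrl = data;		/* assumed per-vCPU shadow of the MSR */

	if (!vmx->spec_ctrl_passthrough) {
		/*
		 * First guest access: stop intercepting the MSR and start
		 * saving/restoring it around vmentry/vmexit.  Guests that
		 * never touch SPEC_CTRL never pay for the save/restore.
		 */
		vmx->spec_ctrl_passthrough = true;	/* assumed flag */
		vmx_disable_intercept_for_msr(MSR_IA32_SPEC_CTRL, false);
	}

	return 0;
}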

> Mihai (CC-ed) is working on this, when ready he can post a patch against this tree?

That'd be useful; thanks. The latest (including the bits on top that we
probably aren't going to submit, with saner bits near the beginning)
should always be at
http://git.infradead.org/linux-retpoline.git/shortlog/refs/heads/ibpb



Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

[apologies for breaking the reply-thread]

David wrote:

> I think we've covered the technical part of this now, not that you like
> it — not that any of us *like* it. But since the peanut gallery is
> paying lots of attention it's probably worth explaining it a little
> more for their benefit.

i'm in taiwan (happily just accidentally landed in a position to do a
meltdown-less, spectre-less entirely libre RISC-V SoC), i got
wasabi-flavoured crisps imported from korea, and a bag of pistachios
that come with their own moisture absorbing sachet, does that count?

david, there is actually a significant benefit to what you're doing,
not just peanut-gallery-ing: this is a cluster-f*** where every single
intel (and amd) engineer is prevented and prohibited from talking
directly to you as they develop the microcode. they're effectively
indentured slaves (all employees are), and they've been ignored
and demoralised. it's a lesson that i'm not sure their management
are capable of learning, despite what the head of the intel open source
innovation centre has been trying to get across to them for many
years: OPEN UP THE FUCKING FIRMWARE AND MICROCODE.

so unfortunately, the burden is on you, the members of the linux
kernel team, to read between the lines, express things clearly here
on LKML so that the intel engineers who are NOT PERMITTED
to talk directly to you can at least get some clear feedback.
the burden is therefore *on you* - like it or not - to indicate *to them*
that you fully grasp the technical situation... whilst at the same time
not being permitted access to the fucking microcode gaah what
a cluster-f*** anyway you get my drift, right? you're doing the
right thing.

anyway good luck, it's all highly entertaining, but please don't forget
that you have a huge responsibility here. oh, and intel management?
this situation is your equivalent of heartbleed and shellshock. you get
your fucking act together and put a much larger contribution into some
pot somewhere e.g. the linux foundation, to make up for fucking
around and freeloading off of the linux kernel team's expertise and
time, d'ya get my drift?

l.

2018-01-23 07:30:20

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

* David Woodhouse <[email protected]> wrote:

> But wait, why did I say "mostly"? Well, not everyone has a retpoline
> compiler yet... but OK, screw them; they need to update.
>
> Then there's Skylake, and that generation of CPU cores. For complicated
> reasons they actually end up being vulnerable not just on indirect
> branches, but also on a 'ret' in some circumstances (such as 16+ CALLs
> in a deep chain).
>
> The IBRS solution, ugly though it is, did address that. Retpoline
> doesn't. There are patches being floated to detect and prevent deep
> stacks, and deal with some of the other special cases that bite on SKL,
> but those are icky too. And in fact IBRS performance isn't anywhere
> near as bad on this generation of CPUs as it is on earlier CPUs
> *anyway*, which makes it not quite so insane to *contemplate* using it
> as Intel proposed.

There's another possible method to avoid deep stacks on Skylake, without compiler
support:

- Use the existing mcount based function tracing live patching machinery
(CONFIG_FUNCTION_TRACER=y) to install a _very_ fast and simple stack depth
tracking tracer which would issue a retpoline when stack depth crosses
boundaries of ~16 entries.

The overhead of that would _still_ very likely be much cheaper than an MSR write
costing hundreds (or thousands) of cycles at every kernel entry (syscall entry, IRQ
entry, etc.).

Note the huge number of advantages:

- All distro kernels already enable the mcount based patching options, so there's
literally zero overhead to anything except SkyLake.

- It is fully kernel patching based and can be activated on Skylake only

- It doesn't require any microcode updates, so it will work on all existing CPUs
with no firmware or microcode modifications

- It doesn't require any compiler updates

- SkyLake performance is very likely to be much less fragile than relying on a
hastily deployed microcode hack

- The "SkyLake stack depth tracer" can be tested on other CPUs as well in debug
builds, broadening the testing base

- The tracer is very obviously simple and reviewable, and we can forget about it
in the far future.

- It's much more backportable to older kernels: should there be a new class of
exploits then this machinery could be updated to cover that too - while
upgrades to newer kernels would give the higher performant solution.

Yes, there are some practical complications like always enabling
CONFIG_FUNCTION_TRACER=y on x86, plus the ftrace interaction has to be sorted out,
but in practice it's enabled on all major distros anyway, due to ftrace.

Is there any reason why this wouldn't work?

Thanks,

Ingo

2018-01-23 07:54:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation


* Ingo Molnar <[email protected]> wrote:

> * David Woodhouse <[email protected]> wrote:
>
> > But wait, why did I say "mostly"? Well, not everyone has a retpoline
> > compiler yet... but OK, screw them; they need to update.
> >
> > Then there's Skylake, and that generation of CPU cores. For complicated
> > reasons they actually end up being vulnerable not just on indirect
> > branches, but also on a 'ret' in some circumstances (such as 16+ CALLs
> > in a deep chain).
> >
> > The IBRS solution, ugly though it is, did address that. Retpoline
> > doesn't. There are patches being floated to detect and prevent deep
> > stacks, and deal with some of the other special cases that bite on SKL,
> > but those are icky too. And in fact IBRS performance isn't anywhere
> > near as bad on this generation of CPUs as it is on earlier CPUs
> > *anyway*, which makes it not quite so insane to *contemplate* using it
> > as Intel proposed.
>
> There's another possible method to avoid deep stacks on Skylake, without compiler
> support:
>
> - Use the existing mcount based function tracing live patching machinery
> (CONFIG_FUNCTION_TRACER=y) to install a _very_ fast and simple stack depth
> tracking tracer which would issue a retpoline when stack depth crosses
> boundaries of ~16 entries.

The patch below demonstrates the principle, it forcibly enables dynamic ftrace
patching (CONFIG_DYNAMIC_FTRACE=y et al) and turns mcount/__fentry__ into a RET:

ffffffff81a01a40 <__fentry__>:
ffffffff81a01a40: c3 retq

This would have to be extended with (very simple) call stack depth tracking (just
3 more instructions would do in the fast path I believe) and a suitable SkyLake
workaround (and also has to play nice with the ftrace callbacks).

On non-SkyLake the overhead would be 0 cycles.

On SkyLake this would add an overhead of maybe 2-3 cycles per function call and
obviously all this code and data would be very cache hot. Given that the average
number of function calls per system call is around a dozen, this would be _much_
faster than any microcode/MSR based approach.

Is there a testcase for the SkyLake 16-deep-call-stack problem that I could run?
Is there a description of the exact speculative execution vulnerability that has
to be addressed to begin with?

If this approach is workable I'd much prefer it to any MSR writes in the syscall
entry path not just because it's fast enough in practice to not be turned off by
everyone, but also because everyone would agree that per function call overhead
needs to go away on new CPUs. Both deployment and backporting is also _much_ more
flexible, simpler, faster and more complete than microcode/firmware or compiler
based solutions.

Assuming the vulnerability can be addressed via this route that is, which is a big
assumption!

Thanks,

Ingo

arch/x86/Kconfig | 3 +++
arch/x86/kernel/ftrace_64.S | 1 +
2 files changed, 4 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 423e4b64e683..df471538a79c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -133,6 +133,8 @@ config X86
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
+ select DYNAMIC_FTRACE
+ select DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT if X86_64
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_EXIT_THREAD
@@ -140,6 +142,7 @@ config X86
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
+ select FUNCTION_TRACER
select HAVE_GCC_PLUGINS
select HAVE_HW_BREAKPOINT
select HAVE_IDE
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 7cb8ba08beb9..1e219e0f2887 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -19,6 +19,7 @@ EXPORT_SYMBOL(__fentry__)
# define function_hook mcount
EXPORT_SYMBOL(mcount)
#endif
+ ret

/* All cases save the original rbp (8 bytes) */
#ifdef CONFIG_FRAME_POINTER


2018-01-23 09:29:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation


* Ingo Molnar <[email protected]> wrote:

> Is there a testcase for the SkyLake 16-deep-call-stack problem that I could run?
> Is there a description of the exact speculative execution vulnerability that has
> to be addressed to begin with?

Ok, so for now I'm assuming that this is the 16 entries return-stack-buffer
underflow condition where SkyLake falls back to the branch predictor (while other
CPUs wrap the buffer).

> If this approach is workable I'd much prefer it to any MSR writes in the syscall
> entry path not just because it's fast enough in practice to not be turned off by
> everyone, but also because everyone would agree that per function call overhead
> needs to go away on new CPUs. Both deployment and backporting is also _much_ more
> flexible, simpler, faster and more complete than microcode/firmware or compiler
> based solutions.
>
> Assuming the vulnerability can be addressed via this route that is, which is a big
> assumption!

So I talked this over with PeterZ, and I think it's all doable:

- the CALL __fentry__ callbacks maintain the depth tracking (on the kernel
stack, fast to access), and issue an "RSB-stuffing sequence" when depth reaches
16 entries.

- "the RSB-stuffing sequence" is a return trampoline that pushes a CALL on the
stack which is executed on the RET.

- All asynchronous contexts (IRQs, NMIs, etc.) stuff the RSB before IRET. (The
tracking could probably be made IRQ and maybe even NMI safe, but the worst-case
nesting scenarios make my head ache.)
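
For reference, the "RSB-stuffing sequence" amounts to executing a run of CALLs
whose return addresses are benign, so that subsequent RETs cannot underflow the
RSB and fall back to the indirect predictor. A simplified inline-asm sketch of
the idea (kernel context; not the exact sequence that would be merged):

static inline void rsb_stuff_sketch(void)
{
	asm volatile(
		".rept 16\n\t"
		"call 1f\n\t"		/* push a benign return address into the RSB */
		"int3\n\t"		/* speculation trap, never reached architecturally */
		"1:\n\t"
		".endr\n\t"
		"add $128, %%rsp\n\t"	/* 16 entries * 8 bytes: drop what we pushed */
		: : : "memory");
}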

I.e. IBRS can be mostly replaced with a kernel based solution that is better than
IBRS and which does not negatively impact any other non-SkyLake CPUs or general
code quality.

I.e. a full upstream Spectre solution.

Thanks,

Ingo

2018-01-23 09:31:46

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 2018-01-23 at 08:53 +0100, Ingo Molnar wrote:
>
> The patch below demonstrates the principle, it forcibly enables dynamic ftrace 
> patching (CONFIG_DYNAMIC_FTRACE=y et al) and turns mcount/__fentry__ into a RET:
>
>   ffffffff81a01a40 <__fentry__>:
>   ffffffff81a01a40:       c3                      retq   
>
> This would have to be extended with (very simple) call stack depth tracking (just 
> 3 more instructions would do in the fast path I believe) and a suitable SkyLake 
> workaround (and also has to play nice with the ftrace callbacks).
>
> On non-SkyLake the overhead would be 0 cycles.

The overhead of forcing CONFIG_DYNAMIC_FTRACE=y is precisely zero
cycles? That seems a little optimistic. ;)

I'll grant you if it goes straight to a 'ret' it isn't *that* high
though.

> On SkyLake this would add an overhead of maybe 2-3 cycles per function call and 
> obviously all this code and data would be very cache hot. Given that the average 
> number of function calls per system call is around a dozen, this would be _much_ 
> faster than any microcode/MSR based approach.

That's kind of neat, except you don't want it at the top of the
function; you want it at the bottom.

If you could hijack the *return* site, then you could check for
underflow and stuff the RSB right there. But in __fentry__ there's not
a lot you can do other than complain that something bad is going to
happen in the future. You know that a string of 16+ rets is going to
happen, but you've got no gadget in *there* to deal with it when it
does.

HJ did have patches to turn 'ret' into a form of retpoline, which I
don't think ever even got performance-tested. They'd have forced a
mispredict on *every* ret. A cheaper option might be to turn ret into a
'jmp skylake_ret_hack'. Which on pre-SKL will be a bare ret, and SKL+
can do the counting (in conjunction with a 'per_cpu(call_depth)++' in
__fentry__) and stuff the RSB before actually returning, when
appropriate.

By the time you've made it work properly, I suspect we're approaching
the barf-factor of IBRS, for a less complete solution.

> Is there a testcase for the SkyLake 16-deep-call-stack problem that I could run? 

Andi's been experimenting at 
https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=spec/deep-chain-3

> Is there a description of the exact speculative execution vulnerability that has 
> to be addressed to begin with?

"It takes predictions from the generic branch target buffer when the
RSB underflows".

IBRS filters what can come from the BTB, and resolves the problem that
way. Retpoline avoids the indirect branches that on *earlier* CPUs were
the only things that would use the offending predictions. But on SKL,
now 'ret' is one of the problematic instructions too. Fun! :)

> If this approach is workable I'd much prefer it to any MSR writes in the syscall 
> entry path not just because it's fast enough in practice to not be turned off by 
> everyone, but also because everyone would agree that per function call overhead 
> needs to go away on new CPUs. Both deployment and backporting is also _much_ more 
> flexible, simpler, faster and more complete than microcode/firmware or compiler 
> based solutions.
>
> Assuming the vulnerability can be addressed via this route that is, which is a big 
> assumption!

I think it's close. There are some other cases which empty the RSB,
like sleeping and loading microcode, which can happily be special-
cased. Andi's rounded up many of the remaining details already at 
https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=spec/skl-rsb-3

And there's SMI, which is a pain but I think Linus is right we can
possibly just stick our fingers in our ears and pretend we didn't hear
about that one as it's likely to be hard to trigger (famous last
words).

On the whole though, I think you can see why we're keeping IBRS around
for now, sent out purely as an RFC and rebased on top of the stuff
we're *actually* sending to Linus for inclusion.

When we have a clear idea of what we're doing for Skylake, it'll be
useful to have a proper comparison of the security, the performance and
the "ick" factor of whatever we come up with, vs. IBRS.

Right now the plan is just "screw Skylake"; we'll just forget it's a
special snowflake and treat it like everything else, except for a bit
of extra RSB-stuffing on context switch (since we had to add that for
!SMEP anyway). And that's not *entirely* unreasonable but as I said I'd
*really* like to have a decent analysis of the implications of that,
not just some hand-wavy "nah, it'll be fine".



2018-01-23 09:38:21

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 2018-01-23 at 10:27 +0100, Ingo Molnar wrote:
> * Ingo Molnar <[email protected]> wrote:
>
> >
> > Is there a testcase for the SkyLake 16-deep-call-stack problem that I could run? 
> > Is there a description of the exact speculative execution vulnerability that has 
> > to be addressed to begin with?
>
> Ok, so for now I'm assuming that this is the 16 entries return-stack-buffer 
> underflow condition where SkyLake falls back to the branch predictor (while other 
> CPUs wrap the buffer).

Yep.

> >
> > If this approach is workable I'd much prefer it to any MSR writes in the syscall 
> > entry path not just because it's fast enough in practice to not be turned off by 
> > everyone, but also because everyone would agree that per function call overhead 
> > needs to go away on new CPUs. Both deployment and backporting is also _much_ more 
> > flexible, simpler, faster and more complete than microcode/firmware or compiler 
> > based solutions.
> >
> > Assuming the vulnerability can be addressed via this route that is, which is a big 
> > assumption!
>
> So I talked this over with PeterZ, and I think it's all doable:
>
>  - the CALL __fentry__ callbacks maintain the depth tracking (on the kernel 
>    stack, fast to access), and issue an "RSB-stuffing sequence" when depth reaches
>    16 entries.
>
>  - "the RSB-stuffing sequence" is a return trampoline that pushes a CALL on the 
>    stack which is executed on the RET.

That's neat. We'll want to make sure the unwinder can cope but hey,
Peter *loves* hacking objtool, right? :)

>  - All asynchronous contexts (IRQs, NMIs, etc.) stuff the RSB before IRET. (The 
>    tracking could probably be made IRQ and maybe even NMI safe, but the worst-case
>    nesting scenarios make my head ache.)
>
> I.e. IBRS can be mostly replaced with a kernel based solution that is better than 
> IBRS and which does not negatively impact any other non-SkyLake CPUs or general 
> code quality.
>
> I.e. a full upstream Spectre solution.

Sounds good. I look forward to seeing it.

In the meantime I'll resend the basic bits of the feature detection and
especially turning off KPTI when RDCL_NO is set.

We do also want to do IBPB even with retpoline, so I'll send those
patches for KVM and context switch. There is some bikeshedding to be
done there about the precise conditions under which we do it.

Finally, KVM should be *exposing* IBRS to guests even if we don't use
it ourselves. We'll do that too.



2018-01-23 10:16:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation


* David Woodhouse <[email protected]> wrote:

> On Tue, 2018-01-23 at 08:53 +0100, Ingo Molnar wrote:
> >
> > The patch below demonstrates the principle, it forcibly enables dynamic ftrace
> > patching (CONFIG_DYNAMIC_FTRACE=y et al) and turns mcount/__fentry__ into a RET:
> >
> >   ffffffff81a01a40 <__fentry__>:
> >   ffffffff81a01a40:       c3                      retq
> >
> > This would have to be extended with (very simple) call stack depth tracking (just
> > 3 more instructions would do in the fast path I believe) and a suitable SkyLake
> > workaround (and also has to play nice with the ftrace callbacks).
> >
> > On non-SkyLake the overhead would be 0 cycles.
>
> The overhead of forcing CONFIG_DYNAMIC_FTRACE=y is precisely zero
> cycles? That seems a little optimistic. ;)

The overhead of the quick hack patch I sent to show what exact code I mean is
obviously not zero.

The overhead of using my proposed solution, to utilize the function call callback
that CONFIG_DYNAMIC_FTRACE=y provides, is exactly zero on non-SkyLake systems
where the callback is patched out, on typical Linux distros.

The callback is widely enabled on distro kernels:

Fedora: CONFIG_DYNAMIC_FTRACE=y
Ubuntu: CONFIG_DYNAMIC_FTRACE=y
OpenSuse (default flavor): CONFIG_DYNAMIC_FTRACE=y

BTW., the reason this is enabled on all distro kernels is because the overhead is
a single patched-in NOP instruction in the function prologue, when tracing is
disabled. So it's not even a CALL+RET - it's a patched-in NOP.

Thanks,

Ingo

2018-01-23 10:24:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation


* David Woodhouse <[email protected]> wrote:

> > On SkyLake this would add an overhead of maybe 2-3 cycles per function call and
> > obviously all this code and data would be very cache hot. Given that the average
> > number of function calls per system call is around a dozen, this would be _much_
> > faster than any microcode/MSR based approach.
>
> That's kind of neat, except you don't want it at the top of the
> function; you want it at the bottom.
>
> If you could hijack the *return* site, then you could check for
> underflow and stuff the RSB right there. But in __fentry__ there's not
> a lot you can do other than complain that something bad is going to
> happen in the future. You know that a string of 16+ rets is going to
> happen, but you've got no gadget in *there* to deal with it when it
> does.

No, it can be done with the existing CALL instrumentation callback that
CONFIG_DYNAMIC_FTRACE=y provides, by pushing a RET trampoline on the stack from
the CALL trampoline - see my previous email.

> HJ did have patches to turn 'ret' into a form of retpoline, which I
> don't think ever even got performance-tested.

Return instrumentation is possible as well, but there are two major drawbacks:

- GCC support for it is not as widely available and return instrumentation is
less tested in Linux kernel contexts

- a major point of my suggestion is that CONFIG_DYNAMIC_FTRACE=y is already
enabled in distros here and today, so the runtime overhead to non-SkyLake CPUs
would be literally zero, while still allowing to fix the RSB vulnerability on
SkyLake.

Thanks,

Ingo

2018-01-23 10:29:48

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 2018-01-23 at 11:15 +0100, Ingo Molnar wrote:
>
> BTW., the reason this is enabled on all distro kernels is because the overhead is 
> > a single patched-in NOP instruction in the function prologue, when tracing is
> > disabled. So it's not even a CALL+RET - it's a patched-in NOP.

Hm? We still have GCC emitting 'call __fentry__' don't we? Would be
nice to get to the point where we can patch *that* out into a NOP... or
are you saying we already can?

But this is a digression. I was being pedantic about the "0 cycles" but
sure, this would be perfectly tolerable.



2018-01-23 10:35:54

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 2018-01-23 at 11:23 +0100, Ingo Molnar wrote:
> * David Woodhouse <[email protected]> wrote:
>
> >
> > >
> > > On SkyLake this would add an overhead of maybe 2-3 cycles per function call and 
> > > obviously all this code and data would be very cache hot. Given that the average 
> > > number of function calls per system call is around a dozen, this would be _much_ 
> > > faster than any microcode/MSR based approach.
> > That's kind of neat, except you don't want it at the top of the
> > function; you want it at the bottom.
> >
> > If you could hijack the *return* site, then you could check for
> > underflow and stuff the RSB right there. But in __fentry__ there's not
> > a lot you can do other than complain that something bad is going to
> > happen in the future. You know that a string of 16+ rets is going to
> > happen, but you've got no gadget in *there* to deal with it when it
> > does.
>
> No, it can be done with the existing CALL instrumentation callback that 
> CONFIG_DYNAMIC_FTRACE=y provides, by pushing a RET trampoline on the stack from 
> the CALL trampoline - see my previous email.

Yes, that's a neat solution.

> >
> > HJ did have patches to turn 'ret' into a form of retpoline, which I
> > don't think ever even got performance-tested.
> Return instrumentation is possible as well, but there are two major drawbacks:
>
>  - GCC support for it is not as widely available and return instrumentation is 
>    less tested in Linux kernel contexts

Hey, we're *already* making people upgrade their compiler, and HJ
apparently never sleeps. So don't actually be held back too much by
that consideration. If it could be better done with GCC help, we really
*can* explore that.

>  - a major point of my suggestion is that CONFIG_DYNAMIC_FTRACE=y is already 
>    enabled in distros here and today, so the runtime overhead to non-SkyLake CPUs 
>    would be literally zero, while still allowing to fix the RSB vulnerability on 
>    SkyLake.

Sure. You still have a few holes to fix (or declare acceptable) to
bring it to the full coverage of the IBRS solution, and it's still
possible that by the time it's complete it's approaching the ick factor
of IBRS, but I'd love to see it.



2018-01-23 10:45:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation


* David Woodhouse <[email protected]> wrote:

> On Tue, 2018-01-23 at 11:15 +0100, Ingo Molnar wrote:
> >
> > BTW., the reason this is enabled on all distro kernels is because the overhead
> > is a single patched-in NOP instruction in the function prologue, when tracing
> > is disabled. So it's not even a CALL+RET - it's a patched-in NOP.
>
> Hm? We still have GCC emitting 'call __fentry__' don't we? Would be nice to get
> to the point where we can patch *that* out into a NOP... or are you saying we
> already can?

Yes, we already can and do patch the 'call __fentry__/ mcount' call site into a
NOP today - all 50,000+ call sites on a typical distro kernel.

We did so for a long time - this is all a well established, working mechanism.
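
Concretely, what gets patched is the 5-byte call that GCC emits at each
function entry; with tracing off it is rewritten in place to a 5-byte NOP.
The exact NOP encoding is chosen per CPU; the bytes below are the common case,
shown for illustration only:

	/* before:  e8 xx xx xx xx   call __fentry__        */
	/* after :  0f 1f 44 00 00   nopl 0x0(%rax,%rax,1)  */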

> But this is a digression. I was being pedantic about the "0 cycles" but sure,
> this would be perfectly tolerable.

It's not a digression in two ways:

- I wanted to make it clear that for distro kernels it _is_ a zero cycles overhead
mechanism for non-SkyLake CPUs, literally.

- I noticed that Meltdown and the CR3 writes for PTI appear to have established a
kind of ... insensitivity and numbness to kernel micro-costs, which peaked with
the per-syscall MSR write nonsense patch of the SkyLake workaround.
That attitude is totally unacceptable to me as x86 maintainer and yes, still
every cycle counts.

Thanks,

Ingo

2018-01-23 10:58:17

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 2018-01-23 at 11:44 +0100, Ingo Molnar wrote:
> * David Woodhouse <[email protected]> wrote:
> > Hm? We still have GCC emitting 'call __fentry__' don't we? Would be nice to get 
> > to the point where we can patch *that* out into a NOP... or are you saying we 
> > already can?
> Yes, we already can and do patch the 'call __fentry__/ mcount' call site into a 
> NOP today - all 50,000+ call sites on a typical distro kernel.
>
> We did so for a long time - this is all a well established, working mechanism.

That's neat; I'd missed that.

> > But this is a digression. I was being pedantic about the "0 cycles" but sure, 
> > this would be perfectly tolerable.
> It's not a digression in two ways:
>
> - I wanted to make it clear that for distro kernels it _is_ a zero cycles overhead
>   mechanism for non-SkyLake CPUs, literally.
>
> - I noticed that Meltdown and the CR3 writes for PTI appear to have established a
>   kind of ... insensitivity and numbness to kernel micro-costs, which peaked with
>   the per-syscall MSR write nonsense patch of the SkyLake workaround.
>   That attitude is totally unacceptable to me as x86 maintainer and yes, still
>   every cycle counts.

Yeah, absolutely. But here we're talking about the overhead on non-SKL,
and on non-SKL the IBRS overhead is zero too (well, again not precisely
zero because it turns into NOPs).

You're absolutely right that we shouldn't stop counting cycles.

I've already noted that on SKL IBRS is actually a lot faster than on
earlier generations, and we also get back some of the overhead by
turning the retpoline into a bare jmp again. We haven't *forgotten*
about performance.

I'd like to see your solution once the details are sorted out, and see
proper benchmarks — both microbenchmarks and real workloads — comparing
the two. And then make a reasoned decision based on that, and on how
happy we are with the theoretical holes that your solution leaves, in
the cold light of day.

We should also look at whether we want to set STIBP too, which is
somewhat orthogonal to using IBRS to protect the kernel, and could end
up with some of the same MSR writes (at least setting to zero) on some
of the same code paths.



2018-01-23 11:32:46

by Liran Alon

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation


----- [email protected] wrote:

> On Sun, 2018-01-21 at 14:27 -0800, Linus Torvalds wrote:
> > On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse
> <[email protected]> wrote:
> > >>
> > >> The patches do things like add the garbage MSR writes to the
> kernel
> > >> entry/exit points. That's insane. That says "we're trying to
> protect
> > >> the kernel".  We already have retpoline there, with less
> overhead.
> > >
> > > You're looking at IBRS usage, not IBPB. They are different
> things.
> >
> > Ehh. Odd intel naming detail.
> >
> > If you look at this series, it very much does that kernel
> entry/exit
> > stuff. It was patch 10/10, iirc. In fact, the patch I was replying
> to
> > was explicitly setting that garbage up.
> >
> > And I really don't want to see these garbage patches just
> mindlessly
> > sent around.
>
> I think we've covered the technical part of this now, not that you
> like
> it — not that any of us *like* it. But since the peanut gallery is
> paying lots of attention it's probably worth explaining it a little
> more for their benefit.
>
> This is all about Spectre variant 2, where the CPU can be tricked
> into
> mispredicting the target of an indirect branch. And I'm specifically
> looking at what we can do on *current* hardware, where we're limited
> to
> the hacks they can manage to add in the microcode.
>
> The new microcode from Intel and AMD adds three new features.
>
> One new feature (IBPB) is a complete barrier for branch prediction.
> After frobbing this, no branch targets learned earlier are going to
> be
> used. It's kind of expensive (order of magnitude ~4000 cycles).
>
> The second (STIBP) protects a hyperthread sibling from following
> branch
> predictions which were learned on another sibling. You *might* want
> this when running unrelated processes in userspace, for example. Or
> different VM guests running on HT siblings.
>
> The third feature (IBRS) is more complicated. It's designed to be
> set when you enter a more privileged execution mode (i.e. the
> kernel).
> It prevents branch targets learned in a less-privileged execution
> mode,
> BEFORE IT WAS MOST RECENTLY SET, from taking effect. But it's not
> just
> a 'set-and-forget' feature, it also has barrier-like semantics and
> needs to be set on *each* entry into the kernel (from userspace or a
> VM
> guest). It's *also* expensive. And a vile hack, but for a while it
> was
> the only option we had.
>
> Even with IBRS, the CPU cannot tell the difference between different
> userspace processes, and between different VM guests. So in addition
> to
> IBRS to protect the kernel, we need the full IBPB barrier on context
> switch and vmexit. And maybe STIBP while they're running.
>
> Then along came Paul with the cunning plan of "oh, indirect branches
> can be exploited? Screw it, let's not have any of *those* then",
> which
> is retpoline. And it's a *lot* faster than frobbing IBRS on every
> entry
> into the kernel. It's a massive performance win.
>
> So now we *mostly* don't need IBRS. We build with retpoline, use IBPB
> on context switches/vmexit (which is in the first part of this patch
> series before IBRS is added), and we're safe. We even refactored the
> patch series to put retpoline first.
>
> But wait, why did I say "mostly"? Well, not everyone has a retpoline
> compiler yet... but OK, screw them; they need to update.
>
> Then there's Skylake, and that generation of CPU cores. For
> complicated
> reasons they actually end up being vulnerable not just on indirect
> branches, but also on a 'ret' in some circumstances (such as 16+
> CALLs
> in a deep chain).
>
> The IBRS solution, ugly though it is, did address that. Retpoline
> doesn't. There are patches being floated to detect and prevent deep
> stacks, and deal with some of the other special cases that bite on
> SKL,
> but those are icky too. And in fact IBRS performance isn't anywhere
> near as bad on this generation of CPUs as it is on earlier CPUs
> *anyway*, which makes it not quite so insane to *contemplate* using
> it
> as Intel proposed.
>
> That's why my initial idea, as implemented in this RFC patchset, was
> to
> stick with IBRS on Skylake, and use retpoline everywhere else. I'll
> give you "garbage patches", but they weren't being "just mindlessly
> sent around". If we're going to drop IBRS support and accept the
> caveats, then let's do it as a conscious decision having seen what it
> would look like, not just drop it quietly because poor Davey is too
> scared that Linus might shout at him again. :)
>
> I have seen *hand-wavy* analyses of the Skylake thing that mean I'm
> not
> actually lying awake at night fretting about it, but nothing concrete
> that really says it's OK.
>
> If you view retpoline as a performance optimisation, which is how it
> first arrived, then it's rather unconventional to say "well, it only
> opens a *little* bit of a security hole but it does go nice and fast
> so
> let's do it".
>
> But fine, I'm content with ditching the use of IBRS to protect the
> kernel, and I'm not even surprised. There's a *reason* we put it last
> in the series, as both the most contentious and most dispensable
> part.
> I'd be *happier* with a coherent analysis showing Skylake is still
> OK,
> but hey-ho, screw Skylake.
>
> The early part of the series adds the new feature bits and detects
> when
> it can turn KPTI off on non-Meltdown-vulnerable Intel CPUs, and also
> supports the IBPB barrier that we need to make retpoline complete.
> That
> much I think we definitely *do* want. There have been a bunch of us
> working on this behind the scenes; one of us will probably post that
> bit in the next day or so.
>
> I think we also want to expose IBRS to VM guests, even if we don't
> use
> it ourselves. Because Windows guests (and RHEL guests; yay!) do use
> it.
>
> If we can be done with the shouty part, I'd actually quite like to
> have
> a sensible discussion about when, if ever, we do IBPB on context
> switch
> (ptraceability and dumpable have both been suggested) and when, if
> ever, we set STIPB in userspace.

It is also important to note that current solutions, as I understand it, still have info-leak issues.

If retpoline is being used, user-mode code can leak RSB entries created while the CPU was in kernel mode,
thereby breaking KASLR. In order to handle this, every exit from kernel mode to user mode should stuff the RSB. In addition, this stuffing of the RSB may need to be done from a fixed address to avoid leaking the address of the RSB-stuffing code itself. The same concept applies to VMEntry into guests: the hypervisor should stuff the RSB just before VMEntry, otherwise the guest will be able to leak RSB entries which reveal hypervisor addresses.

If IBRS is being used, things seem to be even worse.
IBRS prevents BTB entries created at lower prediction-mode from being used by higher prediction-mode code.
However, nothing seems to prevent lower prediction-mode code from using BTB entries of higher prediction-mode code. This means that user-mode code could leak BTB entries in order to break KASLR, and guests could leak the host's BTB entries to reveal hypervisor addresses. This seems to be an issue even with future CPUs that will have the "IBRS all-the-time" feature.
Note that this issue is not theoretical. This is exactly what Google's Project-Zero KVM PoC did. They leaked host's BTB entries to reveal kvm-intel.ko, kvm.ko & vmlinux addresses.
It seems that the correct way to really handle this scenario should be to tag every BTB entry with prediction-mode and make CPU only use BTB entries tagged with current prediction-mode. Therefore, entirely separating the BTB entries between prediction-modes. That, in my opinion, should replace the IBRS-feature.

-Liran

2018-01-23 15:02:23

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On 01/23/2018 01:27 AM, Ingo Molnar wrote:
>
> - All asynchronous contexts (IRQs, NMIs, etc.) stuff the RSB before IRET. (The
> tracking could probably be made IRQ and maybe even NMI safe, but the worst-case
> nesting scenarios make my head ache.)

This all sounds totally workable to me. We talked about using ftrace
itself to track call depth, but it would be unusable in production, of
course. This seems workable, though. You're also totally right about
the zero overhead on most kernels with it turned off when we don't need
RSB underflow protection (basically pre-Skylake).

I also agree that the safe thing to do is to just stuff before iret. I
bet we can get a ftrace-driven RSB tracker working precisely enough even
with NMIs, but it's way simpler to just stuff and be done with it for now.

2018-01-23 16:14:05

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On 1/21/2018 1:14 PM, Andy Lutomirski wrote:
>
>
>> On Jan 20, 2018, at 11:23 AM, KarimAllah Ahmed <[email protected]> wrote:
>>
>> From: Tim Chen <[email protected]>
>>
>> Create macros to control Indirect Branch Speculation.
>>
>> Name them so they reflect what they are actually doing.
>> The macros are used to restrict and unrestrict the indirect branch speculation.
>> They do not *disable* (or *enable*) indirect branch speculation. A trip back to
>> user-space after *restricting* speculation would still affect the BTB.
>>
>> Quoting from a commit by Tim Chen:
>>
>> """
>> If IBRS is set, near returns and near indirect jumps/calls will not allow
>> their predicted target address to be controlled by code that executed in a
>> less privileged prediction mode *BEFORE* the IBRS mode was last written with
>> a value of 1 or on another logical processor so long as all Return Stack
>> Buffer (RSB) entries from the previous less privileged prediction mode are
>> overwritten.
>>
>> Thus a near indirect jump/call/return may be affected by code in a less
>> privileged prediction mode that executed *AFTER* IBRS mode was last written
>> with a value of 1.
>> """
>>
>> [ tglx: Changed macro names and rewrote changelog ]
>> [ karahmed: changed macro names *again* and rewrote changelog ]
>>
>> Signed-off-by: Tim Chen <[email protected]>
>> Signed-off-by: Thomas Gleixner <[email protected]>
>> Signed-off-by: KarimAllah Ahmed <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Cc: Andi Kleen <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Greg KH <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Andy Lutomirski <[email protected]>
>> Cc: Paolo Bonzini <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Arjan Van De Ven <[email protected]>
>> Cc: Linus Torvalds <[email protected]>
>> Cc: David Woodhouse <[email protected]>
>> Cc: Ashok Raj <[email protected]>
>> Link: https://lkml.kernel.org/r/3aab341725ee6a9aafd3141387453b45d788d61a.1515542293.git.tim.c.chen@linux.intel.com
>> Signed-off-by: David Woodhouse <[email protected]>
>> ---
>> arch/x86/entry/calling.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 73 insertions(+)
>>
>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
>> index 3f48f69..5aafb51 100644
>> --- a/arch/x86/entry/calling.h
>> +++ b/arch/x86/entry/calling.h
>> @@ -6,6 +6,8 @@
>> #include <asm/percpu.h>
>> #include <asm/asm-offsets.h>
>> #include <asm/processor-flags.h>
>> +#include <asm/msr-index.h>
>> +#include <asm/cpufeatures.h>
>>
>> /*
>>
>> @@ -349,3 +351,74 @@ For 32-bit we have the following conventions - kernel is built with
>> .Lafter_call_\@:
>> #endif
>> .endm
>> +
>> +/*
>> + * IBRS related macros
>> + */
>> +.macro PUSH_MSR_REGS
>> + pushq %rax
>> + pushq %rcx
>> + pushq %rdx
>> +.endm
>> +
>> +.macro POP_MSR_REGS
>> + popq %rdx
>> + popq %rcx
>> + popq %rax
>> +.endm
>> +
>> +.macro WRMSR_ASM msr_nr:req edx_val:req eax_val:req
>> + movl \msr_nr, %ecx
>> + movl \edx_val, %edx
>> + movl \eax_val, %eax
>> + wrmsr
>> +.endm
>> +
>> +.macro RESTRICT_IB_SPEC
>> + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
>> + PUSH_MSR_REGS
>> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $SPEC_CTRL_IBRS
>> + POP_MSR_REGS
>> +.Lskip_\@:
>> +.endm
>> +
>> +.macro UNRESTRICT_IB_SPEC
>> + ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
>> + PUSH_MSR_REGS
>> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
>
> I think you should be writing 2, not 0, since I'm reasonably confident that we want STIBP on. Can you explain why you're writing 0?

Do we want to talk about STIBP in general? Should it be (yet another)
boot option to enable or disable? If there is STIBP support without
IBRS support, it could be a set and forget at boot time.

Thanks,
Tom

>
> Also, holy cow, there are so many macros here.
>
> And a meta question: why are there so many submitters of the same series?
>

2018-01-23 16:26:05

by Woodhouse, David

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 2018-01-23 at 10:12 -0600, Tom Lendacky wrote:
>
> >> +.macro UNRESTRICT_IB_SPEC
> >> +    ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
> >> +    PUSH_MSR_REGS
> >> +    WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
> > 
> I think you should be writing 2, not 0, since I'm reasonably
> confident that we want STIBP on.  Can you explain why you're writing
> 0?
>
> Do we want to talk about STIBP in general?  Should it be (yet another)
> boot option to enable or disable?  If there is STIBP support without
> IBRS support, it could be a set and forget at boot time.

We haven't got patches which enable STIBP in general. The kernel itself
is safe either way with retpoline, or because IBRS implies STIBP too
(that is, there's no difference between writing 1 and 3).
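
For anyone decoding the numbers in this exchange, the relevant bit layout (as
documented for the new speculation-control MSRs) is:

#define MSR_IA32_SPEC_CTRL	0x00000048
#define SPEC_CTRL_IBRS		(1UL << 0)	/* restrict indirect branch speculation */
#define SPEC_CTRL_STIBP		(1UL << 1)	/* single-thread indirect branch predictors */

#define MSR_IA32_PRED_CMD	0x00000049
#define PRED_CMD_IBPB		(1UL << 0)	/* indirect branch prediction barrier */

So "writing 2, not 0" above means STIBP alone, and the "1 vs 3" comparison is
IBRS vs IBRS|STIBP.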

So STIBP is purely about protecting userspace processes from one
another, and VM guests from one another, when they run on HT siblings.

There's an argument that there are so many other information leaks
between HT siblings that we might not care. Especially as it's hard to
*tell* when you're scheduling, whether you trust all the processes (or
guests) on your HT siblings right now... let alone later when
scheduling another process if you need to *now* set STIBP on a sibling
which is no longer safe from this process now running.

I'm not sure we want to set STIBP *unconditionally* either because of
the performance implications.

For IBRS we had an answer and it was just ugly. For STIBP we don't
actually have an answer for "how do we use this?". Do we?




2018-01-23 20:16:57

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Sun 2018-01-21 20:28:17, David Woodhouse wrote:
> On Sun, 2018-01-21 at 11:34 -0800, Linus Torvalds wrote:
> > All of this is pure garbage.
> >
> > Is Intel really planning on making this shit architectural? Has
> > anybody talked to them and told them they are f*cking insane?
> >
> > Please, any Intel engineers here - talk to your managers. 
>
> If the alternative was a two-decade product recall and giving everyone
> free CPUs, I'm not sure it was entirely insane.
>
> Certainly it's a nasty hack, but hey — the world was on fire and in the
> end we didn't have to just turn the datacentres off and go back to goat
> farming, so it's not all bad.

Well, someone at Intel put the world on fire. And then they kept selling faulty
CPUs for half a year while the world was on fire; they knew the CPUs were
faulty yet they sold them anyway.

Then Intel talks about how great they are and how security is
important for them.... Intentionally confusing Meltdown and
Spectre so they can mask how badly they screwed up. And without apologies.

> As a hack for existing CPUs, it's just about tolerable — as long as it
> can die entirely by the next generation.
>
> So the part is I think is odd is the IBRS_ALL feature, where a future
> CPU will advertise "I am able to be not broken" and then you have to
> set the IBRS bit once at boot time to *ask* it not to be broken. That
> part is weird, because it ought to have been treated like the RDCL_NO
> bit — just "you don't have to worry any more, it got better".

And now Intel wants to cheat at benchmarks, to put companies that do
the right thing at a disadvantage, and thinks that's okay because the world
was on fire?

At this point, I believe that yes, product recall would be
appropriate. If Intel is not willing to do it on their own, well,
perhaps courts can force them. Ouch, and I would not mind some jail time
for whoever is responsible for selling known-faulty CPUs to the public.

Oh, and still no word about the real fixes. World is not only Linux,
you see? https://pavelmachek.livejournal.com/140949.html?nojs=1

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2018-01-23 20:59:24

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Sun, 2018-01-21 at 15:31 +0100, Thomas Gleixner wrote:
> > 
> > XX: Do we want a microcode blacklist?
>
> Oh yes, we want a microcode blacklist. Ideally we refuse to load the
> affected microcode in the first place and if its already loaded then at
> least avoid to use the borked features.
>
> PR texts promising that Intel is committed to transparency in this matter
> are not sufficient. Intel, please provide the facts, i.e. a proper list of
> micro codes and affected SKUs, ASAP.

They've finally published one, at
https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/microcode-update-guidance.pdf

For shits and giggles, you can compare it with the one at
https://kb.vmware.com/s/article/52345

Intel's seems to be a bit rushed. For example for Broadwell-EX 406F1
they say "0x25, 0x23" are bad, but VMware's list says 0x0B000025 and I
have a CPU with 0x0B0000xx. So I've "corrected" their numbers in an
attempt at a blacklist patch accordingly, and likewise for some Skylake
SKUs. But there are others in Intel's list that I can't easily
proofread for them right now. Am I missing something?

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index b720dacac051..52855d1a4f9a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -102,6 +102,57 @@ static void probe_xeon_phi_r3mwait(struct cpuinfo_x86 *c)
  ELF_HWCAP2 |= HWCAP2_RING3MWAIT;
 }
 
+/*
+ * Early microcode releases for the Spectre v2 mitigation were broken:
+ * https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/microcode-update-guidance.pdf
+ * VMware also has a list at https://kb.vmware.com/s/article/52345
+ */
+struct sku_microcode {
+ u8 model;
+ u8 stepping;
+ u32 microcode;
+};
+static const struct sku_microcode spectre_bad_microcodes[] = {
+ { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0B, 0x80 },
+ { INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
+ { INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
+ { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x80 },
+ { INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
+ { INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003C },
+ { INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0x000000C2 },
+ { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0x000000C2 },
+ { INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
+ { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },
+ { INTEL_FAM6_HASWELL_ULT, 0x01, 0x21 },
+ { INTEL_FAM6_HASWELL_GT3E, 0x01, 0x18 },
+ { INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
+ { INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
+ { INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
+ { INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
+ { INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
+ { INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
+ { INTEL_FAM6_BROADWELL_XEON_D, 0x03, 0x7000011 },
+ { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },
+ /* For 406F1 Intel says "0x25, 0x23" while VMware says 0x0B000025
+  * and a real CPU has a firmware in the 0x0B0000xx range. So: */
+ { INTEL_FAM6_BROADWELL_X, 0x01, 0x0b000025 },
+ { INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
+ { INTEL_FAM6_SKYLAKE_X, 0x03, 0x100013e },
+ { INTEL_FAM6_SKYLAKE_X, 0x04, 0x200003c },
+};
+
+static int bad_spectre_microcode(struct cpuinfo_x86 *c)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
+ if (c->x86_model == spectre_bad_microcodes[i].model &&
+     c->x86_mask == spectre_bad_microcodes[i].stepping)
+ return (c->microcode <= spectre_bad_microcodes[i].microcode);
+ }
+ return 0;
+}
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
  u64 misc_enable;
@@ -122,6 +173,18 @@ static void early_init_intel(struct cpuinfo_x86 *c)
  if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
  c->microcode = intel_get_microcode_revision();
 
+ if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
+      cpu_has(c, X86_FEATURE_AMD_SPEC_CTRL) ||
+      cpu_has(c, X86_FEATURE_AMD_PRED_CMD) ||
+      cpu_has(c, X86_FEATURE_AMD_STIBP)) && bad_spectre_microcode(c)) {
+ pr_warn("Intel Spectre v2 broken microcode detected; disabling SPEC_CTRL\n");
+ clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
+ clear_cpu_cap(c, X86_FEATURE_STIBP);
+ clear_cpu_cap(c, X86_FEATURE_AMD_SPEC_CTRL);
+ clear_cpu_cap(c, X86_FEATURE_AMD_PRED_CMD);
+ clear_cpu_cap(c, X86_FEATURE_AMD_STIBP);
+ }
+
  /*
   * Atom erratum AAE44/AAF40/AAG38/AAH41:
   *



2018-01-23 22:38:54

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On 1/23/2018 10:20 AM, Woodhouse, David wrote:
> On Tue, 2018-01-23 at 10:12 -0600, Tom Lendacky wrote:
>>
>>>> +.macro UNRESTRICT_IB_SPEC
>>>> +    ALTERNATIVE "jmp .Lskip_\@", "", X86_FEATURE_IBRS
>>>> +    PUSH_MSR_REGS
>>>> +    WRMSR_ASM $MSR_IA32_SPEC_CTRL, $0, $0
>>>  
>> I think you should be writing 2, not 0, since I'm reasonably
>> confident that we want STIBP on.  Can you explain why you're writing
>> 0?
>>
>> Do we want to talk about STIBP in general?  Should it be (yet another)
>> boot option to enable or disable?  If there is STIBP support without
>> IBRS support, it could be a set and forget at boot time.
>
> We haven't got patches which enable STIBP in general. The kernel itself
> is safe either way with retpoline, or because IBRS implies STIBP too
> (that is, there's no difference between writing 1 and 3).
>
> So STIBP is purely about protecting userspace processes from one
> another, and VM guests from one another, when they run on HT siblings.
>
> There's an argument that there are so many other information leaks
> between HT siblings that we might not care. Especially as it's hard to
> *tell* when you're scheduling, whether you trust all the processes (or
> guests) on your HT siblings right now... let alone later when
> scheduling another process if you need to *now* set STIBP on a sibling
> which is no longer safe from this process now running.
>
> I'm not sure we want to set STIBP *unconditionally* either because of
> the performance implications.
>
> For IBRS we had an answer and it was just ugly. For STIBP we don't
> actually have an answer for "how do we use this?". Do we?

Not sure. Maybe to start, the answer might be to allow it to be set for
the ultra-paranoid, but in general don't enable it by default. Having it
enabled would be an alternative to someone deciding to disable SMT, since
that would have even more of a performance impact.

Thanks,
Tom

>
>

2018-01-23 22:50:52

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

> Not sure. Maybe to start, the answer might be to allow it to be set for
> the ultra-paranoid, but in general don't enable it by default. Having it
> enabled would be an alternative to someone deciding to disable SMT, since
> that would have even more of a performance impact.

I agree. A reasonable strategy would be to only enable it for
processes that have dumpable disabled. This should be already set for
high value processes like GPG, and allows others to opt-in if
they need to.
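
As a rough sketch of that policy (illustrative only; indirect_branch_prediction_barrier()
is assumed here to be a helper that writes PRED_CMD_IBPB to MSR_IA32_PRED_CMD,
and the STIBP half would be analogous):

#include <linux/sched.h>
#include <linux/sched/coredump.h>

static void cond_ibpb_on_switch(struct task_struct *next)
{
	struct mm_struct *mm = next->mm;

	/* Only pay for the barrier when the incoming task opted in by
	 * making itself non-dumpable, e.g. via prctl(PR_SET_DUMPABLE, 0). */
	if (mm && get_dumpable(mm) != SUID_DUMP_USER)
		indirect_branch_prediction_barrier();	/* assumed helper */
}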

-Andi

2018-01-23 23:20:08

by Woodhouse, David

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
> > Not sure.  Maybe to start, the answer might be to allow it to be set for
> > the ultra-paranoid, but in general don't enable it by default.  Having it
> > enabled would be an alternative to someone deciding to disable SMT, since
> > that would have even more of a performance impact.
>
> I agree. A reasonable strategy would be to only enable it for
> processes that have dumpable disabled. This should be already set for
> high value processes like GPG, and allows others to opt-in if
> they need to.

That seems to make sense, and I think was the solution we were
approaching for IBPB on context switch too, right?

Are we generally agreed on dumpable as the criterion for both of those?



2018-01-23 23:23:29

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, Jan 23, 2018 at 11:14:36PM +0000, Woodhouse, David wrote:
> On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
> > > Not sure. Maybe to start, the answer might be to allow it to be set for
> > > the ultra-paranoid, but in general don't enable it by default. Having it
> > > enabled would be an alternative to someone deciding to disable SMT, since
> > > that would have even more of a performance impact.
> >
> > I agree. A reasonable strategy would be to only enable it for
> > processes that have dumpable disabled. This should be already set for
> > high value processes like GPG, and allows others to opt-in if
> > they need to.
>
> That seems to make sense, and I think was the solution we were
> approaching for IBPB on context switch too, right?

Right.

-Andi

2018-01-24 00:06:28

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

Ingo Molnar <[email protected]> writes:
>
> Is there any reason why this wouldn't work?

To actually maintain the true call depth you would need to intercept the
return of the function too, because the counter has to be decremented
at the end of the function.

Plain ftrace cannot do that because it only intercepts the function
entry.

The function graph tracer can do this, but only at the cost of
overwriting the return address (and saving the real return address on a special stack).

This always causes a mispredict on every return, and other
overhead, and is one of the reasons why function graph
is so much slower than the plain function tracer.

I suspect the overhead would be significant.

To make your scheme work efficiently, we would likely
need custom gcc instrumentation for the returns.

FWIW our plan was to add enough manual stuffing at strategic
points, until we're reasonably confident of good enough coverage.

-Andi

2018-01-24 00:48:26

by Tim Chen

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On 01/23/2018 03:14 PM, Woodhouse, David wrote:
> On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
>>> Not sure. Maybe to start, the answer might be to allow it to be set for
>>> the ultra-paranoid, but in general don't enable it by default. Having it
>>> enabled would be an alternative to someone deciding to disable SMT, since
>>> that would have even more of a performance impact.
>>
>> I agree. A reasonable strategy would be to only enable it for
>> processes that have dumpable disabled. This should be already set for
>> high value processes like GPG, and allows others to opt-in if
>> they need to.
>
> That seems to make sense, and I think was the solution we were
> approaching for IBPB on context switch too, right?
>
> Are we generally agreed on dumpable as the criterion for both of those?
>

It is a reasonable approach. Let a process who needs max security
opt in with disabled dumpable. It can have a flush with IBPB clear before
starting to run, and have STIBP set while running.

Tim

2018-01-24 01:01:54

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, Jan 23, 2018 at 4:47 PM, Tim Chen <[email protected]> wrote:
> On 01/23/2018 03:14 PM, Woodhouse, David wrote:
>> On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
>>>> Not sure. Maybe to start, the answer might be to allow it to be set for
>>>> the ultra-paranoid, but in general don't enable it by default. Having it
>>>> enabled would be an alternative to someone deciding to disable SMT, since
>>>> that would have even more of a performance impact.
>>>
>>> I agree. A reasonable strategy would be to only enable it for
>>> processes that have dumpable disabled. This should be already set for
>>> high value processes like GPG, and allows others to opt-in if
>>> they need to.
>>
>> That seems to make sense, and I think was the solution we were
>> approaching for IBPB on context switch too, right?
>>
>> Are we generally agreed on dumpable as the criterion for both of those?
>>
>
> It is a reasonable approach. Let a process who needs max security
> opt in with disabled dumpable. It can have a flush with IBPB clear before
> starting to run, and have STIBP set while running.
>

Do we maybe want a separate opt in? I can easily imagine things like
web browsers that *don't* want to be non-dumpable but do want this
opt-in.

Also, what's the performance hit of STIBP?

2018-01-24 01:23:01

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 2018-01-23 at 17:00 -0800, Andy Lutomirski wrote:
> On Tue, Jan 23, 2018 at 4:47 PM, Tim Chen <[email protected]> wrote:
> >
> > On 01/23/2018 03:14 PM, Woodhouse, David wrote:
> > >
> > > On Tue, 2018-01-23 at 14:49 -0800, Andi Kleen wrote:
> > > >
> > > > >
> > > > > Not sure.  Maybe to start, the answer might be to allow it to be set for
> > > > > the ultra-paranoid, but in general don't enable it by default.  Having it
> > > > > enabled would be an alternative to someone deciding to disable SMT, since
> > > > > that would have even more of a performance impact.
> > > > I agree. A reasonable strategy would be to only enable it for
> > > > processes that have dumpable disabled. This should be already set for
> > > > high value processes like GPG, and allows others to opt-in if
> > > > they need to.
> > > That seems to make sense, and I think was the solution we were
> > > approaching for IBPB on context switch too, right?
> > >
> > > Are we generally agreed on dumpable as the criterion for both of those?
> > >
> > It is a reasonable approach.  Let a process who needs max security
> > opt in with disabled dumpable. It can have a flush with IBPB clear before
> > starting to run, and have STIBP set while running.
> >
> Do we maybe want a separate opt in?  I can easily imagine things like
> web browsers that *don't* want to be non-dumpable but do want this
> opt-in.
 
This is to protect you from another local process running on a HT
sibling. Not the kind of thing that web browsers are normally worrying
about.

> Also, what's the performance hit of STIBP?

Varies per CPU generation, but generally approaching that of full IBRS
I think? I don't recall looking at this specifically (since we haven't
actually used it for this yet).



2018-01-24 02:00:33

by Van De Ven, Arjan

[permalink] [raw]
Subject: RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation


> > It is a reasonable approach. Let a process who needs max security
> > opt in with disabled dumpable. It can have a flush with IBPB clear before
> > starting to run, and have STIBP set while running.
> >
>
> Do we maybe want a separate opt in? I can easily imagine things like
> web browsers that *don't* want to be non-dumpable but do want this
> opt-in.

eventually we need something better. Probably in addition.
dumpable is used today for things that want this.

>
> Also, what's the performance hit of STIBP?

pretty steep, but it depends on the CPU generation, for some it's cheaper than others. (yes I realize this is a vague answer, but the range is really from just about zero to oh my god)

I'm not a fan of doing this right now, to be honest. We really need to not piecemeal some of this, and come up with a better concept of protection at a higher level.
For example, you mention web browsers, but the threat model for browsers is generally internet content. For V2 to work you need to get some "evil pointer" into the app from the observer and browsers usually aren't doing that.
The most likely user would be some software-TPM-like service that has magic keys.

And for keys we want something else... we want an madvise() sort of thing that does a few things, like the equivalent of mlock (so the key does not end up in swap), not having the page (but potentially the rest) end up in core dumps, and the kernel making sure that if the program exits (say for segv) the key page gets zeroed before going into the free pool. Once you do that as a feature, making the key speculation-safe is not too hard (Intel and ARM have CPU options to mark pages for that).
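
Parts of that can already be approximated today with existing interfaces; a
minimal userspace sketch (the zero-on-exit and speculation-tagging parts
described above have no portable API and are omitted):

#include <stdlib.h>
#include <sys/mman.h>

/* Allocate a page-aligned buffer for key material, keep it out of swap
 * and out of core dumps. */
static void *alloc_key_page(size_t len)
{
	void *p = NULL;

	if (posix_memalign(&p, 4096, len))
		return NULL;
	if (mlock(p, len) || madvise(p, len, MADV_DONTDUMP)) {
		free(p);
		return NULL;
	}
	return p;
}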


2018-01-24 03:26:14

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation



> On Jan 23, 2018, at 5:59 PM, Van De Ven, Arjan <[email protected]> wrote:
>
>
>>> It is a reasonable approach. Let a process who needs max security
>>> opt in with disabled dumpable. It can have a flush with IBPB clear before
>>> starting to run, and have STIBP set while running.
>>>
>>
>> Do we maybe want a separate opt in? I can easily imagine things like
>> web browsers that *don't* want to be non-dumpable but do want this
>> opt-in.
>
> eventually we need something better. Probably in addition.
> dumpable is used today for things that want this.
>
>>
>> Also, what's the performance hit of STIBP?
>
> pretty steep, but it depends on the CPU generation, for some it's cheaper than others. (yes I realize this is a vague answer, but the range is really from just about zero to oh my god)
>
> I'm not a fan of doing this right now to be honest. We really need to not piece meal some of this, and come up with a better concept of protection on a higher level.
> For example, you mention web browsers, but the threat model for browsers is generally internet content. For V2 to work you need to get some "evil pointer" into the app from the observer and browsers usually aren't doing that.
> The most likely user would be some software-TPM-like service that has magic keys.
>
> And for keys we want something else... we want an madvise() sort of thing that does a few things, like equivalent of mlock (so the key does not end up in swap),

I'd love to see a slight variant: encrypt that page against some ephemeral key if it gets swapped.

> not having the page (but potentially the rest) end up in core dumps, and the kernel making sure that if the program exits (say for segv) that the key page gets zeroed before going into the free pool. Once you do that as a feature, making the key speculation safe is not too hard (intel and arm have cpu options to mark pages for that)
>
>

How do we do that on Intel? Make it UC?

2018-01-24 08:49:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Tue, Jan 23, 2018 at 08:58:36PM +0000, David Woodhouse wrote:

> +static const struct sku_microcode spectre_bad_microcodes[] = {
> + { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0B, 0x80 },
> + { INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
> + { INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
> + { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x80 },
> + { INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
> + { INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003C },
> + { INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0x000000C2 },
> + { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0x000000C2 },
> + { INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
> + { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },
> + { INTEL_FAM6_HASWELL_ULT, 0x01, 0x21 },
> + { INTEL_FAM6_HASWELL_GT3E, 0x01, 0x18 },
> + { INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
> + { INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
> + { INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
> + { INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
> + { INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
> + { INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
> + { INTEL_FAM6_BROADWELL_XEON_D, 0x03, 0x7000011 },
> + { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },
> + /* For 406F1 Intel says "0x25, 0x23" while VMware says 0x0B000025
> +  * and a real CPU has a firmware in the 0x0B0000xx range. So: */
> + { INTEL_FAM6_BROADWELL_X, 0x01, 0x0b000025 },
> + { INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
> + { INTEL_FAM6_SKYLAKE_X, 0x03, 0x100013e },
> + { INTEL_FAM6_SKYLAKE_X, 0x04, 0x200003c },
> +};

Typically tglx likes to use x86_match_cpu() for these things; see also
commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
errata").

> +
> +static int bad_spectre_microcode(struct cpuinfo_x86 *c)
> +{
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
> + if (c->x86_model == spectre_bad_microcodes[i].model &&
> +     c->x86_mask == spectre_bad_microcodes[i].stepping)
> + return (c->microcode <= spectre_bad_microcodes[i].microcode);
> + }
> + return 0;
> +}

The above is Intel only, you should check vendor too I think.

>  static void early_init_intel(struct cpuinfo_x86 *c)
>  {
>   u64 misc_enable;
> @@ -122,6 +173,18 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>   if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
>   c->microcode = intel_get_microcode_revision();
>  
> + if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
> +      cpu_has(c, X86_FEATURE_AMD_SPEC_CTRL) ||
> +      cpu_has(c, X86_FEATURE_AMD_PRED_CMD) ||
> +      cpu_has(c, X86_FEATURE_AMD_STIBP)) && bad_spectre_microcode(c)) {
> + pr_warn("Intel Spectre v2 broken microcode detected; disabling SPEC_CTRL\n");
> + clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
> + clear_cpu_cap(c, X86_FEATURE_STIBP);
> + clear_cpu_cap(c, X86_FEATURE_AMD_SPEC_CTRL);
> + clear_cpu_cap(c, X86_FEATURE_AMD_PRED_CMD);
> + clear_cpu_cap(c, X86_FEATURE_AMD_STIBP);
> + }

And since its Intel only, what are those AMD features doing there?

2018-01-24 09:05:00

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, 2018-01-24 at 09:47 +0100, Peter Zijlstra wrote:
> Typically tglx likes to use x86_match_cpu() for these things; see also
> commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
> errata").

Thanks, will fix. I think we might also end up in whitelist mode,
adding "known good" microcodes to the list as they get released or
retroactively blessed.

I would really have liked a new bit in IA32_ARCH_CAPABILITIES to say
that it's safe, but that's not possible for *existing* microcode which
actually turns out to be OK in the end.

That means the whitelist ends up basically empty right now. Should I
add a command line parameter to override it? Otherwise we end up having
to rebuild the kernel every time there's a microcode release which
covers a new CPU SKU (which is why I kind of hate the whitelist, but
Arjan is very insistent...)

I'm kind of tempted to turn it into a whitelist just by adding 1 to the
microcode revision in each table entry. Sure, that N+1 might be another
microcode build that also has issues but never saw the light of day...
but that's OK as long it never *does*. And yes we'd have to tweak it if
revisions that are blacklisted in the Intel doc are subsequently
cleared. But at least it'd require *less* tweaking.

> >
> > +
> > +static int bad_spectre_microcode(struct cpuinfo_x86 *c)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
> > + if (c->x86_model == spectre_bad_microcodes[i].model &&
> > +     c->x86_mask == spectre_bad_microcodes[i].stepping)
> > + return (c->microcode <= spectre_bad_microcodes[i].microcode);
> > + }
> > + return 0;
> > +}
> The above is Intel only, you should check vendor too I think.

It's in intel.c, called from early_init_intel(). Isn't that sufficient?

> >
> >  static void early_init_intel(struct cpuinfo_x86 *c)
> >  {
> >   u64 misc_enable;
> > @@ -122,6 +173,18 @@ static void early_init_intel(struct cpuinfo_x86 *c)
> >   if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
> >   c->microcode = intel_get_microcode_revision();
> >  
> > + if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
> > +      cpu_has(c, X86_FEATURE_AMD_SPEC_CTRL) ||
> > +      cpu_has(c, X86_FEATURE_AMD_PRED_CMD) ||
> > +      cpu_has(c, X86_FEATURE_AMD_STIBP)) && bad_spectre_microcode(c)) {
> > + pr_warn("Intel Spectre v2 broken microcode detected; disabling SPEC_CTRL\n");
> > + clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
> > + clear_cpu_cap(c, X86_FEATURE_STIBP);
> > + clear_cpu_cap(c, X86_FEATURE_AMD_SPEC_CTRL);
> > + clear_cpu_cap(c, X86_FEATURE_AMD_PRED_CMD);
> > + clear_cpu_cap(c, X86_FEATURE_AMD_STIBP);
> > + }
> And since its Intel only, what are those AMD features doing there?

Hypervisors which only want to expose PRED_CMD may do so using the AMD
feature bit. SPEC_CTRL requires save/restore and live migration
support, and isn't needed with retpoline anyway (since guests won't be
calling directly into firmware).



2018-01-24 09:11:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, Jan 24, 2018 at 09:02:21AM +0000, David Woodhouse wrote:
> On Wed, 2018-01-24 at 09:47 +0100, Peter Zijlstra wrote:
> > Typically tglx likes to use x86_match_cpu() for these things; see also
> > commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
> > errata").
>
> Thanks, will fix. I think we might also end up in whitelist mode,
> adding "known good" microcodes to the list as they get released or
> retroactively blessed.
>
> I would really have liked a new bit in IA32_ARCH_CAPABILITIES to say
> that it's safe, but that's not possible for *existing* microcode which
> actually turns out to be OK in the end.
>
> That means the whitelist ends up basically empty right now. Should I
> add a command line parameter to override it? Otherwise we end up having
> to rebuild the kernel every time there's a microcode release which
> covers a new CPU SKU (which is why I kind of hate the whitelist, but
> Arjan is very insistent...)

Ick, no, whitelists are a pain for everyone involved. Don't do that
unless it is absolutely the only way it will ever work.

Arjan, why do you think this can only be done as a whitelist?

It's much easier to just mark the "bad" microcode versions as those
_should_ be a much smaller list that Intel knows about today. And of
course, any future microcode updates will not be "bad" because they know
how to properly test for this now before they are released :)

thanks,

greg k-h

2018-01-24 09:35:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

> > > + for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
> > > + if (c->x86_model == spectre_bad_microcodes[i].model &&
> > > +     c->x86_mask == spectre_bad_microcodes[i].stepping)
> > > + return (c->microcode <= spectre_bad_microcodes[i].microcode);
> > > + }
> > > + return 0;
> > > +}
> > The above is Intel only, you should check vendor too I think.
>
> It's in intel.c, called from early_init_intel(). Isn't that sufficient?

Duh, so much for reading skillz on my end ;-)

> > > + pr_warn("Intel Spectre v2 broken microcode detected; disabling SPEC_CTRL\n");
> > > + clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
> > > + clear_cpu_cap(c, X86_FEATURE_STIBP);
> > > + clear_cpu_cap(c, X86_FEATURE_AMD_SPEC_CTRL);
> > > + clear_cpu_cap(c, X86_FEATURE_AMD_PRED_CMD);
> > > + clear_cpu_cap(c, X86_FEATURE_AMD_STIBP);
> > > + }
> > And since its Intel only, what are those AMD features doing there?
>
> Hypervisors which only want to expose PRED_CMD may do so using the AMD
> feature bit. SPEC_CTRL requires save/restore and live migration
> support, and isn't needed with retpoline anyway (since guests won't be
> calling directly into firmware).

Egads, I suppose that makes some sense, but it does make a horrible
muddle of things.

by Henrique de Moraes Holschuh

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, 24 Jan 2018, David Woodhouse wrote:
> I'm kind of tempted to turn it into a whitelist just by adding 1 to the
> microcode revision in each table entry. Sure, that N+1 might be another
> microcode build that also has issues but never saw the light of day...

Watch out for a weirdness in Skylake+ microcode revision numbering that is
(AFAIK) still not properly documented where it should be (i.e. the microcode
chapter of the Intel SDM). Actually, this is related to SGX, so it applies to
anything that has SGX.

When it has SGX inside, Intel will release microcode only with even
revision numbers, but the processor may report it as odd (and will do so
by subtracting 1, so microcode 0xb0 is the same as microcode 0xaf) when
the update is loaded by the processor itself from FIT (as opposed to
being loaded by WRMSR from BIOS/UEFI/OS).

So, you could see N-1 from within Linux if we did not update the
microcode, and fail to trigger a whitelist (or mistrigger a blacklist).

--
Henrique Holschuh
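A tiny illustration of the comparison pitfall described above (the helper name and its has_sgx parameter are invented for illustration; the odd/even behaviour is as described in this mail):

/*
 * On SGX-capable parts the FIT-loaded microcode may report the released
 * even revision minus one, so normalise a reported revision before
 * comparing it against a blacklist/whitelist entry.
 */
static unsigned int normalise_ucode_rev(unsigned int rev, int has_sgx)
{
	if (has_sgx && (rev & 1))
		return rev + 1;	/* e.g. a reported 0xaf is really release 0xb0 */
	return rev;
}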

2018-01-24 12:16:58

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, 2018-01-24 at 09:47 +0100, Peter Zijlstra wrote:
>
> Typically tglx likes to use x86_match_cpu() for these things; see also
> commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
> errata").

Ewww.

static u32 hsx_deadline_rev(void)
{
       switch (boot_cpu_data.x86_mask) {
       case 0x02: return 0x3a; /* EP */
       case 0x04: return 0x0f; /* EX */
       }

       return ~0U;
}
...
static const struct x86_cpu_id deadline_match[] = {
       DEADLINE_MODEL_MATCH_FUNC( INTEL_FAM6_HASWELL_X,        hsx_deadline_rev),
       DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_BROADWELL_X,      0x0b000020),
       DEADLINE_MODEL_MATCH_FUNC( INTEL_FAM6_BROADWELL_XEON_D, bdx_deadline_rev),
       DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_X,        0x02000014),
...

       /*
        * Function pointers will have the MSB set due to address layout,
        * immediate revisions will not.
        */
       if ((long)m->driver_data < 0)
               rev = ((u32 (*)(void))(m->driver_data))();
       else
               rev = (u32)m->driver_data;

EWWWW!

Shan't.



2018-01-24 12:29:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, Jan 24, 2018 at 12:14:51PM +0000, David Woodhouse wrote:
> On Wed, 2018-01-24 at 09:47 +0100, Peter Zijlstra wrote:
> >
> > Typically tglx likes to use x86_match_cpu() for these things; see also
> > commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
> > errata").
>
> Ewww.
>
> static u32 hsx_deadline_rev(void)
> {
>        switch (boot_cpu_data.x86_mask) {
>        case 0x02: return 0x3a; /* EP */
>        case 0x04: return 0x0f; /* EX */
>        }
>
>        return ~0U;
> }
> ...
> static const struct x86_cpu_id deadline_match[] = {
>        DEADLINE_MODEL_MATCH_FUNC( INTEL_FAM6_HASWELL_X,        hsx_deadline_rev),
>        DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_BROADWELL_X,      0x0b000020),
>        DEADLINE_MODEL_MATCH_FUNC( INTEL_FAM6_BROADWELL_XEON_D, bdx_deadline_rev),
>        DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_X,        0x02000014),
> ...
>
>        /*
>         * Function pointers will have the MSB set due to address layout,
>         * immediate revisions will not.
>         */
>        if ((long)m->driver_data < 0)
>                rev = ((u32 (*)(void))(m->driver_data))();
>        else
>                rev = (u32)m->driver_data;
>
> EWWWW!
>

Yes :/

We could look at extending x86_cpu_id and x86_match_cpu with a stepping
option I suppose, but that might be lots of churn.

Thomas?

2018-01-24 12:31:06

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, 2018-01-24 at 08:49 -0200, Henrique de Moraes Holschuh wrote:
> On Wed, 24 Jan 2018, David Woodhouse wrote:
> >
> > I'm kind of tempted to turn it into a whitelist just by adding 1 to the
> > microcode revision in each table entry. Sure, that N+1 might be another
> > microcode build that also has issues but never saw the light of day...
> Watch out for a weirdness in Skylake+ microcode revision numbering that
> is (AFAIK) still not properly documented where it should be (i.e. the
> microcode chapter of the Intel SDM). Actually, this is related to SGX,
> so it applies to anything that has SGX.
>
> When it has SGX inside, Intel will release microcode only with even
> revision numbers, but the processor may report it as odd (and will do so
> by subtracting 1, so microcode 0xb0 is the same as microcode 0xaf) when
> the update is loaded by the processor itself from FIT (as opposed to
> being loaded by WRMSR from BIOS/UEFI/OS).
>  
> So, you could see N-1 from within Linux if we did not update the
> microcode, and fail to trigger a whitelist (or mistrigger a blacklist).

That's OK. If they ship a fixed 0x0200003E firmware for SKX, for
example, which appears as 0x0200003D when it's loaded from FIT, that's
still >= 0x0200003C *and* !(<0x0200003D) if we were to do that.

In fact, the code for the "whitelist X+1" vs. "blacklist X" approach is
*entirely* equivalent; it's purely a cosmetic change. Because

   !(< X)   ≡   ≥ (X+1)

The *real* change here is that for ∀ SKU, we are being asked to
blacklist all microcode revisions <= 0xFFFFFFFF¹ for now, and change
that only once new microcode is actually released. Every time, and then
get people to rebuild their kernels because they can *use* the features
from the new microcode.


¹(OK, *there's* a functional difference between whitelist and blacklist
approach. But we'll never actually see 0xffffffff so that's not
important right now :)



2018-01-24 12:59:59

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, 2018-01-24 at 13:29 +0100, Peter Zijlstra wrote:
>
> Yes :/
>
> We could look at extending x86_cpu_id and x86_match_cpu with a stepping
> option I suppose, but that might be lots of churn.

That goes all the way to mod_deviceinfo, and would be horrid.

We could add an x86_match_cpu_stepping() function, I suppose? But I'm
mostly trying to avoid depending on other stuff like that, for patches
which are going to need to be backported to all the stable kernels.

I'd much rather do it this way and then if we see another use case for
it (that commit you mentioned could be nicer, I suppose), consolidate
into a single stepping-capable lookup function in a later "cleanup".
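For illustration only, a self-contained sketch of the kind of stepping-aware lookup being discussed; the struct, function names and table values below are made up (x86_match_cpu_stepping() does not exist at this point), and the revisions simply apply the "blacklist X means whitelist X+1" idea to entries from the table earlier in this thread:

#include <stdio.h>

struct ucode_entry {
	unsigned int model;
	unsigned int stepping;
	unsigned int min_good_rev;	/* lowest microcode assumed OK */
};

/* Hypothetical per-stepping table, in the spirit of spectre_bad_microcodes[]. */
static const struct ucode_entry table[] = {
	{ 0x4f /* BROADWELL_X */, 0x01, 0x0b000026 },
	{ 0x55 /* SKYLAKE_X   */, 0x04, 0x0200003d },
};

static int ucode_is_good(unsigned int model, unsigned int stepping,
			 unsigned int rev)
{
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].model == model && table[i].stepping == stepping)
			return rev >= table[i].min_good_rev;
	}
	return 1;	/* not listed: nothing known to be wrong */
}

int main(void)
{
	printf("%d\n", ucode_is_good(0x55, 0x04, 0x0200003c));	/* 0: listed-bad era */
	printf("%d\n", ucode_is_good(0x55, 0x04, 0x0200003d));	/* 1: assumed fixed */
	return 0;
}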



2018-01-24 15:09:57

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On 1/24/2018 1:10 AM, Greg Kroah-Hartman wrote:
>
>> That means the whitelist ends up basically empty right now. Should I
>> add a command line parameter to override it? Otherwise we end up having
>> to rebuild the kernel every time there's a microcode release which
>> covers a new CPU SKU (which is why I kind of hate the whitelist, but
>> Arjan is very insistent...)
>
> Ick, no, whitelists are a pain for everyone involved. Don't do that
> unless it is absolutely the only way it will ever work.
>
> Arjan, why do you think this can only be done as a whitelist?

I suggested a minimum version list for those cpus that need it.

microcode versions are tricky (and we've released betas etc etc with their own numbers)
and as a result there might be several numbers that have those issues with their IBRS for the same F/M/S




2018-01-24 15:19:57

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, 2018-01-24 at 07:09 -0800, Arjan van de Ven wrote:
> On 1/24/2018 1:10 AM, Greg Kroah-Hartman wrote:
> > Arjan, why do you think this can only be done as a whitelist?
>
> I suggested a minimum version list for those cpus that need it.
>
> microcode versions are tricky (and we've released betas etc etc with their own numbers)
> and as a result there might be several numbers that have those issues with their IBRS for the same F/M/S

I really think that's fine. Anyone who uses beta microcodes, should be
perfectly prepared to deal with the results. And probably *wanted* to
be able to actually test them, instead of having the kernel refuse to
do so.

So if there are beta microcodes floating around with numbers higher
than in Intel's currently-published list, which are not yet known to be
safe (or even if they're known not to be), that's absolutely OK.

If you're telling me that there will be *publicly* released microcodes
with version numbers higher than those in the list, which still have
the same issues... well, then I think Mr Shouty is going to come for
another visit.



2018-01-25 16:20:42

by Mason

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On 23/01/2018 10:30, David Woodhouse wrote:

> Skylake takes predictions from the generic branch target buffer when
> the RSB underflows.

Adding LAKML.

AFAIU, some ARM Cortex cores have the same optimization.
(A9 maybe, A17 probably, some recent 64-bit cores)

Are there software work-arounds for Spectre planned for arm32 and arm64?

Regards.

2018-01-25 17:17:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Thu, Jan 25, 2018 at 05:19:04PM +0100, Mason wrote:
> On 23/01/2018 10:30, David Woodhouse wrote:
>
> > Skylake takes predictions from the generic branch target buffer when
> > the RSB underflows.
>
> Adding LAKML.
>
> AFAIU, some ARM Cortex cores have the same optimization.
> (A9 maybe, A17 probably, some recent 64-bit cores)
>
> Are there software work-arounds for Spectre planned for arm32 and arm64?

Yes, I think they are currently buried in one of the arm64 trees, and
they have been posted to the mailing list a few times in the past.

thanks,

greg k-h

2018-01-25 22:22:03

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On 01/23/2018 03:13 AM, Liran Alon wrote:
> Therefore, breaking KASLR. In order to handle this, every exit from
> kernel-mode to user-mode should stuff RSB. In addition, this stuffing
> of RSB may need to be done from a fixed address to avoid leaking the
> address of the RSB stuffing itself.

With PTI alone in place, I don't see how userspace could do anything
with this information. Even if userspace started to speculate to a
kernel address, there is nothing at the kernel address to execute: no
TLB entry, no PTE to load, nothing.

You probably have a valid point about host->guest, though.

2018-01-26 02:20:39

by Liran Alon

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation


----- [email protected] wrote:

> On 01/23/2018 03:13 AM, Liran Alon wrote:
> > Therefore, breaking KASLR. In order to handle this, every exit from
> > kernel-mode to user-mode should stuff RSB. In addition, this
> stuffing
> > of RSB may need to be done from a fixed address to avoid leaking
> the
> > address of the RSB stuffing itself.
>
> With PTI alone in place, I don't see how userspace could do anything
> with this information. Even if userspace started to speculate to a
> kernel address, there is nothing at the kernel address to execute: no
> TLB entry, no PTE to load, nothing.
>
> You probably have a valid point about host->guest, though.

I see it differently.

It is true that attacker cannot speculate to a kernel-address, but it doesn't mean it cannot use the leaked kernel-address together with another unrelated vulnerability to build a reliable exploit.

Security is built in layers.
The purpose of KASLR is to break the reliability of an exploit which relies on vulnerability primitives such as corrupting memory at a kernel-address, hijacking kernel control-flow to a kernel-address, or even just reading a kernel-address. In modern exploitation, it is common to chain multiple different vulnerabilities in order to build a reliable exploit. Therefore, leaking a kernel-address could be exactly the missing primitive needed to complete the vulnerability chain of a reliable exploit.

I don't see a big difference between leaking a kernel-address from user-mode vs. leaking a hypervisor-address from guest. They are both useful just as a primitive which is part of an exploit chain.

One could argue, though, that KASLR is currently fundamentally broken and therefore should not be considered a security boundary anymore. This argument could be legitimate, as there were some well-known techniques that could break KASLR before the KPTI patch-set was introduced (e.g. timing memory accesses to kernel-addresses and measuring reliably by leveraging TSX). Another well-known argument against KASLR is that it is a non-deterministic mitigation which some argue is not good enough. However, I think that if we decide KASLR is not a security boundary anymore, it should be made loud and clear.

In general, I think there are some info-leak vulnerabilities in our current mitigation plan which don't seem to be addressed. I would be glad if we could address them clearly. These are all the open issues as I see them:

1) Because IBRS doesn't restrict low prediction-mode code from using the BTB of high prediction-mode code, it is possible to info-leak addresses from high prediction-mode code to low prediction-mode code.
This is the KASLR breakage discussed above. Again, could be ignored if we discard KASLR as a security boundary.

2) Neither IBRS nor retpoline prevents the BHB of high prediction-mode code from being used by low prediction-mode code. Therefore, low prediction-mode code could deduce the conditional branches taken by high prediction-mode code.

3) A leak similar to (1) exists from the fact that RSB entries of high prediction-mode code could be leaked by low prediction-mode code, which may reveal kernel-addresses. Again, we could decide that this isn't a security boundary. An alternative solution could be to just stuff the RSB from a fixed address between prediction-mode transitions.

-Liran

P.S:
It seems to me that all these issues could be resolved completely at hardware in future CPUs if BTB/BHB/RSB entries were tagged with prediction-mode (or similar metadata). It will be nice if Intel/AMD could share if that is the planned long-term solution instead of IBRS-all-the-time.

2018-01-26 02:24:35

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On 01/25/2018 06:11 PM, Liran Alon wrote:
> It is true that attacker cannot speculate to a kernel-address, but it
> doesn't mean it cannot use the leaked kernel-address together with
> another unrelated vulnerability to build a reliable exploit.

The address doesn't leak if you can't execute there. It's the same
reason that we don't worry about speculation to user addresses from the
kernel when SMEP is in play.

2018-01-26 02:52:10

by Liran Alon

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation


----- [email protected] wrote:

> On 01/25/2018 06:11 PM, Liran Alon wrote:
> > It is true that attacker cannot speculate to a kernel-address, but
> it
> > doesn't mean it cannot use the leaked kernel-address together with
> > another unrelated vulnerability to build a reliable exploit.
>
> The address doesn't leak if you can't execute there. It's the same
> reason that we don't worry about speculation to user addresses from
> the
> kernel when SMEP is in play.

Maybe I misunderstand BTB & BHB internals. Will be glad if you could pinpoint my error.

The Google P0 blog-post (https://googleprojectzero.blogspot.co.il/2018/01/reading-privileged-memory-with-side.html) claims that the BTB & BHB only use the low <31 bits of the address of the source instruction to look up into the BTB. In addition, it claims that the higher bits of the predicted destination change together with the higher bits of the source instruction.

Therefore, it should be possible to leak the low bits of high prediction-mode code BTB/BHB entries from low prediction-mode code, because the predicted destination address will reside in user-space.

What am I missing?

Thanks,
-Liran

2018-01-26 02:55:49

by Van De Ven, Arjan

[permalink] [raw]
Subject: RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation





> -----Original Message-----
> From: Liran Alon [mailto:[email protected]]
> Sent: Thursday, January 25, 2018 6:50 PM
> To: Hansen, Dave <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Mallick, Asit K
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; Nakajima, Jun
> <[email protected]>; [email protected]; Raj, Ashok <[email protected]>;
> Van De Ven, Arjan <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Williams, Dan J <[email protected]>;
> [email protected]; [email protected]; [email protected]
> Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect
> Branch Speculation
>
>

> Google P0 blog-post
> (https://googleprojectzero.blogspot.co.il/2018/01/reading-privileged-memory-
> with-side.html) claims that BTB & BHB only use <31 low bits of the address of
> the source instruction to lookup into the BTB. In addition, it claims that the
> higher bits of the predicted destination change together with the higher bits of
> the source instruction.
>
> Therefore, it should be possible to leak the low bits of high prediction-mode
> code BTB/BHB entries from low prediction-mode code. Because the predicted
> destination address will reside in user-space.
>
> What am I missing?


I thought this email thread was about the RSB...

2018-01-26 08:49:20

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Thu, 2018-01-25 at 18:11 -0800, Liran Alon wrote:
>
> P.S:
> It seems to me that all these issues could be resolved completely at
> hardware in future CPUs if BTB/BHB/RSB entries were tagged with
> prediction-mode (or similar metadata). It will be nice if Intel/AMD
> could share if that is the planned long-term solution instead of
> IBRS-all-the-time.

IBRS-all-the-time is tagging with the ring and VMX root/non-root mode,
it seems. That much they could slip into the upcoming generation of
CPUs. And it's supposed to be fast¹; none of the dirty hacks in
microcode that they needed to implement the first-generation IBRS.

But we still need to tag with ASID/VMID and do proper flushing for
those, before we can completely ditch the need to do IBPB at the right
times.

Reading between the lines, I don't think they could add *that* without
stopping the fabs for a year or so while they go back to the drawing
board. But yes, I sincerely hope they *are* planning to do it, and
expose a 'SPECTRE_NO' bit in IA32_ARCH_CAPABILITIES, as soon as is
humanly possible.



¹ Fast enough that we'll want to use it and ALTERNATIVE out the 
  retpolines.



2018-01-26 09:12:39

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Thu, 2018-01-25 at 18:23 -0800, Dave Hansen wrote:
> On 01/25/2018 06:11 PM, Liran Alon wrote:
> >
> > It is true that attacker cannot speculate to a kernel-address, but it
> > doesn't mean it cannot use the leaked kernel-address together with
> > another unrelated vulnerability to build a reliable exploit.
>
> The address doesn't leak if you can't execute there.  It's the same
> reason that we don't worry about speculation to user addresses from the
> kernel when SMEP is in play.

If both tags and target in the BTB are only 31 bits, then surely a
user-learned prediction of a branch from

  0x01234567 → 0x07654321

would be equivalent to a kernel-mode branch from

 0xffffffff81234567 → 0xffffffff87654321

... and interpreted in kernel mode as the latter? So I'm not sure why
SMEP saves us there?

Likewise if the RSB only stores the low 31 bits of the target, SMEP
isn't much help there either.

Do we need to look again at the fact that we've disabled the RSB-
stuffing for SMEP?
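To make the aliasing concern concrete, a trivial sketch (assuming, purely for illustration, that BTB tags keep only the low 31 bits of the branch source, as hypothesised above; the mask and addresses are made up):

#include <stdio.h>
#include <stdint.h>

/* Assume the BTB indexes and stores only the low 31 bits of addresses. */
#define LOW31(x)	((uint64_t)(x) & 0x7fffffffULL)

int main(void)
{
	uint64_t user_src = 0x0000000001234567ULL;
	uint64_t kern_src = 0xffffffff81234567ULL;

	printf("user branch tag:   %#llx\n", (unsigned long long)LOW31(user_src));
	printf("kernel branch tag: %#llx\n", (unsigned long long)LOW31(kern_src));
	/* Both print 0x1234567: a user-trained entry would be consulted for
	 * the kernel-mode branch, with the target's high bits supplied by the
	 * kernel source address, hence the question about SMEP above. */
	return 0;
}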



2018-01-26 17:19:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <[email protected]> wrote:
>
> Do we need to look again at the fact that we've disabled the RSB-
> stuffing for SMEP?

Absolutely. SMEP helps make people a lot less worried about things,
but it doesn't fix the "BTB only contains partial addresses" case.

But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
my tree, it's only conditional on X86_FEATURE_RETPOLINE.

Linus

2018-01-26 17:35:18

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, 2018-01-26 at 09:19 -0800, Linus Torvalds wrote:
> On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <[email protected]> wrote:
> >
> >
> > Do we need to look again at the fact that we've disabled the RSB-
> > stuffing for SMEP?
> Absolutely. SMEP helps make people a lot less worried about things,
> but it doesn't fix the "BTB only contains partial addresses" case.
>
> But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> my tree, it's only conditional on X86_FEATURE_RETPOLINE.

That's the vmexit one. The one on context switch is in
commit c995efd5a7 and has its own X86_FEATURE_RSB_CTXSW which in
kernel/cpu/bugs.c is turned on for (!SMEP || Skylake).

The "low bits of the BTB" issue probably means that wants to be
X86_FEATURE_RETPOLINE too. Despite Intel's doc saying otherwise.

(Intel's doc also says to do it on kernel entry, but we elected to do
it on context switch instead since *that's* when the imbalances show up
in the RSB.)



2018-01-26 17:36:00

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, 2018-01-26 at 17:29 +0000, David Woodhouse wrote:
> On Fri, 2018-01-26 at 09:19 -0800, Linus Torvalds wrote:
> > On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <[email protected]> wrote:
> > > Do we need to look again at the fact that we've disabled the RSB-
> > > stuffing for SMEP?
> >
> > Absolutely. SMEP helps make people a lot less worried about things,
> > but it doesn't fix the "BTB only contains partial addresses" case.
> >
> > But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> > my tree, it's only conditional on X86_FEATURE_RETPOLINE.
>
> That's the vmexit one. The one on context switch is in
> commit c995efd5a7 and has its own X86_FEATURE_RSB_CTXSW which in
> kernel/cpu/bugs.c is turned on for (!SMEP || Skylake).
>
> The "low bits of the BTB" issue probably means that wants to be
> X86_FEATURE_RETPOLINE too. Despite Intel's doc saying otherwise.
>
> (Intel's doc also says to do it on kernel entry, but we elected to do
> it on context switch instead since *that's* when the imbalances show up
> in the RSB.)

Note, we've switched from talking about BTB to RSB here, so this is a
valid concern if the *RSB* only has the low bits of the target.



2018-01-26 17:59:51

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, Jan 26, 2018 at 09:19:09AM -0800, Linus Torvalds wrote:
> On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <[email protected]> wrote:
> >
> > Do we need to look again at the fact that we've disabled the RSB-
> > stuffing for SMEP?
>
> Absolutely. SMEP helps make people a lot less worried about things,
> but it doesn't fix the "BTB only contains partial addresses" case.
>
> But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> my tree, it's only conditional on X86_FEATURE_RETPOLINE.

For Skylake we need RSB stuffing even with SMEP to avoid falling back to the
BTB on underflow.

It's also always needed with virtualization.

-Andi

2018-01-26 18:13:14

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On 1/26/2018 10:11 AM, David Woodhouse wrote:
>
> I am *actively* ignoring Skylake right now. This is about pre-SKL
> userspace even with SMEP, because we think Intel's document lies to us.

if you think we lie to you then I think we're done with the conversation?

Please tell us then what you deploy in AWS for your customers ?

or show us research that shows we lied to you?

2018-01-26 18:13:33

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, 2018-01-26 at 09:59 -0800, Andi Kleen wrote:
> On Fri, Jan 26, 2018 at 09:19:09AM -0800, Linus Torvalds wrote:
> >
> > On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse wrote:
> > >
> > >
> > > Do we need to look again at the fact that we've disabled the RSB-
> > > stuffing for SMEP?
> > Absolutely. SMEP helps make people a lot less worried about things,
> > but it doesn't fix the "BTB only contains partial addresses" case.
> >
> > But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> > my tree, it's only conditional on X86_FEATURE_RETPOLINE.
>
> For Skylake we need RSB stuffing even with SMEP to avoid falling back to the
> BTB on underflow.

I am *actively* ignoring Skylake right now. This is about pre-SKL
userspace even with SMEP, because we think Intel's document lies to us.

If the RSB only holds the low bits of the target, then a userspace
attacker can populate an RSB entry which points to a kernel gadget of
her choice, even with SMEP or KPTI enabled.



2018-01-26 18:28:42

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, 2018-01-26 at 10:12 -0800, Arjan van de Ven wrote:
> On 1/26/2018 10:11 AM, David Woodhouse wrote:
> > 
> > I am *actively* ignoring Skylake right now. This is about pre-SKL
> > userspace even with SMEP, because we think Intel's document lies to us.
>
> if you think we lie to you then I think we're done with the conversation?
>
> Please tell us then what you deploy in AWS for your customers ?
>
> or show us research that shows we lied to you?

As you know well, I mean "we think Intel's document is not correct". 

The evidence which made us suspect that is fairly clear in the last few
emails in this thread — it's about the BTB/RSB only having the low bits
of the target, which would mean that userspace *can* put malicious
targets into the RSB, regardless of SMEP.



2018-01-26 18:29:22

by Van De Ven, Arjan

[permalink] [raw]
Subject: RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

> On Fri, 2018-01-26 at 10:12 -0800, Arjan van de Ven wrote:
> > On 1/26/2018 10:11 AM, David Woodhouse wrote:
> > >
> > > I am *actively* ignoring Skylake right now. This is about pre-SKL
> > > userspace even with SMEP, because we think Intel's document lies to us.
> >
> > if you think we lie to you then I think we're done with the conversation?
> >
> > Please tell us then what you deploy in AWS for your customers ?
> >
> > or show us research that shows we lied to you?
>
> As you know well, I mean "we think Intel's document is not correct".

you asked before and even before you sent the email I confirmed to you that the document is correct

I'm not sure what the point is to then question that again 15 minutes later other than creating more noise.


2018-01-26 18:45:32

by Van De Ven, Arjan

[permalink] [raw]
Subject: RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

> > you asked before and even before you sent the email I confirmed to
> > you that the document is correct
> >
> > I'm not sure what the point is to then question that again 15 minutes
> > later other than creating more noise.
>
> Apologies, I hadn't seen the comment on IRC.
>
> Sometimes the docs *don't* get it right, especially when they're
> released in a hurry as that one was. I note there's a *fourth* version
> of microcode-update-guidance.pdf available now, for example :)
>
> So it is useful that you have explicitly stated that for *this*
> specific concern, the document is in fact correct that SMEP saves us
> from BTB and RSB pollution, *despite* the empirical evidence that those
> structures only hold the low 31 bits.

your question was specific to RSB not BTB. But please show the empirical evidence for RSB ?

2018-01-26 18:45:32

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, 2018-01-26 at 18:28 +0000, Van De Ven, Arjan wrote:
> > As you know well, I mean "we think Intel's document is not
> > correct".
>
> you asked before and even before you sent the email I confirmed to
> you that the document is correct
>
> I'm not sure what the point is to then question that again 15 minutes
> later other than creating more noise.

Apologies, I hadn't seen the comment on IRC.

Sometimes the docs *don't* get it right, especially when they're
released in a hurry as that one was. I note there's a *fourth* version
of microcode-update-guidance.pdf available now, for example :)

So it is useful that you have explicitly stated that for *this*
specific concern, the document is in fact correct that SMEP saves us
from BTB and RSB pollution, *despite* the empirical evidence that those
structures only hold the low 31 bits.

I'm going to get back to other things now, although I'm sure others may
be very interested to reconcile the empirical evidence with what you
say, and want to know *how* that can be the case. Which I'm sure you
won't be able to say in public anyway.



2018-01-26 18:55:34

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, 2018-01-26 at 18:44 +0000, Van De Ven, Arjan wrote:
> your question was specific to RSB not BTB. But please show the empirical evidence for RSB ?

We were hypothesising, which should have been clear from:

On Fri, 2018-01-26 at 09:11 +0000, David Woodhouse wrote:
> Likewise if the RSB only stores the low 31 bits of the target, SMEP
> isn't much help there either.

> Do we need to look again at the fact that we've disabled the RSB-
> stuffing for SMEP?

... and later... 

On Fri, 2018-01-26 at 17:31 +0000, David Woodhouse wrote:
> Note, we've switched from talking about BTB to RSB here, so this is a
> valid concern if the *RSB* only has the low bits of the target.

I'm glad to hear that it *isn't* a valid concern for the RSB and the
code in Linus' tree is correct.

Thank you for clearing that up.



2018-01-26 19:10:57

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, Jan 26, 2018 at 09:59:01AM -0800, Andi Kleen wrote:
> On Fri, Jan 26, 2018 at 09:19:09AM -0800, Linus Torvalds wrote:
> > On Fri, Jan 26, 2018 at 1:11 AM, David Woodhouse <[email protected]> wrote:
> > >
> > > Do we need to look again at the fact that we've disabled the RSB-
> > > stuffing for SMEP?
> >
> > Absolutely. SMEP helps make people a lot less worried about things,
> > but it doesn't fix the "BTB only contains partial addresses" case.
> >
> > But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> > my tree, it's only conditional on X86_FEATURE_RETPOLINE.
>
> For Skylake we need RSB stuffing even with SMEP to avoid falling back to the
> BTB on underflow.
>
> It's also always needed with virtualization.

-ECONFUSED, see ==>

Is this incorrect then?
I see:

241 * Skylake era CPUs have a separate issue with *underflow* of the
242 * RSB, when they will predict 'ret' targets from the generic BTB.
243 * The proper mitigation for this is IBRS. If IBRS is not supported
244 * or deactivated in favour of retpolines the RSB fill on context
245 * switch is required.
246 */

which came from this:

commit c995efd5a740d9cbafbf58bde4973e8b50b4d761
Author: David Woodhouse <[email protected]>
Date: Fri Jan 12 17:49:25 2018 +0000

x86/retpoline: Fill RSB on context switch for affected CPUs

On context switch from a shallow call stack to a deeper one, as the CPU
does 'ret' up the deeper side it may encounter RSB entries (predictions for
where the 'ret' goes to) which were populated in userspace.

This is problematic if neither SMEP nor KPTI (the latter of which marks
userspace pages as NX for the kernel) are active, as malicious code in
userspace may then be executed speculatively.

Overwrite the CPU's return prediction stack with calls which are predicted
to return to an infinite loop, to "capture" speculation if this
happens. This is required both for retpoline, and also in conjunction with
IBRS for !SMEP && !KPTI.

On Skylake+ the problem is slightly different, and an *underflow* of the
RSB may cause errant branch predictions to occur. So there it's not so much
overwrite, as *filling* the RSB to attempt to prevent it getting
empty. This is only a partial solution for Skylake+ since there are many
==>other conditions which may result in the RSB becoming empty. The full <==
==>solution on Skylake+ is to use IBRS, which will prevent the problem even <==
when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
required on context switch.


Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>


The "full solution" is what is making me confused.
>
> -Andi

2018-01-26 19:13:06

by Dave Hansen

[permalink] [raw]
Subject: RE: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

The need for RSB stuffing in all the various scenarios and what the heck it actually mitigates is freakishly complicated. I've tried to write it all down in one place: https://goo.gl/pXbvBE

2018-01-26 19:14:29

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, 2018-01-26 at 14:02 -0500, Konrad Rzeszutek Wilk wrote:
>
> -ECONFUSED, see ==>
>
> Is this incorrect then?
> I see:
>
> 241          * Skylake era CPUs have a separate issue with *underflow* of the       
> 242          * RSB, when they will predict 'ret' targets from the generic BTB.      
> 243          * The proper mitigation for this is IBRS. If IBRS is not supported     
> 244          * or deactivated in favour of retpolines the RSB fill on context       
> 245          * switch is required.                                                  
> 246          */                       

No, that's correct (well, except that it's kind of written for a world
where Linus is going to let IBRS anywhere near his kernel, and could
survive being rephrased a little :)

The RSB-stuffing on context switch (or kernel entry) is one of a
*litany* of additional hacks we need on Skylake to make retpolines
safe.

We were adding the RSB-stuffing in this case *anyway* for !SMEP, so it
was trivial enough to add in the (|| Skylake) condition while we were
at it.



2018-01-27 08:22:38

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, Jan 26, 2018 at 09:19:09AM -0800, Linus Torvalds wrote:
> But did we do that "disable stuffing with SMEP"? I'm not seeing it. In
> my tree, it's only conditional on X86_FEATURE_RETPOLINE.

Or rather, enable stuffing on !SMEP:

+	if ((!boot_cpu_has(X86_FEATURE_PTI) &&
+	     !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
+		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+		pr_info("Filling RSB on context switch\n");
+	}

Should be

c995efd5a740 ("x86/retpoline: Fill RSB on context switch for affected CPUs")

in your tree.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-01-27 13:43:30

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Fri, Jan 26, 2018 at 07:11:47PM +0000, Hansen, Dave wrote:
> The need for RSB stuffing in all the various scenarios and what the heck it actually mitigates is freakishly complicated. I've tried to write it all down in one place: https://goo.gl/pXbvBE

Thank you for sharing that.

One question on the third from the top (' RSB Stuff (16) After
irq/nmi/#PF/...').

It says that :"Return from interrupt path (more than 16 deep) can empty
RSB".

Just to clarify - you mean all the returns ('ret') that are happening after
we call do_IRQ and the stack unwinds - but before we do an 'iret' correct?

I am 99% sure that is what you mean, but just confirming as one could read
this as: 'Need to do RSB after an iret' (say you are in the kernel
and then get an interrupt and iret back to kernel).

2018-01-27 15:55:52

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On 01/27/2018 05:42 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 26, 2018 at 07:11:47PM +0000, Hansen, Dave wrote:
>> The need for RSB stuffing in all the various scenarios and what the heck it actually mitigates is freakishly complicated. I've tried to write it all down in one place: https://goo.gl/pXbvBE
> Thank you for sharing that.
>
> One question on the third from the top (' RSB Stuff (16) After
> irq/nmi/#PF/...').
>
> It says that :"Return from interrupt path (more than 16 deep) can empty
> RSB".
>
> Just to clarify - you mean all the returns ('ret') that are happening after
> we call do_IRQ and the stack unwinds - but before we do an 'iret' correct?

Correct. The RSB is not used or updated by iret.

2018-01-29 06:37:22

by Jon Masters

[permalink] [raw]
Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

Hi Peter, David, all,

First a quick note on David's earlier comment, about this optimization
being still up for debate. The problem with this optimization as-is is
that it doesn't protect userspace-to-userspace unless applications are
rebuilt and we get the infrastructure to handle that (ELF, whatever).

But...

On 01/21/2018 06:22 AM, Peter Zijlstra wrote:
> On Sat, Jan 20, 2018 at 08:22:55PM +0100, KarimAllah Ahmed wrote:
>> From: Tim Chen <[email protected]>
>>
>> Flush indirect branches when switching into a process that marked
>> itself non dumpable. This protects high value processes like gpg
>> better, without having too high performance overhead.
>
> So if I understand it right, this is only needed if the 'other'
> executable itself is susceptible to spectre. If say someone audited gpg
> for spectre-v1 and build it with retpoline, it would be safe to not
> issue the IBPB, right?

More importantly, rebuilding the world introduces a lot of challenges
that need to be discussed heavily before they happen (I would like to
see someone run a session at one of the various upcoming events on
userspace, I've already prodded a few people to nudge that forward). In
particular, we don't have the infrastructure in gcc/glibc to dynamically
patch userspace call sites to enable/disable retpolines.

We discussed nasty hacks last year (I even suggested an ugly kernel
exported page similar to VDSO that could be implementation patched for
different uarches), but the bottom line is there isn't anything in place
to provide a similar userspace experience to what the kernel can do, and
that would need to be solved in addition to the ELF/ABI bits.

> So would it make sense to provide an ELF flag / personality thing such
> that userspace can indicate its spectre-safe?
>
> I realize that this is all future work, because so far auditing for v1
> is a lot of pain (we need better tools), but would it be something that
> makes sense in the longer term?

So I would just caution that doing this isn't necessarily bad, but it's
far more than just ELF bits and rebuilding. Once userspace is rebuilt
with un-nopable retpolines, they're there whether you need them on
$future_hardware or not, and that fancy branch predictor is useless. So
we really need a way to allow for userspace patchable calls, or at least
some kind of plan before everyone runs away with rebuilding.

(unless they're embedded/Gentoo/whatever...have fun in that case)

Jon.

P.S. This is why for certain downstream distros you'll see IBPB use like
prior to this patch - it'll prevent certain attacks that can't be
otherwise mitigated without going and properly solving the tools issue.


2018-01-29 12:01:27

by Mason

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

[ Dropping large CC list ]

On 25/01/2018 18:16, Greg Kroah-Hartman wrote:

> On Thu, Jan 25, 2018 at 05:19:04PM +0100, Mason wrote:
>
>> On 23/01/2018 10:30, David Woodhouse wrote:
>>
>>> Skylake takes predictions from the generic branch target buffer when
>>> the RSB underflows.
>>
>> Adding LAKML.
>>
>> AFAIU, some ARM Cortex cores have the same optimization.
>> (A9 maybe, A17 probably, some recent 64-bit cores)
>>
>> Are there software work-arounds for Spectre planned for arm32 and arm64?
>
> Yes, I think they are currently buried in one of the arm64 trees, and
> they have been posted to the mailing list a few times in the past.

Found the burial ground, thanks Greg.

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti

Via https://developer.arm.com/support/security-update

"For Cortex-R8, Cortex-A8, Cortex-A9, and Cortex-A17, invalidate
the branch predictor using a BPIALL instruction."

The latest arm32 patch series was submitted recently:

https://www.spinics.net/lists/arm-kernel/msg630892.html

Regards.

2018-01-29 14:09:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC 04/10] x86/mm: Only flush indirect branches when switching into non dumpable process

On Mon, Jan 29, 2018 at 01:35:30AM -0500, Jon Masters wrote:
> > So if I understand it right, this is only needed if the 'other'
> > executable itself is susceptible to spectre. If say someone audited gpg
> > for spectre-v1 and build it with retpoline, it would be safe to not
> > issue the IBPB, right?
>
> More importantly, rebuilding the world introduces a lot of challenges
> that need to be discussed heavily before they happen (I would like to
> see someone run a session at one of the various upcoming events on
> userspace, I've already prodded a few people to nudge that forward). In
> particular, we don't have the infrastructure in gcc/glibc to dynamically
> patch userspace call sites to enable/disable retpolines.

GCC/GLIBC do in fact have some infrastructure for this; see
target_clones/ifunc function attributes. We can at (dynamic) link time
select between alternative functions.

With this we could select different retpoline thunks for different
systems, much like what we end up doing for the kernel.
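A minimal sketch of that ifunc mechanism, for reference (the ifunc attribute is a real GCC/glibc feature; the policy function and implementation names below are purely illustrative, and a real resolver would inspect CPUID or a kernel-provided hint):

#include <stdio.h>

static void do_call_plain(void)     { puts("plain indirect-call path"); }
static void do_call_mitigated(void) { puts("retpoline-style mitigated path"); }

/* Made-up policy; stands in for a real capability check. */
static int want_mitigation(void)
{
	return 1;
}

/* ifunc resolver: runs at dynamic-link time and picks the implementation
 * that the do_call symbol will be bound to for the life of the process. */
static void (*resolve_do_call(void))(void)
{
	return want_mitigation() ? do_call_mitigated : do_call_plain;
}

void do_call(void) __attribute__((ifunc("resolve_do_call")));

int main(void)
{
	do_call();	/* resolved once, no per-call policy branching */
	return 0;
}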

> We discussed nasty hacks last year (I even suggested an ugly kernel
> exported page similar to VDSO that could be implementation patched for
> different uarches), but the bottom line is there isn't anything in place
> to provide a similar userspace experience to what the kernel can do, and
> that would need to be solved in addition to the ELF/ABI bits.

Not sure where you discussed what, but I spoke with a bunch of the
facebook people at plumbers about kernel support for (runtime) userspace
patching a-la asm-goto/jump-labels.

And while that would be entirely fun, I don't see how we'd need this
here.

> > So would it make sense to provide an ELF flag / personality thing such
> > that userspace can indicate its spectre-safe?
> >
> > I realize that this is all future work, because so far auditing for v1
> > is a lot of pain (we need better tools), but would it be something that
> > makes sense in the longer term?
>
> So I would just caution that doing this isn't necessarily bad, but it's
> far more than just ELF bits and rebuilding. Once userspace is rebuilt
> with un-nopable retpolines, they're there whether you need them on
> $future_hardware or not, and that fancy branch predictor is useless. So
> we really need a way to allow for userspace patchable calls, or at least
> some kind of plan before everyone runs away with rebuilding.

Just rebuild world again; there's plenty distros where this is not in
fact a difficult thing to do :-) You just don't happen to work for one
... :-)

2018-01-29 20:16:03

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Sat, Jan 20, 2018 at 08:22:56PM +0100, KarimAllah Ahmed wrote:
> From: David Woodhouse <[email protected]>
>
> Not functional yet; just add the handling for it in the Spectre v2
> mitigation selection, and the X86_FEATURE_IBRS flag which will control
> the code to be added in later patches.
>
> Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS
> mode will want that too.
>
> For now we are auto-selecting IBRS on Skylake. We will probably end up
> changing that but for now let's default to the safest option.
>
> XX: Do we want a microcode blacklist?
>
> [karahmed: simplify the switch block and get rid of all the magic]
>
> Signed-off-by: David Woodhouse <[email protected]>
> Signed-off-by: KarimAllah Ahmed <[email protected]>
[...]
> + case SPECTRE_V2_CMD_FORCE:
> + /*
> + * If we have IBRS support, and either Skylake or !RETPOLINE,
> + * then that's what we do.
> + */
> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
> + (is_skylake_era() || !retp_compiler())) {


Sorry for being confused here, as probably the answer is buried
on a LKML thread somewhere. The comment explains what the code
does, but not why. Why exactly IBRS is preferred on Skylake?

I'm asking this because I would like to understand the risks
involved when running under a hypervisor exposing CPUID data that
don't match the host CPU. e.g.: what happens if a VM is migrated
from a Broadwell host to a Skylake host?



> + mode = SPECTRE_V2_IBRS;
> + setup_force_cpu_cap(X86_FEATURE_IBRS);
> + break;
> + }
> + /* Fall through */
> case SPECTRE_V2_CMD_RETPOLINE:
[...]

--
Eduardo

2018-01-29 20:17:55

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, 2018-01-29 at 18:14 -0200, Eduardo Habkost wrote:
>
> Sorry for being confused here, as probably the answer is buried
> on a LKML thread somewhere.  The comment explains what the code
> does, but not why.  Why exactly IBRS is preferred on Skylake?
>
> I'm asking this because I would like to understand the risks
> involved when running under a hypervisor exposing CPUID data that
> don't match the host CPU.  e.g.: what happens if a VM is migrated
> from a Broadwell host to a Skylake host?

https://lkml.org/lkml/2018/1/22/598 should cover most of that, I think.



2018-01-29 20:43:38

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 08:17:02PM +0000, David Woodhouse wrote:
> On Mon, 2018-01-29 at 18:14 -0200, Eduardo Habkost wrote:
> >
> > Sorry for being confused here, as probably the answer is buried
> > on a LKML thread somewhere. The comment explains what the code
> > does, but not why. Why exactly IBRS is preferred on Skylake?
> >
> > I'm asking this because I would like to understand the risks
> > involved when running under a hypervisor exposing CPUID data that
> > don't match the host CPU.? e.g.: what happens if a VM is migrated
> > from a Broadwell host to a Skylake host?
>
> https://lkml.org/lkml/2018/1/22/598?should cover most of that, I think.

Thanks, it does answer some of my questions.

So, it sounds like live-migration of a VM from a non-Skylake to a
Skylake host will make the guest unsafe, unless the guest was
explicitly configured to use IBRS.

In a perfect world, Linux would never look at CPU
family/model/stepping/microcode to make any decision when running
under a hypervisor. If Linux knows it's running under a
hypervisor, it would be safer to assume retpolines aren't enough,
unless the hypervisor is telling us otherwise.

The question is how the hypervisor could tell that to the guest.
If Intel doesn't give us a CPUID bit that can be used to tell
that retpolines are enough, maybe we should use a hypervisor
CPUID bit for that?

--
Eduardo

2018-01-29 20:46:13

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On 1/29/2018 12:42 PM, Eduardo Habkost wrote:
> The question is how the hypervisor could tell that to the guest.
> If Intel doesn't give us a CPUID bit that can be used to tell
> that retpolines are enough, maybe we should use a hypervisor
> CPUID bit for that?

the objective is to have retpoline be safe everywhere and never use IBRS
(Linus was also pretty clear about that) so I'm confused by your question

2018-01-29 21:03:44

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure



On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote:
> On 1/29/2018 12:42 PM, Eduardo Habkost wrote:
> >
> > The question is how the hypervisor could tell that to the guest.
> > If Intel doesn't give us a CPUID bit that can be used to tell
> > that retpolines are enough, maybe we should use a hypervisor
> > CPUID bit for that?
>
> the objective is to have retpoline be safe everywhere and never use IBRS
> (Linus was also pretty clear about that) so I'm confused by your question

The question is about all the additional RSB-frobbing and call depth
counting and other bits that don't really even exist for Skylake yet in
a coherent form.

If a guest doesn't have those, because it's running some future kernel
where they *are* implemented but not enabled because at *boot* time it
discovered it wasn't on Skylake, the question is what happens if that
guest is subsequently migrated to a Skylake-class machine.

To which the answer is obviously "oops, sucks to be you". So yes,
*maybe* we want a way to advertise "you might be migrated to Skylake"
if you're booted on a pre-SKL box in a migration pool where such is
possible. 

That question is a reasonable one, and the answer possibly the same,
regardless of whether the plan for Skylake is to use IBRS, or all the
hypothetical other extra stuff.



2018-01-29 21:38:25

by Jim Mattson

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

For GCE, "you might be migrated to Skylake" is pretty much a
certainty. Even if you're in a zone that doesn't currently have
Skylake machines, chances are pretty good that it will have Skylake
machines some day in the not-too-distant future.

In general, making these kinds of decisions based on F/M/S is probably
unwise when running in a VM.

On Mon, Jan 29, 2018 at 1:02 PM, David Woodhouse <[email protected]> wrote:
>
>
> On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote:
>> On 1/29/2018 12:42 PM, Eduardo Habkost wrote:
>> >
>> > The question is how the hypervisor could tell that to the guest.
>> > If Intel doesn't give us a CPUID bit that can be used to tell
>> > that retpolines are enough, maybe we should use a hypervisor
>> > CPUID bit for that?
>>
>> the objective is to have retpoline be safe everywhere and never use IBRS
>> (Linus was also pretty clear about that) so I'm confused by your question
>
> The question is about all the additional RSB-frobbing and call depth
> counting and other bits that don't really even exist for Skylake yet in
> a coherent form.
>
> If a guest doesn't have those, because it's running some future kernel
> where they *are* implemented but not enabled because at *boot* time it
> discovered it wasn't on Skylake, the question is what happens if that
> guest is subsequently migrated to a Skylake-class machine.
>
> To which the answer is obviously "oops, sucks to be you". So yes,
> *maybe* we want a way to advertise "you might be migrated to Skylake"
> if you're booted on a pre-SKL box in a migration pool where such is
> possible.
>
> That question is a reasonable one, and the answer possibly the same,
> regardless of whether the plan for Skylake is to use IBRS, or all the
> hypothetical other extra stuff.

2018-01-29 21:39:03

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

> The question is about all the additional RSB-frobbing and call depth
> counting and other bits that don't really even exist for Skylake yet in
> a coherent form.

We have had several patch kits posted that are all in a "coherent form".

That was the original one:

http://lkml.iu.edu/hypermail/linux/kernel/1801.1/05556.html

and this is the newer one, with only interrupt stuffing:

https://marc.info/?l=linux-kernel&m=151674718914504

We don't have generic deep chain handling yet, but everything else
is there.

-Andi

2018-01-29 21:45:15

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 09:02:39PM +0000, David Woodhouse wrote:
>
>
> On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote:
> > On 1/29/2018 12:42 PM, Eduardo Habkost wrote:
> > >
> > > The question is how the hypervisor could tell that to the guest.
> > > If Intel doesn't give us a CPUID bit that can be used to tell
> > > that retpolines are enough, maybe we should use a hypervisor
> > > CPUID bit for that?
> >
> > the objective is to have retpoline be safe everywhere and never use IBRS
> > (Linus was also pretty clear about that) so I'm confused by your question
>
> The question is about all the additional RSB-frobbing and call depth
> counting and other bits that don't really even exist for Skylake yet in
> a coherent form.
>
> If a guest doesn't have those, because it's running some future kernel
> where they *are* implemented but not enabled because at *boot* time it
> discovered it wasn't on Skylake, the question is what happens if that
> guest is subsequently migrated to a Skylake-class machine.
>
> To which the answer is obviously "oops, sucks to be you". So yes,
> *maybe* we want a way to advertise "you might be migrated to Skylake"
> if you're booted on a pre-SKL box in a migration pool where such is
> possible.
>
> That question is a reasonable one, and the answer possibly the same,
> regardless of whether the plan for Skylake is to use IBRS, or all the
> hypothetical other extra stuff.

Maybe a generic "family/model/stepping/microcode really matches
the CPU you are running on" bit would be useful. The bit could
be enabled only in host-passthrough (aka "-cpu host") mode.

If we really want to be able to migrate to host with different
CPU models (except Skylake), we could add a more specific "we
promise the host CPU is never going to be Skylake" bit.

Now, if the hypervisor is not providing any of those bits, I
would advise against trusting family/model/stepping/microcode
under a hypervisor. Using a pre-defined CPU model (that doesn't
necessarily match the host) is very common when using KVM VM
management stacks.

--
Eduardo
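
Such a promise would most naturally live in the paravirtual CPUID space
rather than in f/m/s. A minimal sketch of how a guest could probe for it
follows; the KVM signature leaf 0x40000000 and feature leaf 0x40000001 are
real, but the specific "f/m/s is accurate" bit and its position are invented
here purely for illustration.

#include <stdbool.h>
#include <stdint.h>
#include <cpuid.h>	/* GCC/clang: __cpuid() macro */

/* Hypothetical "f/m/s is accurate" promise bit -- not a real KVM feature. */
#define KVM_FEATURE_FMS_ACCURATE	(1u << 14)

static bool hv_promises_accurate_fms(void)
{
	uint32_t eax, ebx, ecx, edx;

	/* CPUID.1:ECX[31] is set when running under a hypervisor. */
	__cpuid(1, eax, ebx, ecx, edx);
	if (!(ecx & (1u << 31)))
		return false;

	/* Leaf 0x40000000 carries the hypervisor signature ("KVMKVMKVM"). */
	__cpuid(0x40000000, eax, ebx, ecx, edx);
	if (ebx != 0x4b4d564b || ecx != 0x564b4d56 || edx != 0x4d)
		return false;

	/* Leaf 0x40000001 EAX carries the KVM paravirtual feature bits. */
	__cpuid(0x40000001, eax, ebx, ecx, edx);
	return eax & KVM_FEATURE_FMS_ACCURATE;
}

A hypervisor or management stack that cannot keep the promise would simply
leave such a bit clear, and the guest would fall back to assuming the worst,
which matches the default behaviour argued for above.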

2018-01-29 21:52:44

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 01:37:05PM -0800, Jim Mattson wrote:
> For GCE, "you might be migrated to Skylake" is pretty much a
> certainty. Even if you're in a zone that doesn't currently have
> Skylake machines, chances are pretty good that it will have Skylake
> machines some day in the not-too-distant future.

This kind of scenario is why I suggest a "we promise you're not
going to be migrated to Skylake" bit instead of a "you may be
migrated to Skylake" bit. The hypervisor could prevent migration
to Skylake hosts if management software chose to enable this bit,
and guests would choose the safest option (i.e. assume the worst)
if running on older hypervisors that don't set the bit.

>
> In general, making these kinds of decisions based on F/M/S is probably
> unwise when running in a VM.

Certainly. That's why I suggest not trusting f/m/s unless the
hypervisor is explicitly saying it's accurate.

--
Eduardo

2018-01-29 22:11:39

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 07:44:21PM -0200, Eduardo Habkost wrote:
> On Mon, Jan 29, 2018 at 09:02:39PM +0000, David Woodhouse wrote:
> >
> >
> > On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote:
> > > On 1/29/2018 12:42 PM, Eduardo Habkost wrote:
> > > >
> > > > The question is how the hypervisor could tell that to the guest.
> > > > If Intel doesn't give us a CPUID bit that can be used to tell
> > > > that retpolines are enough, maybe we should use a hypervisor
> > > > CPUID bit for that?
> > >
> > > the objective is to have retpoline be safe everywhere and never use IBRS
> > > (Linus was also pretty clear about that) so I'm confused by your question
> >
> > The question is about all the additional RSB-frobbing and call depth
> > counting and other bits that don't really even exist for Skylake yet in
> > a coherent form.
> >
> > If a guest doesn't have those, because it's running some future kernel
> > where they *are* implemented but not enabled because at *boot* time it
> > discovered it wasn't on Skylake, the question is what happens if that
> > guest is subsequently migrated to a Skylake-class machine.
> >
> > To which the answer is obviously "oops, sucks to be you". So yes,
> > *maybe* we want a way to advertise "you might be migrated to Skylake"
> > if you're booted on a pre-SKL box in a migration pool where such is
> > possible.
> >
> > That question is a reasonable one, and the answer possibly the same,
> > regardless of whether the plan for Skylake is to use IBRS, or all the
> > hypothetical other extra stuff.
>
> Maybe a generic "family/model/stepping/microcode really matches
> the CPU you are running on" bit would be useful. The bit could
> be enabled only on host-passthrough (aka "-cpu host") mode.
>
> If we really want to be able to migrate to host with different
> CPU models (except Skylake), we could add a more specific "we
> promise the host CPU is never going to be Skylake" bit.
>
> Now, if the hypervisor is not providing any of those bits, I
> would advise against trusting family/model/stepping/microcode
> under a hypervisor. Using a pre-defined CPU model (that doesn't

The migration code could be 'tickled' (on arrival at the destination)
to recheck the CPUID and run the alternatives logic to turn the
proper bits on.

And this tickling could be as simple as some ACPI DSDT/AML code
specific to KVM PnP devices (say, the CPUs?) telling the guest to
resample its environment?

> necessarily match the host) is very common when using KVM VM
> management stacks.
>
> --
> Eduardo

2018-01-29 22:12:50

by Jim Mattson

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 1:50 PM, Eduardo Habkost <[email protected]> wrote:
> On Mon, Jan 29, 2018 at 01:37:05PM -0800, Jim Mattson wrote:
>> For GCE, "you might be migrated to Skylake" is pretty much a
>> certainty. Even if you're in a zone that doesn't currently have
>> Skylake machines, chances are pretty good that it will have Skylake
>> machines some day in the not-too-distant future.
>
> This kind of scenario is why I suggest a "we promise you're not
> going to be migrated to Skylake" bit instead a "you may be
> migrated to Skylake" bit. The hypervisor could prevent migration
> to Skylake hosts if management software chose to enable this bit,
> and guests would choose the safest option (i.e. assume the worst)
> if running on older hypervisors that don't set the bit.

Giving customers this option promises the logistical nightmare of
provisioning sufficient pre-Skylake-era machines in all pools until
sufficient post-Skylake-era machines can be deployed to replace them.

>> In general, making these kinds of decisions based on F/M/S is probably
>> unwise when running in a VM.
>
> Certainly. That's why I suggest not trusting f/m/s unless the
> hypervisor is explicitly saying it's accurate.
>
> --
> Eduardo

2018-01-29 22:26:14

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure


I agree with your point that the common hypervisor practice of faking
old model numbers will break some of the workarounds. Hypervisors
may need to revisit that practice.

> > In general, making these kinds of decisions based on F/M/S is probably
> > unwise when running in a VM.
>
> Certainly. That's why I suggest not trusting f/m/s unless the
> hypervisor is explicitly saying it's accurate.

This would only be useful if there were some useful consequence of that
distrust.

But there isn't. Except for panicking, there's nothing you could do.
And I don't think panicking would be reasonable.

The "Skylake bit" or "not Skylake bit" doesn't make any sense
to me. If a hypervisor wants to enable the Skylake workarounds,
it needs to provide the Skylake model number. If it doesn't
think it needs them because the VM can never be migrated
to Skylake, then it doesn't need to set that model
number.

So there isn't any need to invent any new bits; it's
all already possible.

-Andi

2018-01-30 00:24:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 1:02 PM, David Woodhouse <[email protected]> wrote:
>
> On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote:
>>
>> the objective is to have retpoline be safe everywhere and never use IBRS
>> (Linus was also pretty clear about that) so I'm confused by your question

Note on the unhappiness with some of the patches involved: what I do
*not* want to see is the "on every kernel entry" kind of garbage.

So my unhappiness with the intel microcode patches is two-fold:

(a) the interface is nasty and wrong, and I absolutely detest how Intel did it.

(b) the write to random MSR's on every kernel entry/exit is wrong

but that doesn't mean that I will necessarily end up NAK'ing every
single IBRS/IBPB patch.

My concern with (a) is that unlike meltdown, the intel work-around
isn't forward-looking, and doesn't have a "we fixed it" bit. Instead,
it has a "we have a nasty workaround that may or may not be horribly
expensive" bit, and isn't all that well-defined.

My dislike of (b) comes from "we have retpoline and various wondrous
RSB filling crud already, we're smarter than that". So it's not that I
refuse any IBRS/IBPB use, I refuse the stupid and _mindless_ kind of
use.

> The question is about all the additional RSB-frobbing and call depth
> counting and other bits that don't really even exist for Skylake yet in
> a coherent form.
>
> If a guest doesn't have those, because it's running some future kernel
> where they *are* implemented but not enabled because at *boot* time it
> discovered it wasn't on Skylake, the question is what happens if that
> guest is subsequently migrated to a Skylake-class machine.

So I actually have a _different_ question to the virtualization
people. This includes the vmware people, but it also obviously
includes the Amazon AWS kind of usage.

When you're a hypervisor (whether vmware or Amazon), why do you even
end up caring about these things so much? You're protected from
meltdown thanks to the virtual environment already having separate
page tables. And the "big hammer" approach to spectre would seem to
be to just make sure the BTB and RSB are flushed at vmexit time - and
even then you might decide that you really want to just move it to
vmenter time, and only do it if the VM has changed since last time
(per CPU).

Why do you even _care_ about the guest, and how it acts wrt Skylake?
What you should care about is not so much the guests (which do their
own thing) but protect guests from each other, no?

So I'm a bit mystified by some of this discussion within the context
of virtual machines. I think that is separate from any measures that
the guest machine may then decide to partake in.

If you are ever going to migrate to Skylake, I think you should just
always tell the guests that you're running on Skylake. That way the
guests will always assume the worst case situation wrt Spectre.

Maybe that mystification comes from me missing something.

Linus

2018-01-30 01:04:02

by Jim Mattson

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

The guest OS is responsible for protecting itself from intra-guest
attacks. The hypervisor can't do that. We want to give the guest OS
the tools it needs to make reasonable decisions about the intra-guest
protections it wants to enable, in an environment where the virtual
processor and the physical processor may not actually have the same
F/M/S (and in fact, where the physical processor may change at any
time).

Right now, we are dealing with one workaround, which is tied to
Skylake-era model numbers. Yes, we could report a Skylake model
number, and Linux guests would use IBRS instead of retpoline. But this
approach doesn't scale. What happens when someone introduces a
workaround tied to some other model numbers?

On Mon, Jan 29, 2018 at 4:23 PM, Linus Torvalds
<[email protected]> wrote:
> On Mon, Jan 29, 2018 at 1:02 PM, David Woodhouse <[email protected]> wrote:
>>
>> On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote:
>>>
>>> the objective is to have retpoline be safe everywhere and never use IBRS
>>> (Linus was also pretty clear about that) so I'm confused by your question
>
> Note on the unhappiness with some of the patches involved: what I do
> *not* want to see is the "on every kernel entry" kind of garbage.
>
> So my unhappiness with the intel microcode patches is two-fold:
>
> (a) the interface is nasty and wrong, and I absolutely detest how Intel did it.
>
> (b) the write to random MSR's on every kernel entry/exit is wrong
>
> but that doesn't mean that I will necessarily end up NAK'ing every
> single IBRS/IBPB patch.
>
> My concern with (a) is that unlike meltdown, the intel work-around
> isn't forward-looking, and doesn't have a "we fixed it" bit. Instead,
> it has a "we have a nasty workaround that may or may not be horribly
> expensive" bit, and isn't all that well-defined.
>
> My dislike of (b) comes from "we have retpoline and various wondrous
> RSB filling crud already, we're smarter than that". So it's not that I
> refuse any IBRS/IBPB use, I refuse the stupid and _mindless_ kind of
> use.
>
>> The question is about all the additional RSB-frobbing and call depth
>> counting and other bits that don't really even exist for Skylake yet in
>> a coherent form.
>>
>> If a guest doesn't have those, because it's running some future kernel
>> where they *are* implemented but not enabled because at *boot* time it
>> discovered it wasn't on Skylake, the question is what happens if that
>> guest is subsequently migrated to a Skylake-class machine.
>
> So I actually have a _different_ question to the virtualization
> people. This includes the vmware people, but it also obviously
> incldues the Amazon AWS kind of usage.
>
> When you're a hypervisor (whether vmware or Amazon), why do you even
> end up caring about these things so much? You're protected from
> meltdown thanks to the virtual environment already having separate
> page tables. And the "big hammer" approach to spectre would seem to
> be to just make sure the BTB and RSB are flushed at vmexit time - and
> even then you might decide that you really want to just move it to
> vmenter time, and only do it if the VM has changed since last time
> (per CPU).
>
> Why do you even _care_ about the guest, and how it acts wrt Skylake?
> What you should care about is not so much the guests (which do their
> own thing) but protect guests from each other, no?
>
> So I'm a bit mystified by some of this discussion within the context
> of virtual machines. I think that is separate from any measures that
> the guest machine may then decide to partake in.
>
> If you are ever going to migrate to Skylake, I think you should just
> always tell the guests that you're running on Skylake. That way the
> guests will always assume the worst case situation wrt Specte.
>
> Maybe that mystification comes from me missing something.
>
> Linus

2018-01-30 01:14:06

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 05:10:11PM -0500, Konrad Rzeszutek Wilk wrote:
[...]
> The migration code could be 'tickled' (when arrived at the destination)
> to recheck the CPUID and do the alternative logic to turn the
> proper bits on.
>
> And this tickling could be as simple as an ACPI DSDT/AML code
> specific to KVM PnP devices (say the CPUs?) to tell the guest to
> resample its environment?

This would be nice to have for other CPU features, but if I
understood a previous message from Andi on this thread correctly,
it wouldn't be useful for the Spectre mitigations.

--
Eduardo

2018-01-30 01:23:28

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 02:12:02PM -0800, Jim Mattson wrote:
> On Mon, Jan 29, 2018 at 1:50 PM, Eduardo Habkost <[email protected]> wrote:
> > On Mon, Jan 29, 2018 at 01:37:05PM -0800, Jim Mattson wrote:
> >> For GCE, "you might be migrated to Skylake" is pretty much a
> >> certainty. Even if you're in a zone that doesn't currently have
> >> Skylake machines, chances are pretty good that it will have Skylake
> >> machines some day in the not-too-distant future.
> >
> > This kind of scenario is why I suggest a "we promise you're not
> > going to be migrated to Skylake" bit instead a "you may be
> > migrated to Skylake" bit. The hypervisor could prevent migration
> > to Skylake hosts if management software chose to enable this bit,
> > and guests would choose the safest option (i.e. assume the worst)
> > if running on older hypervisors that don't set the bit.
>
> Giving customers this option promises the logistical nightmare of
> provisioning sufficient pre-Skylake-era machines in all pools until
> sufficient post-Skylake-era machines can be deployed to replace them.

If this is not practical, the hypervisor can simply choose to
never make any of those promises to the guest OS.

Never implementing any of those bits is also an option. But then
guest OSes must be aware that the hypervisor can _not_ promise
that f/m/s matches the host CPU, and can _not_ promise that the
VM will never be migrated to Skylake CPUs.

--
Eduardo

2018-01-30 01:32:50

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On 1/29/2018 4:23 PM, Linus Torvalds wrote:
>
> Why do you even _care_ about the guest, and how it acts wrt Skylake?
> What you should care about is not so much the guests (which do their
> own thing) but protect guests from each other, no?

the simplest solution is that we set the internal feature bit in Linux
to turn on the "stuff the RSB" workaround if we're on a SKL *or* as a guest in a VM.

The stuffing is not free, but it's not insane either... so if it's turned on in guests,
the impact is still limited, while bare metal doesn't need it at all
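
A hedged sketch of the selection Arjan describes, reusing names that already
appear in this series (is_skylake_era() from the quoted bugs.c hunk,
X86_FEATURE_RSB_CTXSW from the quoted cpufeatures.h hunk, plus the existing
X86_FEATURE_HYPERVISOR flag); this illustrates the condition only and is not
the code that was actually merged:

/*
 * Illustrative only: fill the RSB on context switch either on
 * Skylake-era parts or whenever we run as a guest, since a guest
 * cannot trust f/m/s to tell it what it is really running on.
 */
static void spectre_v2_select_rsb_ctxsw(void)
{
	if (is_skylake_era() || boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
		pr_info("Spectre v2: filling RSB on context switch\n");
	}
}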

2018-01-30 01:38:29

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 02:25:12PM -0800, Andi Kleen wrote:
>
> I agree with your point that the common hypervisor practice to fake
> old model numbers will break some of the workarounds. Hypervisors
> may need to revisit their practice.
>
> > > In general, making these kinds of decisions based on F/M/S is probably
> > > unwise when running in a VM.
> >
> > Certainly. That's why I suggest not trusting f/m/s unless the
> > hypervisor is explicitly saying it's accurate.
>
> This would be only useful if there's an useful result of this
> non trust.
>
> But there isn't. Except for panic there's nothing you could do.
> And I don't think panic would be reasonable.

Why isn't it a useful result to enable the Skylake workaround when
unsure about the host CPU?


>
> The "Skylake bit " or "not skylake bit" doesn't make any sense
> to me. If a hypervisor wants to enable Skylake workarounds
> they need to provide the Skylake model number. If they don't
> think they need them because the VM can never be migrated
> to Skylake, then they don't need to set that model
> number.
>
> So there isn't any need for inventing any new bits, it's
> all already possible.

It's already possible, until we find another bug in another CPU
model that also needs to be worked around. We can't represent
"please work around bugs in both Skylake and Westmere" in f/m/s.

--
Eduardo

2018-01-30 03:14:41

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

> Right now, we are dealing with one workaround, which is tied to
> Skylake-era model numbers. Yes, we could report a Skylake model
> number, and Linux guests would use IBRS instead of retpoline. But this

Nobody is planning to use IBRS and Linus has rejected it.

> approach doesn't scale. What happens when someone introduces a
> workaround tied to some other model numbers?

There are already many of those in the tree for other issues and features.
So far you managed to survive without. Likely that will be true
in the future too.

-Andi

2018-01-30 03:34:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 5:32 PM, Arjan van de Ven <[email protected]> wrote:
>
> the most simple solution is that we set the internal feature bit in Linux
> to turn on the "stuff the RSB" workaround is we're on a SKL *or* as a guest
> in a VM.

That sounds reasonable.

However, wouldn't it be even better to extend the current cpuid
model, and actually have some real architectural bits in there?

Maybe it could be a bit in that IA32_ARCH_CAPABILITIES MSR. Say, add a
bit #2 that says "ret falls back on BTB".

Then that bit basically becomes the "Skylake bit". Hmm?

Linus
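
For reference, IA32_ARCH_CAPABILITIES is MSR 0x10a, enumerated by
CPUID.(EAX=7,ECX=0):EDX[29]; bit 0 (RDCL_NO) and bit 1 (IBRS_ALL) are the
bits defined so far, and the "ret falls back on BTB" bit at position 2 is
only the proposal made here. A sketch of how the kernel could consume such
a bit, purely as an illustration:

#define MSR_IA32_ARCH_CAPABILITIES	0x0000010a
#define ARCH_CAP_RDCL_NO		(1 << 0)	/* not affected by Meltdown */
#define ARCH_CAP_RET_USES_BTB		(1 << 2)	/* hypothetical "Skylake bit" */

static bool cpu_ret_may_use_btb(void)
{
	u64 caps = 0;

	/* X86_FEATURE_ARCH_CAPABILITIES == CPUID.(7,0):EDX[29] */
	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, caps);

	return caps & ARCH_CAP_RET_USES_BTB;
}

Note that a CPU or hypervisor which predates the MSR reads as "no
capabilities" here, which is exactly the old-Skylake ambiguity Eduardo
raises further down the thread.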

2018-01-30 08:23:20

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, 2018-01-29 at 16:23 -0800, Linus Torvalds wrote:
>   And the "big hammer" approach to spectre would seem to
> be to just make sure the BTB and RSB are flushed at vmexit time - and
> even then you might decide that you really want to just move it to
> vmenter time, and only do it if the VM has changed since last time
> (per CPU).

The IBPB which flushes the BTB is *expensive*; we really want to reduce
how often we do it. For VM guests it's not so bad — we do it only on
VMPTRLD, which is sufficient to ensure it's done between running one
vCPU and the next. And if vCPUs are pinned to pCPUs, that means we
basically never do it.
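
A minimal sketch of what "IBPB only on VMPTRLD" amounts to on the VMX side;
indirect_branch_prediction_barrier() is the helper from the quoted
nospec-branch.h, and current_vmcs stands for KVM's existing per-CPU pointer
to the loaded VMCS, so treat the exact shape as illustrative rather than
the actual patch:

/*
 * Illustrative only: issue the barrier when a different VMCS gets
 * loaded on this physical CPU, i.e. between running one vCPU and the
 * next, rather than on every single VM entry.
 */
static void vcpu_load_ibpb(int cpu, struct vmcs *vmcs)
{
	if (per_cpu(current_vmcs, cpu) != vmcs) {
		per_cpu(current_vmcs, cpu) = vmcs;
		indirect_branch_prediction_barrier();
	}
}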

Even for userspace we've mostly settled on a heuristic where we only do
the IBPB flush for non-dumpable processes, precisely because it's so
expensive.
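
And a rough sketch of the userspace heuristic just mentioned; again
indirect_branch_prediction_barrier() comes from the quoted header, while the
dumpability check here is illustrative rather than the exact upstream
condition:

/*
 * Illustrative only: pay the IBPB cost only when switching to a task
 * that asked for extra protection by marking itself non-dumpable
 * (gpg-agent and similar high-value processes do this).
 */
static void cond_ibpb(struct task_struct *next)
{
	if (next->mm && get_dumpable(next->mm) != SUID_DUMP_USER)
		indirect_branch_prediction_barrier();
}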

> Why do you even _care_ about the guest, and how it acts wrt Skylake?
> What you should care about is not so much the guests (which do their
> own thing) but protect guests from each other, no?

Well yes, that's the part we had to fix before anyone was allowed to
sleep. But customers kind of care about security *within* their part
too, and we care about customers. :)

Sure, the cloud *enables* a model where a given VM guest is just a
single-tenant standalone compute job, and the kernel is effectively
just a library to provide services to the application. In some sense
it's all about the app, and you might as well be using uCLinux from the
security point of view. So *some* (perhaps even *many*) guests don't
need to care.

But there are still plenty who *do* need to care, for various reasons.



2018-01-30 11:37:29

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, 2018-01-29 at 16:23 -0800, Linus Torvalds wrote:
>
> Note on the unhappiness with some of the patches involved: what I do
> *not* want to see is the "on every kernel entry" kind of garbage.
>
> So my unhappiness with the intel microcode patches is two-fold:
>
>  (a) the interface is nasty and wrong, and I absolutely detest how Intel did it.
>
>  (b) the write to random MSR's on every kernel entry/exit is wrong
>
> but that doesn't mean that I will necessarily end up NAK'ing every
> single IBRS/IBPB patch.
>
> My concern with (a) is that unlike meltdown, the intel work-around
> isn't forward-looking, and doesn't have a "we fixed it" bit. Instead,
> it has a "we have a nasty workaround that may or may not be horribly
> expensive" bit, and isn't all that well-defined.

The lack of a "we fixed it" bit is certainly problematic.

But as an interim hack for the upcoming hardware, IBRS_ALL isn't so
badly defined. Sure, the reassurances about performance all got ripped
out before the document saw the light of day — quelle surprise? — but
my understanding is that it *will* be fast. It is expected to be fast
enough that we can ALTERNATIVE away the retpolines, set it once and
leave it set.

The reason it isn't just a "we fixed it" bit is because we'll still
need the IBPB on context/vCPU switches.

I suspect they managed to tag BTB entries with VMX mode and ring, but
*not* the full VMID/PCID tagging (and associated automatic flushing)
that they'd need to truly say "we fixed it".

I seriously hope they're working on a complete fix for the subsequent
generation, and just neglected to mention it in their public
documentation that far in advance.

> My dislike of (b) comes from "we have retpoline and various wondrous
> RSB filling crud already, we're smarter than that". So it's not that I
> refuse any IBRS/IBPB use, I refuse the stupid and _mindless_ kind of
> use.

Well... for Skylake we probably need something like Ingo's cunning plan
to abuse function tracing to count call depth. I won't be utterly
shocked if, by the time we have all that pulled together, it ends up
being fairly much as fugly as the IBRS version — for less complete
protection. But we'll see. :)

It may also be that some of the last remaining holes can be declared
just too unlikely for us to jump through fugly hoops for. In fact that
*has* to be our answer for the SMI issue if we're not using IBRS on
Skylake, so now it's just a question of degree — how many of the
*other* theoretical holes are we happy to do the same thing for?

That's a genuine question, not a rhetorical device arguing for IBRS. I
just haven't seen a clear analysis, other than some hand-waving, of how
feasible some of those attack vectors really are. I'd like to.



2018-01-30 11:57:10

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

* Linus Torvalds ([email protected]) wrote:

> Why do you even _care_ about the guest, and how it acts wrt Skylake?
> What you should care about is not so much the guests (which do their
> own thing) but protect guests from each other, no?
>
> So I'm a bit mystified by some of this discussion within the context
> of virtual machines. I think that is separate from any measures that
> the guest machine may then decide to partake in.

Because you'd never want to be the cause of the guest making the wrong
decision and thus being less secure than it was on real hardware.

> If you are ever going to migrate to Skylake, I think you should just
> always tell the guests that you're running on Skylake. That way the
> guests will always assume the worst case situation wrt Specte.

Say you've got a pile of Ivybridge, all running lots of VMs,
the guests see that they're running on Ivybridge.
Now you need some more hosts, so you buy the latest Skylake boxes,
and add them into your cluster. Previously it was fine to live
migrate a VM to the Skylake box and the VM still sees it's running
Ivybridge; and you can migrate that VM back and forward.
The rule was that as long as the CPU type you told the guest was
old enough then it could migrate to any newer box.

You can't tell the VMs running on Ivybridge they're running on Skylake
otherwise they'll start trying to use Skylake features
(OK, they should be checking flags, but that's a separate story).

Dave


> Maybe that mystification comes from me missing something.
>
> Linus
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2018-01-30 12:05:28

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Mon, Jan 29, 2018 at 07:32:06PM -0800, Linus Torvalds wrote:
> On Mon, Jan 29, 2018 at 5:32 PM, Arjan van de Ven <[email protected]> wrote:
> >
> > the most simple solution is that we set the internal feature bit in Linux
> > to turn on the "stuff the RSB" workaround is we're on a SKL *or* as a guest
> > in a VM.
>
> That sounds reasonable.
>
> However, wouldn't it be even better to extend on the current cpuid
> model, and actually have some real architectural bits in there.

If Intel could do that, it would be great.


>
> Maybe it could be a bit in that IA32_ARCH_CAPABILITIES MSR. Say, add a
> bit #2 that says "ret falls back on BTB".
>
> Then that bit basically becomes the "Skylake bit". Hmm?

Yes. But note that the OS needs to be able to differentiate "old
Skylake that doesn't support the new bit" from "newer Skylake
that doesn't fall back on BTB". That's why I suggest a
"non-Skylake bit" instead of a "Skylake bit".

--
Eduardo

2018-01-30 12:13:20

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure



On 01/30/2018 01:23 AM, Linus Torvalds wrote:
[...]
>
> So I actually have a _different_ question to the virtualization
> people. This includes the vmware people, but it also obviously
> incldues the Amazon AWS kind of usage.
>
> When you're a hypervisor (whether vmware or Amazon), why do you even
> end up caring about these things so much? You're protected from
> meltdown thanks to the virtual environment already having separate
> page tables. And the "big hammer" approach to spectre would seem to
> be to just make sure the BTB and RSB are flushed at vmexit time - and
> even then you might decide that you really want to just move it to
> vmenter time, and only do it if the VM has changed since last time
> (per CPU).
>
> Why do you even _care_ about the guest, and how it acts wrt Skylake?
> What you should care about is not so much the guests (which do their
> own thing) but protect guests from each other, no?
>
> So I'm a bit mystified by some of this discussion within the context
> of virtual machines. I think that is separate from any measures that
> the guest machine may then decide to partake in.
>
> If you are ever going to migrate to Skylake, I think you should just
> always tell the guests that you're running on Skylake. That way the
> guests will always assume the worst case situation wrt Specte.
>
> Maybe that mystification comes from me missing something.

I can only speak for KVM, but I think the hypervisor issues come from
the fact that for migration purposes the hypervisor "lies" to the guest
in regard to what kind of CPU is running. (it has to lie, see below).

This is to avoid random guest crashes by not announcing features. For
example, if you want to migrate back and forth between a system that
has AVX512 and another one that does not, you must tell the guest that
AVX512 is not available - even if it runs on the capable system.

To protect against new features the hypervisor only announces features
that it understands.
So you essentially start a VM in QEMU of a given CPU type that is
constructed of a base CPU type plus extra features. Before migration,
it is checked whether the target system can run a guest of the given
type - otherwise migration is rejected.

The management stack also knows things like baselining - basically
creating the best possible guest CPU given a set of hosts.

The problem now is: if you have, let's say, Broadwells and Skylakes,
what kind of CPU type are you telling your guest? If you claim
Broadwell but run on Skylake, then you prevent the guest from
protecting itself, because the guest does not know that it should do
something special. If you say Skylake, the guest might start using
features that Broadwell does not understand.

So I think what we have here is that the current (guest) CPU model
for hypervisors was always designed around architectural features.
Presenting microarchitectural knowledge for workarounds does
not seem to be well integrated into hypervisors.


PS: For a list of potential cpus/features look at
https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/cpu/cpu_map.xml
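
To make the model-selection point concrete, this is roughly what the guest
CPU definition looks like in a libvirt domain XML; the values are
illustrative, and the model names come from the cpu_map.xml linked above:

<!-- Baseline to the oldest host in the migration pool (here Broadwell),
     so the guest can be started on, and migrated between, Broadwell and
     Skylake hosts alike. -->
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Broadwell</model>
</cpu>

<!-- Alternatively, expose the real host CPU, which is generally
     incompatible with migrating across differing hosts: -->
<cpu mode='host-passthrough'/>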


2018-01-30 14:05:59

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On 1/29/2018 7:32 PM, Linus Torvalds wrote:
> On Mon, Jan 29, 2018 at 5:32 PM, Arjan van de Ven <[email protected]> wrote:
>>
>> the most simple solution is that we set the internal feature bit in Linux
>> to turn on the "stuff the RSB" workaround is we're on a SKL *or* as a guest
>> in a VM.
>
> That sounds reasonable.
>
> However, wouldn't it be even better to extend on the current cpuid
> model, and actually have some real architectural bits in there.
>
> Maybe it could be a bit in that IA32_ARCH_CAPABILITIES MSR. Say, add a
> bit #2 that says "ret falls back on BTB".
>
> Then that bit basically becomes the "Skylake bit". Hmm?

we can try to do that, but existing systems don't have it, and then we
get into another long thread here about weird lists of stuff ;-)


2018-01-30 14:47:40

by Christophe de Dinechin

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure



> On 30 Jan 2018, at 13:11, Christian Borntraeger <[email protected]> wrote:
>
>
>
> On 01/30/2018 01:23 AM, Linus Torvalds wrote:
> [...]
>>
>> So I actually have a _different_ question to the virtualization
>> people. This includes the vmware people, but it also obviously
>> incldues the Amazon AWS kind of usage.
>>
>> When you're a hypervisor (whether vmware or Amazon), why do you even
>> end up caring about these things so much? You're protected from
>> meltdown thanks to the virtual environment already having separate
>> page tables. And the "big hammer" approach to spectre would seem to
>> be to just make sure the BTB and RSB are flushed at vmexit time - and
>> even then you might decide that you really want to just move it to
>> vmenter time, and only do it if the VM has changed since last time
>> (per CPU).
>>
>> Why do you even _care_ about the guest, and how it acts wrt Skylake?
>> What you should care about is not so much the guests (which do their
>> own thing) but protect guests from each other, no?
>>
>> So I'm a bit mystified by some of this discussion within the context
>> of virtual machines. I think that is separate from any measures that
>> the guest machine may then decide to partake in.
>>
>> If you are ever going to migrate to Skylake, I think you should just
>> always tell the guests that you're running on Skylake. That way the
>> guests will always assume the worst case situation wrt Specte.
>>
>> Maybe that mystification comes from me missing something.
>
> I can only speak for KVM, but I think the hypervisor issues come from
> the fact that for migration purposes the hypervisor "lies" to the guest
> in regard to what kind of CPU is running. (it has to lie, see below).
>
> This is to avoid random guest crashes by not announcing features. For
> example if you want to migrate forth and back between a system that
> has AVX512 and another one that has not you must tell the guest that
> AVX512 is not available - even if it runs on the capable system.
>
> To protect against new features the hypervisor only announces features
> that it understands.
> So you essentially start a VM in QEMU of a given CPU type that is
> constructed of a base cpu type plus extra features. Before migration,
> it is checked if he target system can run a guest of given type -
> otherwise migration is rejected.
>
> The management stack also knows things like baselining - basically
> creating the best possible guest CPU given a set of hosts.
>
> The problem now is: If you have lets say Broadwell and Skylakes.
> What kind of CPU type are you telling your guest? If you claim
> broadwell but run on skylake then you prevent that the guest can
> protect itself, because the guest does not know that it should do
> something special. If you say skylake the guest might start using
> features that broadwell does not understand.

I believe that Linus’ question was whether it makes sense to defer
the entirety of the protection to the host kernel, although I was a bit
confused by his suggestion to always assume Skylake.

In other words, is it safe enough to rely on the host kernel countermeasure
to protect guest kernels and their applications? In which case having
the guest believe it runs on Broadwell would not be that problematic.

Aren’t there enough vmexits on the guest kernel context switch
to enforce protection on its behalf? Even if it’s

a) some old kernel without mitigation code

or

b) some new kernel that thinks it runs on an old CPU and disabled mitigation


Christophe


>
> So I think what we have here is that the current (guest) cpu model
> for hypervisors was always designed for architectural features.
> Presenting a microarchitectural knowledge for workarounds does
> not seem to be well integrated into hypervisors.
>
>
> PS: For a list of potential cpus/features look at
> https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/cpu/cpu_map.xml
>


2018-01-30 14:57:18

by Christophe de Dinechin

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure



> On 30 Jan 2018, at 15:52, Christian Borntraeger <[email protected]> wrote:
>
>
>
> On 01/30/2018 03:46 PM, Christophe de Dinechin wrote:
>>
>>
>>> On 30 Jan 2018, at 13:11, Christian Borntraeger <[email protected]> wrote:
>>>
>>>
>>>
>>> On 01/30/2018 01:23 AM, Linus Torvalds wrote:
>>> [...]
>>>>
>>>> So I actually have a _different_ question to the virtualization
>>>> people. This includes the vmware people, but it also obviously
>>>> incldues the Amazon AWS kind of usage.
>>>>
>>>> When you're a hypervisor (whether vmware or Amazon), why do you even
>>>> end up caring about these things so much? You're protected from
>>>> meltdown thanks to the virtual environment already having separate
>>>> page tables. And the "big hammer" approach to spectre would seem to
>>>> be to just make sure the BTB and RSB are flushed at vmexit time - and
>>>> even then you might decide that you really want to just move it to
>>>> vmenter time, and only do it if the VM has changed since last time
>>>> (per CPU).
>>>>
>>>> Why do you even _care_ about the guest, and how it acts wrt Skylake?
>>>> What you should care about is not so much the guests (which do their
>>>> own thing) but protect guests from each other, no?
>>>>
>>>> So I'm a bit mystified by some of this discussion within the context
>>>> of virtual machines. I think that is separate from any measures that
>>>> the guest machine may then decide to partake in.
>>>>
>>>> If you are ever going to migrate to Skylake, I think you should just
>>>> always tell the guests that you're running on Skylake. That way the
>>>> guests will always assume the worst case situation wrt Specte.
>>>>
>>>> Maybe that mystification comes from me missing something.
>>>
>>> I can only speak for KVM, but I think the hypervisor issues come from
>>> the fact that for migration purposes the hypervisor "lies" to the guest
>>> in regard to what kind of CPU is running. (it has to lie, see below).
>>>
>>> This is to avoid random guest crashes by not announcing features. For
>>> example if you want to migrate forth and back between a system that
>>> has AVX512 and another one that has not you must tell the guest that
>>> AVX512 is not available - even if it runs on the capable system.
>>>
>>> To protect against new features the hypervisor only announces features
>>> that it understands.
>>> So you essentially start a VM in QEMU of a given CPU type that is
>>> constructed of a base cpu type plus extra features. Before migration,
>>> it is checked if he target system can run a guest of given type -
>>> otherwise migration is rejected.
>>>
>>> The management stack also knows things like baselining - basically
>>> creating the best possible guest CPU given a set of hosts.
>>>
>>> The problem now is: If you have lets say Broadwell and Skylakes.
>>> What kind of CPU type are you telling your guest? If you claim
>>> broadwell but run on skylake then you prevent that the guest can
>>> protect itself, because the guest does not know that it should do
>>> something special. If you say skylake the guest might start using
>>> features that broadwell does not understand.
>>
>> I believe that Linus’ question was whether it makes sense to defer
>> the entirety of the protection to the host kernel, although I was a bit
>> confused by his suggestion to always assume Skylake.
>>
>> In other words, is it safe enough to rely on the host kernel countermeasure
>> to protect guest kernels and their applications? In which case having
>> the guest believe it runs on Broadwell would not be that problematic.
>>
>> Aren’t there enough vmexits on the guest kernel context switch
>> to enforce protection on its behalf? Even if it’s
>>
>> a) some old kernel that without mitigation code
>>
>> or
>>
>> b) some new kernel that thinks it runs on an old CPU and disabled mitigation
>>
> I think it is not safe to just protect the host. CPU bound workload in the guest
> will switch a lot between guest user and guest kernel without triggering an
> exit.

But that’s only if the guest does not take any page faults. Is it possible to run any
of the known approaches to spectre and meltdown without ever faulting?
If the workload is not faulting, then it’s reading only stuff it’s allowed to, isn’t it?


Christophe



2018-01-30 15:35:04

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure



On 01/30/2018 03:56 PM, Christophe de Dinechin wrote:
>
>
>> On 30 Jan 2018, at 15:52, Christian Borntraeger <[email protected]> wrote:
>>
>>
>>
>> On 01/30/2018 03:46 PM, Christophe de Dinechin wrote:
>>>
>>>
>>>> On 30 Jan 2018, at 13:11, Christian Borntraeger <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 01/30/2018 01:23 AM, Linus Torvalds wrote:
>>>> [...]
>>>>>
>>>>> So I actually have a _different_ question to the virtualization
>>>>> people. This includes the vmware people, but it also obviously
>>>>> incldues the Amazon AWS kind of usage.
>>>>>
>>>>> When you're a hypervisor (whether vmware or Amazon), why do you even
>>>>> end up caring about these things so much? You're protected from
>>>>> meltdown thanks to the virtual environment already having separate
>>>>> page tables. And the "big hammer" approach to spectre would seem to
>>>>> be to just make sure the BTB and RSB are flushed at vmexit time - and
>>>>> even then you might decide that you really want to just move it to
>>>>> vmenter time, and only do it if the VM has changed since last time
>>>>> (per CPU).
>>>>>
>>>>> Why do you even _care_ about the guest, and how it acts wrt Skylake?
>>>>> What you should care about is not so much the guests (which do their
>>>>> own thing) but protect guests from each other, no?
>>>>>
>>>>> So I'm a bit mystified by some of this discussion within the context
>>>>> of virtual machines. I think that is separate from any measures that
>>>>> the guest machine may then decide to partake in.
>>>>>
>>>>> If you are ever going to migrate to Skylake, I think you should just
>>>>> always tell the guests that you're running on Skylake. That way the
>>>>> guests will always assume the worst case situation wrt Specte.
>>>>>
>>>>> Maybe that mystification comes from me missing something.
>>>>
>>>> I can only speak for KVM, but I think the hypervisor issues come from
>>>> the fact that for migration purposes the hypervisor "lies" to the guest
>>>> in regard to what kind of CPU is running. (it has to lie, see below).
>>>>
>>>> This is to avoid random guest crashes by not announcing features. For
>>>> example if you want to migrate forth and back between a system that
>>>> has AVX512 and another one that has not you must tell the guest that
>>>> AVX512 is not available - even if it runs on the capable system.
>>>>
>>>> To protect against new features the hypervisor only announces features
>>>> that it understands.
>>>> So you essentially start a VM in QEMU of a given CPU type that is
>>>> constructed of a base cpu type plus extra features. Before migration,
>>>> it is checked if he target system can run a guest of given type -
>>>> otherwise migration is rejected.
>>>>
>>>> The management stack also knows things like baselining - basically
>>>> creating the best possible guest CPU given a set of hosts.
>>>>
>>>> The problem now is: If you have lets say Broadwell and Skylakes.
>>>> What kind of CPU type are you telling your guest? If you claim
>>>> broadwell but run on skylake then you prevent that the guest can
>>>> protect itself, because the guest does not know that it should do
>>>> something special. If you say skylake the guest might start using
>>>> features that broadwell does not understand.
>>>
>>> I believe that Linus’ question was whether it makes sense to defer
>>> the entirety of the protection to the host kernel, although I was a bit
>>> confused by his suggestion to always assume Skylake.
>>>
>>> In other words, is it safe enough to rely on the host kernel countermeasure
>>> to protect guest kernels and their applications? In which case having
>>> the guest believe it runs on Broadwell would not be that problematic.
>>>
>>> Aren’t there enough vmexits on the guest kernel context switch
>>> to enforce protection on its behalf? Even if it’s
>>>
>>> a) some old kernel that without mitigation code
>>>
>>> or
>>>
>>> b) some new kernel that thinks it runs on an old CPU and disabled mitigation
>>>
>> I think it is not safe to just protect the host. CPU bound workload in the guest
>> will switch a lot between guest user and guest kernel without triggering an
>> exit.
>
> But that’s only if the guest does not take any page faults. Is it possible to run any
> of the known approaches to spectre and meltdown without ever faulting?

Sure, after you have faulted in everything you can still flush the cache without refaulting.
And if you need a fault, it will be a GUEST fault - no hypervisor involvement.
Everything else would be too slow and is pre-NPT.


> If the workload is not faulting, then it’s reading only stuff it’s allowed to, isn’t it?


The point is: the hypervisor will not try to protect guest userspace against the guest kernel
or against other guest userspaces. That is clearly the task of the guest operating system (you are
also not asking the hypervisor to build a guest KPTI if the guest is too old).
The hypervisor's task is to isolate guests from other guests and from the host.
At the same time the hypervisor will try to _enable_ the guest to also protect itself.


2018-01-30 16:16:17

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure



On 01/30/2018 03:46 PM, Christophe de Dinechin wrote:
>
>
>> On 30 Jan 2018, at 13:11, Christian Borntraeger <[email protected]> wrote:
>>
>>
>>
>> On 01/30/2018 01:23 AM, Linus Torvalds wrote:
>> [...]
>>>
>>> So I actually have a _different_ question to the virtualization
>>> people. This includes the vmware people, but it also obviously
>>> incldues the Amazon AWS kind of usage.
>>>
>>> When you're a hypervisor (whether vmware or Amazon), why do you even
>>> end up caring about these things so much? You're protected from
>>> meltdown thanks to the virtual environment already having separate
>>> page tables. And the "big hammer" approach to spectre would seem to
>>> be to just make sure the BTB and RSB are flushed at vmexit time - and
>>> even then you might decide that you really want to just move it to
>>> vmenter time, and only do it if the VM has changed since last time
>>> (per CPU).
>>>
>>> Why do you even _care_ about the guest, and how it acts wrt Skylake?
>>> What you should care about is not so much the guests (which do their
>>> own thing) but protect guests from each other, no?
>>>
>>> So I'm a bit mystified by some of this discussion within the context
>>> of virtual machines. I think that is separate from any measures that
>>> the guest machine may then decide to partake in.
>>>
>>> If you are ever going to migrate to Skylake, I think you should just
>>> always tell the guests that you're running on Skylake. That way the
>>> guests will always assume the worst case situation wrt Specte.
>>>
>>> Maybe that mystification comes from me missing something.
>>
>> I can only speak for KVM, but I think the hypervisor issues come from
>> the fact that for migration purposes the hypervisor "lies" to the guest
>> in regard to what kind of CPU is running. (it has to lie, see below).
>>
>> This is to avoid random guest crashes by not announcing features. For
>> example if you want to migrate forth and back between a system that
>> has AVX512 and another one that has not you must tell the guest that
>> AVX512 is not available - even if it runs on the capable system.
>>
>> To protect against new features the hypervisor only announces features
>> that it understands.
>> So you essentially start a VM in QEMU of a given CPU type that is
>> constructed of a base cpu type plus extra features. Before migration,
>> it is checked if he target system can run a guest of given type -
>> otherwise migration is rejected.
>>
>> The management stack also knows things like baselining - basically
>> creating the best possible guest CPU given a set of hosts.
>>
>> The problem now is: If you have lets say Broadwell and Skylakes.
>> What kind of CPU type are you telling your guest? If you claim
>> broadwell but run on skylake then you prevent that the guest can
>> protect itself, because the guest does not know that it should do
>> something special. If you say skylake the guest might start using
>> features that broadwell does not understand.
>
> I believe that Linus’ question was whether it makes sense to defer
> the entirety of the protection to the host kernel, although I was a bit
> confused by his suggestion to always assume Skylake.
>
> In other words, is it safe enough to rely on the host kernel countermeasure
> to protect guest kernels and their applications? In which case having
> the guest believe it runs on Broadwell would not be that problematic.
>
> Aren’t there enough vmexits on the guest kernel context switch
> to enforce protection on its behalf? Even if it’s
>
> a) some old kernel that without mitigation code
>
> or
>
> b) some new kernel that thinks it runs on an old CPU and disabled mitigation
>
I think it is not safe to just protect the host. CPU bound workload in the guest
will switch a lot between guest user and guest kernel without triggering an
exit.


2018-01-30 21:12:32

by Alan Cox

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

> If you are ever going to migrate to Skylake, I think you should just
> always tell the guests that you're running on Skylake. That way the
> guests will always assume the worst case situation wrt Specte.

Unfortunately if you do that then the guest may also decide to use other
Skylake hardware features and pop its clogs when it finds out it's actually
running on Westmere or SandyBridge.

So you need to be able to both lie to the OS and user space via cpuid and
also have a second 'but do Skylake protections' indication that only
mitigation-aware software knows about.

Alan



2018-01-31 10:04:29

by Christophe de Dinechin

[permalink] [raw]
Subject: Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure


KarimAllah Ahmed writes:

> From: David Woodhouse <[email protected]>
>
> Not functional yet; just add the handling for it in the Spectre v2
> mitigation selection, and the X86_FEATURE_IBRS flag which will control
> the code to be added in later patches.
>
> Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS
> mode will want that too.
>
> For now we are auto-selecting IBRS on Skylake. We will probably end up
> changing that but for now let's default to the safest option.
>
> XX: Do we want a microcode blacklist?
>
> [karahmed: simplify the switch block and get rid of all the magic]
>
> Signed-off-by: David Woodhouse <[email protected]>
> Signed-off-by: KarimAllah Ahmed <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 1 +
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/nospec-branch.h | 2 -
> arch/x86/kernel/cpu/bugs.c | 108 +++++++++++++++---------
> 4 files changed, 68 insertions(+), 44 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 8122b5f..e597650 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3932,6 +3932,7 @@
> retpoline - replace indirect branches
> retpoline,generic - google's original retpoline
> retpoline,amd - AMD-specific minimal thunk
> + ibrs - Intel: Indirect Branch Restricted Speculation
>
> Not specifying this option is equivalent to
> spectre_v2=auto.
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 8ec9588..ae86ad9 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -211,6 +211,7 @@
> #define X86_FEATURE_AMD_PRED_CMD ( 7*32+17) /* Prediction Command MSR (AMD) */
> #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
> #define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */
> +#define X86_FEATURE_IBRS ( 7*32+21) /* Use IBRS for Spectre v2 safety */
>
> /* Virtualization flags: Linux defined, word 8 */
> #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index c333c95..8759449 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -205,7 +205,6 @@ extern char __indirect_thunk_end[];
> */
> static inline void vmexit_fill_RSB(void)
> {
> -#ifdef CONFIG_RETPOLINE
> unsigned long loops;
>
> asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
> @@ -215,7 +214,6 @@ static inline void vmexit_fill_RSB(void)
> "910:"
> : "=r" (loops), ASM_CALL_CONSTRAINT
> : : "memory" );
> -#endif
> }
>
> static inline void indirect_branch_prediction_barrier(void)
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 96548ff..1d5e12f 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -79,6 +79,7 @@ enum spectre_v2_mitigation_cmd {
> SPECTRE_V2_CMD_RETPOLINE,
> SPECTRE_V2_CMD_RETPOLINE_GENERIC,
> SPECTRE_V2_CMD_RETPOLINE_AMD,
> + SPECTRE_V2_CMD_IBRS,
> };
>
> static const char *spectre_v2_strings[] = {
> @@ -87,6 +88,7 @@ static const char *spectre_v2_strings[] = {
> [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline",
> [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline",
> [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline",
> + [SPECTRE_V2_IBRS] = "Mitigation: Indirect Branch Restricted Speculation",
> };
>
> #undef pr_fmt
> @@ -132,9 +134,17 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
> spec2_print_if_secure("force enabled on command line.");
> return SPECTRE_V2_CMD_FORCE;
> } else if (match_option(arg, ret, "retpoline")) {
> + if (!IS_ENABLED(CONFIG_RETPOLINE)) {
> + pr_err("retpoline selected but not compiled in. Switching to AUTO select\n");
> + return SPECTRE_V2_CMD_AUTO;
> + }
> spec2_print_if_insecure("retpoline selected on command line.");
> return SPECTRE_V2_CMD_RETPOLINE;
> } else if (match_option(arg, ret, "retpoline,amd")) {
> + if (!IS_ENABLED(CONFIG_RETPOLINE)) {
> + pr_err("retpoline,amd selected but not compiled in. Switching to AUTO select\n");
> + return SPECTRE_V2_CMD_AUTO;
> + }
> if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
> pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
> return SPECTRE_V2_CMD_AUTO;
> @@ -142,8 +152,19 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
> spec2_print_if_insecure("AMD retpoline selected on command line.");
> return SPECTRE_V2_CMD_RETPOLINE_AMD;
> } else if (match_option(arg, ret, "retpoline,generic")) {
> + if (!IS_ENABLED(CONFIG_RETPOLINE)) {
> + pr_err("retpoline,generic selected but not compiled in. Switching to AUTO select\n");
> + return SPECTRE_V2_CMD_AUTO;
> + }
> spec2_print_if_insecure("generic retpoline selected on command line.");
> return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
> + } else if (match_option(arg, ret, "ibrs")) {
> + if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> + pr_err("IBRS selected but no CPU support. Switching to AUTO select\n");
> + return SPECTRE_V2_CMD_AUTO;
> + }
> + spec2_print_if_insecure("IBRS seleted on command line.");
> + return SPECTRE_V2_CMD_IBRS;
> } else if (match_option(arg, ret, "auto")) {
> return SPECTRE_V2_CMD_AUTO;
> }
> @@ -156,7 +177,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
> return SPECTRE_V2_CMD_NONE;
> }
>
> -/* Check for Skylake-like CPUs (for RSB handling) */
> +/* Check for Skylake-like CPUs (for RSB and IBRS handling) */
> static bool __init is_skylake_era(void)
> {
> if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> @@ -178,55 +199,58 @@ static void __init spectre_v2_select_mitigation(void)
> enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
> enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
>
> - /*
> - * If the CPU is not affected and the command line mode is NONE or AUTO
> - * then nothing to do.
> - */
> - if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2) &&
> - (cmd == SPECTRE_V2_CMD_NONE || cmd == SPECTRE_V2_CMD_AUTO))
> - return;
> -
> switch (cmd) {
> case SPECTRE_V2_CMD_NONE:
> + if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
> + pr_err("kernel not compiled with retpoline; no mitigation available!");
> return;
> -
> - case SPECTRE_V2_CMD_FORCE:
> - /* FALLTRHU */
> - case SPECTRE_V2_CMD_AUTO:
> - goto retpoline_auto;
> -
> - case SPECTRE_V2_CMD_RETPOLINE_AMD:
> - if (IS_ENABLED(CONFIG_RETPOLINE))
> - goto retpoline_amd;
> - break;
> - case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
> - if (IS_ENABLED(CONFIG_RETPOLINE))
> - goto retpoline_generic;
> + case SPECTRE_V2_CMD_IBRS:
> + mode = SPECTRE_V2_IBRS;
> + setup_force_cpu_cap(X86_FEATURE_IBRS);
> break;
> + case SPECTRE_V2_CMD_AUTO:
> + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
> + return;
> + /* Fall through */
> + case SPECTRE_V2_CMD_FORCE:
> + /*
> + * If we have IBRS support, and either Skylake or !RETPOLINE,
> + * then that's what we do.
> + */
> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
> + (is_skylake_era() || !retp_compiler())) {

As per Eduardo's comments and follow-ups, it's unclear this will play
well under virtualization. It may be worth putting this in a separate
function with a name making it clear that what we care about is the
host CPU, not the guest CPU.

Under virtualization, you may want to force is_skylake_era() to return
true (unless there is a way to get a more precise answer about the
host CPU at that stage?)
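
One possible shape for such a helper, purely as a sketch (the helper name and
the hypervisor fallback below are made up for illustration and are not part of
the posted patch):

        static bool __init is_host_skylake_era(void)
        {
                /*
                 * Under a hypervisor, f/m/s is whatever model the host chose
                 * to advertise, so err on the side of caution and treat it
                 * as Skylake-era.
                 */
                if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
                        return true;

                return is_skylake_era();
        }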


> + mode = SPECTRE_V2_IBRS;
> + setup_force_cpu_cap(X86_FEATURE_IBRS);
> + break;
> + }
> + /* Fall through */

Given the complexity of the decision and the number of fall-through
cases, it's probably a good idea to add some printouts for system management
or debugging.
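
Even a single line at each fall-through would help, e.g. (the message text is
only an illustration):

        pr_info("IBRS not usable, falling back to retpoline selection\n");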


> case SPECTRE_V2_CMD_RETPOLINE:
> - if (IS_ENABLED(CONFIG_RETPOLINE))
> - goto retpoline_auto;
> - break;
> - }
> - pr_err("kernel not compiled with retpoline; no mitigation available!");
> - return;
> + case SPECTRE_V2_CMD_RETPOLINE_AMD:
> + if (IS_ENABLED(CONFIG_RETPOLINE) &&
> + boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> + if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
> + mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
> + SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
> + setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
> + setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> + break;
> + }
>
> -retpoline_auto:
> - if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> - retpoline_amd:
> - if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
> pr_err("LFENCE not serializing. Switching to generic retpoline\n");
> - goto retpoline_generic;
> }
> - mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
> - SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
> - setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
> - setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> - } else {
> - retpoline_generic:
> - mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
> - SPECTRE_V2_RETPOLINE_MINIMAL;
> - setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> + /* Fall through */
> + case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
> + if (IS_ENABLED(CONFIG_RETPOLINE)) {
> + mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
> + SPECTRE_V2_RETPOLINE_MINIMAL;
> + setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> + break;
> + }
> + /* Fall through */
> + default:
> + if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
> + pr_err("kernel not compiled with retpoline; no mitigation available!");
> + return;
> }
>
> spectre_v2_enabled = mode;


--
Cheers,
Christophe de Dinechin (IRC c3d)

2018-01-31 10:06:33

by Christophe de Dinechin

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure



> On 30 Jan 2018, at 21:46, Alan Cox <[email protected]> wrote:
>
>> If you are ever going to migrate to Skylake, I think you should just
>> always tell the guests that you're running on Skylake. That way the
>> guests will always assume the worst case situation wrt Spectre.
>
> Unfortunately if you do that then the guest may also decide to use other
> Skylake hardware features and pop its clogs when it finds out it's actually
> running on Westmere or SandyBridge.
>
> So you need to be able to both lie to the OS and user space via cpuid and
> also have a second 'but do skylake protections' that only mitigation
> aware software knows about.

Yes. The most desirable lie is different depending on whether you want to
allow virtualization features such as migration (where you’d gravitate
towards a CPU with less features) or whether you want to allow mitigation
(where you’d rather present the most fragile CPUID, probably Skylake).

Looking at some recent patches, I’m concerned that the code being added
often assumes that the CPUID is the correct way to get that info.
I do not think this is correct. You really want specific information about
the host CPUID, not whatever KVM CPUID emulation makes up.


2018-01-31 10:29:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, 31 Jan 2018, Christophe de Dinechin wrote:
> > On 30 Jan 2018, at 21:46, Alan Cox <[email protected]> wrote:
> >
> >> If you are ever going to migrate to Skylake, I think you should just
> >> always tell the guests that you're running on Skylake. That way the
> >> guests will always assume the worst case situation wrt Spectre.
> >
> > Unfortunately if you do that then the guest may also decide to use other
> > Skylake hardware features and pop its clogs when it finds out it's actually
> > running on Westmere or SandyBridge.
> >
> > So you need to be able to both lie to the OS and user space via cpuid and
> > also have a second 'but do skylake protections' that only mitigation
> > aware software knows about.
>
> Yes. The most desirable lie is different depending on whether you want to
> allow virtualization features such as migration (where you’d gravitate
> towards a CPU with less features) or whether you want to allow mitigation
> (where you’d rather present the most fragile CPUID, probably Skylake).
>
> Looking at some recent patches, I’m concerned that the code being added
> often assumes that the CPUID is the correct way to get that info.
> I do not think this is correct. You really want specific information about
> the host CPUID, not whatever KVM CPUID emulation makes up.

That won't cut it. If you have a heterogeneous farm of systems, then you need:

- All CPUs have to support IBRS/IBPB, or at least the hypervisor has to
pretend they do by providing fake MSRs for that

- Have a 'force IBRS/IBPB' mechanism so the guests don't discard it due
to missing CPU feature bits.

Though this gets worse. You have to make sure that the guest keeps _ALL_
sorts of mitigation mechanisms enabled and does not decide to disable
retpolines because IBRS/IBPB are "available".

Good luck with making all that work.

Thanks,

tglx

2018-01-31 11:09:54

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

* Thomas Gleixner ([email protected]) wrote:
> On Wed, 31 Jan 2018, Christophe de Dinechin wrote:
> > > On 30 Jan 2018, at 21:46, Alan Cox <[email protected]> wrote:
> > >
> > >> If you are ever going to migrate to Skylake, I think you should just
> > >> always tell the guests that you're running on Skylake. That way the
> > >> guests will always assume the worst case situation wrt Spectre.
> > >
> > > Unfortunately if you do that then the guest may also decide to use other
> > > Skylake hardware features and pop its clogs when it finds out it's actually
> > > running on Westmere or SandyBridge.
> > >
> > > So you need to be able to both lie to the OS and user space via cpuid and
> > > also have a second 'but do skylake protections' that only mitigation
> > > aware software knows about.
> >
> > Yes. The most desirable lie is different depending on whether you want to
> > allow virtualization features such as migration (where you’d gravitate
> > towards a CPU with less features) or whether you want to allow mitigation
> > (where you’d rather present the most fragile CPUID, probably Skylake).
> >
> > Looking at some recent patches, I’m concerned that the code being added
> > often assumes that the CPUID is the correct way to get that info.
> > I do not think this is correct. You really want specific information about
> > the host CPUID, not whatever KVM CPUID emulation makes up.
>
> That won't cut it. If you have a heterogeneous farm of systems, then you need:
>
> - All CPUs have to support IBRS/IBPB, or at least the hypervisor has to
> pretend they do by providing fake MSRs for that
>
> - Have a 'force IBRS/IBPB' mechanism so the guests don't discard it due
> to missing CPU feature bits.

That half is the easy bit, we've already got that (thanks to Eduardo),
QEMU has -IBRS variants of CPU types, so if you start a VM with
-cpu Broadwell-IBRS it'll get advertised to the guest as having IBRS;
and (with appropriate flags) the management layers will only allow that
to be started on hosts that support IBRS and won't allow migration
between hosts with and without it.

> Though this gets worse. You have to make sure that the guest keeps _ALL_
> sorts of mitigation mechanisms enabled and does not decide to disable
> retpolines because IBRS/IBPB are "available".

This is what's different with this set; it's all coming down to sets
of heuristics which include CPU model etc, rather than just a 'we've got
a feature, use it'.

Dave

> Good luck with making all that work.
>
> Thanks,
>
> tglx

--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2018-01-31 11:11:52

by Christophe de Dinechin

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure



> On 31 Jan 2018, at 11:15, Thomas Gleixner <[email protected]> wrote:
>
> On Wed, 31 Jan 2018, Christophe de Dinechin wrote:
>>> On 30 Jan 2018, at 21:46, Alan Cox <[email protected]> wrote:
>>>
>>>> If you are ever going to migrate to Skylake, I think you should just
>>>> always tell the guests that you're running on Skylake. That way the
>>>> guests will always assume the worst case situation wrt Spectre.
>>>
>>> Unfortunately if you do that then the guest may also decide to use other
>>> Skylake hardware features and pop its clogs when it finds out it's actually
>>> running on Westmere or SandyBridge.
>>>
>>> So you need to be able to both lie to the OS and user space via cpuid and
>>> also have a second 'but do skylake protections' that only mitigation
>>> aware software knows about.
>>
>> Yes. The most desirable lie is different depending on whether you want to
>> allow virtualization features such as migration (where you’d gravitate
>> towards a CPU with less features) or whether you want to allow mitigation
>> (where you’d rather present the most fragile CPUID, probably Skylake).
>>
>> Looking at some recent patches, I’m concerned that the code being added
>> often assumes that the CPUID is the correct way to get that info.
>> I do not think this is correct. You really want specific information about
>> the host CPUID, not whatever KVM CPUID emulation makes up.
>
> That won't cut it. If you have a heterogeneous farm of systems, then you need:
>
> - All CPUs have to support IBRS/IBPB, or at least the hypervisor has to
> pretend they do by providing fake MSRs for that
>
> - Have a 'force IBRS/IBPB' mechanism so the guests don't discard it due
> to missing CPU feature bits.
>
> Though this gets worse. You have to make sure that the guest keeps _ALL_
> sorts of mitigation mechanisms enabled and does not decide to disable
> retpolines because IBRS/IBPB are "available”.

What you are saying is that it’s one thing to test at boot time, but
(at least) migration events should also cause a re-check. Agreed.
The alternative is to pessimistically enable mitigation in VMs.
I believe this is the current “state of the art”, i.e. enable
IBRS statically via a CPU type variant.

What is the best place to re-check anyway?

(Just out of curiosity: there are no non-symmetric systems
that mix CPUs of different generations, right?)


>
> Good luck with making all that work.

:-)

>
> Thanks,
>
> tglx


2018-01-31 12:08:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, Jan 31, 2018 at 11:04:07AM +0000, Dr. David Alan Gilbert wrote:
> That half is the easy bit, we've already got that (thanks to Eduardo),
> QEMU has -IBRS variants of CPU types, so if you start a VM with
> -cpu Broadwell-IBRS

Eww, a CPU model with a specific feature bit. I hope you guys don't add
a model like that for every CPU feature.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-01-31 12:32:06

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

* Borislav Petkov ([email protected]) wrote:
> On Wed, Jan 31, 2018 at 11:04:07AM +0000, Dr. David Alan Gilbert wrote:
> > That half is the easy bit, we've already got that (thanks to Eduardo),
> > QEMU has -IBRS variants of CPU types, so if you start a VM with
> > -cpu Broadwell-IBRS
>
> Eww, a CPU model with a specific feature bit. I hope you guys don't add
> a model like that for every CPU feature.

Indeed, it's only for this weird case where you suddenly need to change
it.

Dave

> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> --
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2018-01-31 13:19:31

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, Jan 31, 2018 at 12:30:36PM +0000, Dr. David Alan Gilbert wrote:
> Indeed, it's only for this weird case where you suddenly need to change
> it.

No, there's more:

.name = "Broadwell-noTSX",
.name = "Haswell-noTSX",

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-01-31 14:07:00

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

* Borislav Petkov ([email protected]) wrote:
> On Wed, Jan 31, 2018 at 12:30:36PM +0000, Dr. David Alan Gilbert wrote:
> > Indeed, it's only for this weird case where you suddenly need to change
> > it.
>
> No, there's more:
>
> .name = "Broadwell-noTSX",
> .name = "Haswell-noTSX",

Haswell came out and we made the CPU definition, and then got a
microcode update that removed the feature.

So the common feature of noTSX and IBRS is that they're the only two
cases where a CPU has been released and then the flags have changed later.

Dave

> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> --
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2018-01-31 15:03:26

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, Jan 31, 2018 at 11:15:50AM +0100, Thomas Gleixner wrote:
> On Wed, 31 Jan 2018, Christophe de Dinechin wrote:
> > > On 30 Jan 2018, at 21:46, Alan Cox <[email protected]> wrote:
> > >
> > >> If you are ever going to migrate to Skylake, I think you should just
> > >> always tell the guests that you're running on Skylake. That way the
> > >> guests will always assume the worst case situation wrt Spectre.
> > >
> > > Unfortunately if you do that then the guest may also decide to use other
> > > Skylake hardware features and pop its clogs when it finds out it's actually
> > > running on Westmere or SandyBridge.
> > >
> > > So you need to be able to both lie to the OS and user space via cpuid and
> > > also have a second 'but do skylake protections' that only mitigation
> > > aware software knows about.
> >
> > Yes. The most desirable lie is different depending on whether you want to
> > allow virtualization features such as migration (where you’d gravitate
> > towards a CPU with less features) or whether you want to allow mitigation
> > (where you’d rather present the most fragile CPUID, probably Skylake).
> >
> > Looking at some recent patches, I’m concerned that the code being added
> > often assumes that the CPUID is the correct way to get that info.
> > I do not think this is correct. You really want specific information about
> > the host CPUID, not whatever KVM CPUID emulation makes up.
>
> That won't cut it. If you have a heterogeneous farm of systems, then you need:
>
> - All CPUs have to support IBRS/IBPB, or at least the hypervisor has to
> pretend they do by providing fake MSRs for that
>
> - Have a 'force IBRS/IBPB' mechanism so the guests don't discard it due
> to missing CPU feature bits.

If all your hosts have IBRS/IBPB, you enable it. If some of your
hosts don't have IBRS/IBPB, you don't expose it to the guest (and
deal with the consequences of not applying updates to your
hardware). Where's the problem?

>
> Though this gets worse. You have to make sure that the guest keeps _ALL_
> sorts of mitigation mechanisms enabled and does not decide to disable
> retpolines because IBRS/IBPB are "available".

If IBRS/IBPB are reported as available to the guest, the VM
management system will ensure the VM won't be migrated to a host
that doesn't have it. That's a pretty basic feature of VM
management stacks.

Exactly the same could happen to a "(non-)skylake bit". The host
reports a feature (or a bug fix) as available to a guest, and
then the system ensures you won't migrate to a host that doesn't
provide that feature.

The problem I see here is that Linux guests currently have no way
to tell whether they need to enable Skylake-specific mitigations or
not. Unless you make Linux always enable Skylake mitigations when it
sees the hypervisor bit, you will need the hypervisor to
provide more useful information than f/m/s.

--
Eduardo

2018-01-31 15:04:59

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On 29/01/2018 22:13, Andi Kleen wrote:
>> What happens when someone introduces a
>> workaround tied to some other model numbers?
> There are already many of those in the tree for other issues and features.
> So far you managed to survive without. Likely that will be true
> in the future too.

"Guests have to live with processor fuckups" is actually a much better
answer than "Hypervisors may need to revisit their practice", since at
least it's clear where the blame lies.

Because really it's just plain luck. It just happens that most errata
are for functionality that is not available to a virtual machine (e.g.
perfmon and monitor workarounds or buggy TSC deadline timer that
hypervisors emulate anyway), that only needs a chicken bit to be set in
the host, or the bugs are there only for old hardware that doesn't have
virtualization (X86_BUG_F00F, X86_BUG_SWAPGS_FENCE).

CPUID flags are guaranteed to never change---never come, never go away.
For anything that doesn't map nicely to a CPUID flag, you cannot really
express it. Also if something is not architectural, you can pretty much
assume that you cannot know it under virtualization. f/m/s is not
architectural; family, model and stepping mean absolutely nothing when
running in virtualization, because the host CPU model can change under
your feet at any time. We force guest vendor == host vendor just
because otherwise too much stuff breaks, but that's it.

Paolo

2018-01-31 15:08:59

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

* Paolo Bonzini ([email protected]) wrote:
> On 29/01/2018 22:13, Andi Kleen wrote:
> >> What happens when someone introduces a
> >> workaround tied to some other model numbers?
> > There are already many of those in the tree for other issues and features.
> > So far you managed to survive without. Likely that will be true
> > in the future too.
>
> "Guests have to live with processor fuckups" is actually a much better
> answer than "Hypervisors may need to revisit their practice", since at
> least it's clear where the blame lies.
>
> Because really it's just plain luck. It just happens that most errata
> are for functionality that is not available to a virtual machine (e.g.
> perfmon and monitor workarounds or buggy TSC deadline timer that
> hypervisors emulate anyway), that only needs a chicken bit to be set in
> the host, or the bugs are there only for old hardware that doesn't have
> virtualization (X86_BUG_F00F, X86_BUGS_SWAPGS_FENCE).
>
> CPUID flags are guaranteed to never change---never come, never go away.
> For anything that doesn't map nicely to a CPUID flag, you cannot really
> express it. Also if something is not architectural, you can pretty much
> assume that you cannot know it under virtualization. f/m/s is not
> architectural; family, model and stepping mean absolutely nothing when
> running in virtualization, because the host CPU model can change under
> your feet at any time. We force guest vendor == host vendor just
> because otherwise too much stuff breaks, but that's it.

In some ways we've been luckiest on x86; my understanding is ARM have a
similar set of architecture-specific errata and aren't really sure
how to expose this to guests either.

Dave

> Paolo
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK

2018-01-31 15:12:41

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On 1/31/2018 2:15 AM, Thomas Gleixner wrote:

> Good luck with making all that work.

on the Intel side we're checking what we can do that works and doesn't break
things right now; hopefully we just end up with a bit in the arch capabilities
MSR for "you should do RSB stuffing" and then the HV's can emulate that.

(People sometimes think that should be a 5-minute thing, but we need to check
many CPU models etc. to make sure a bit we pick is really free, which makes
it take longer than some folks have patience for.)
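
If such a bit does materialize, guest-side consumption could look roughly like
this sketch (ARCH_CAP_RSB_STUFF is an invented placeholder for the hypothetical
bit; the MSR, the feature bits and the helpers already exist):

        u64 ia32_cap = 0;

        if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
                rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);

        /* "You should do RSB stuffing" as advertised by the HV/firmware. */
        if (ia32_cap & ARCH_CAP_RSB_STUFF)
                setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);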



2018-01-31 15:58:48

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, Jan 31, 2018 at 02:04:49PM +0000, Dr. David Alan Gilbert wrote:
> * Borislav Petkov ([email protected]) wrote:
> > On Wed, Jan 31, 2018 at 12:30:36PM +0000, Dr. David Alan Gilbert wrote:
> > > Indeed, it's only for this weird case where you suddenly need to change
> > > it.
> >
> > No, there's more:
> >
> > .name = "Broadwell-noTSX",
> > .name = "Haswell-noTSX",
>
> Haswell came out and we made the CPU definition, and then got a
> microcode update that removed the feature.
>
> So the common feature of noTSX and IBRS is that they're the only two
> cases where a CPU has been released and then the flags have changed later.

Also, if anybody doesn't like it, users can already specify, e.g.,
"Broadwell,-hle,-rtm" or "Skylake,+spec_ctrl".

QEMU only adds the -noTSX and -IBRS CPU models for convenience of
management systems that don't know how to check/configure
individual CPU features. We're working with libvirt and
OpenStack folks to make this kind of trick unnecessary.

--
Eduardo

2018-01-31 17:15:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure

On Wed, Jan 31, 2018 at 12:44:41PM -0200, Eduardo Habkost wrote:
> Also, if anybody doesn't like it, users can already specify, e.g.,
> "Broadwell,-hle,-rtm" or "Skylake,+spec_ctrl".
>
> QEMU only adds the -noTSX and -IBRS CPU models for convenience of
> management systems that don't know how to check/configure
> individual CPU features. We're working with libvirt and
> OpenStack folks to make this kind of trick unnecessary.

Yeah, defining separate CPU models just for that seems hacky. The
+/-<feature> specification looks like the Right Thing(tm) to do.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2018-02-04 18:45:53

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 23 Jan 2018, Ingo Molnar wrote:
> * David Woodhouse <[email protected]> wrote:
>
> > > On SkyLake this would add an overhead of maybe 2-3 cycles per function call and
> > > obviously all this code and data would be very cache hot. Given that the average
> > > number of function calls per system call is around a dozen, this would be _much_
> > > faster than any microcode/MSR based approach.
> >
> > That's kind of neat, except you don't want it at the top of the
> > function; you want it at the bottom.
> >
> > If you could hijack the *return* site, then you could check for
> > underflow and stuff the RSB right there. But in __fentry__ there's not
> > a lot you can do other than complain that something bad is going to
> > happen in the future. You know that a string of 16+ rets is going to
> > happen, but you've got no gadget in *there* to deal with it when it
> > does.
>
> No, it can be done with the existing CALL instrumentation callback that
> CONFIG_DYNAMIC_FTRACE=y provides, by pushing a RET trampoline on the stack from
> the CALL trampoline - see my previous email.
>
> > HJ did have patches to turn 'ret' into a form of retpoline, which I
> > don't think ever even got performance-tested.
>
> Return instrumentation is possible as well, but there are two major drawbacks:
>
> - GCC support for it is not as widely available and return instrumentation is
> less tested in Linux kernel contexts
>
> - a major point of my suggestion is that CONFIG_DYNAMIC_FTRACE=y is already
> enabled in distros here and today, so the runtime overhead to non-SkyLake CPUs
> would be literally zero, while still allowing to fix the RSB vulnerability on
> SkyLake.

I played around with that a bit during the week and it turns out to be less
simple than you thought.

1) Injecting a trampoline return only works for functions which have all
arguments in registers. For functions with arguments on the stack, like all
vararg functions, this breaks because the function won't find its arguments
anymore.

I have not yet found a way to figure out reliably which functions have
arguments on the stack. Simply ignoring them might be an option.

The workaround is to replace the original return on the stack with the
trampoline and store the original return in a per-thread stack, which I
implemented. But performance-wise this sucks badly.

2) Doing the whole dance on function entry has a real downside because you
refill the RSB on every 15th return no matter whether it's required or
not. That really gives a very prominent performance hit.

An alternative idea is to do the following (not yet implemented):

__fentry__:
incl PER_CPU_VAR(call_depth)
retq

and use -mfunction-return=thunk-extern, which is available on retpoline-enabled
compilers. That's a reasonable requirement because w/o retpoline
the whole SKL magic is pointless anyway.

-mfunction-return=thunk-extern issues

jump __x86_return_thunk

instead of ret. In the thunk we can do the whole shebang of mitigation.
That jump can be identified at build time and it can be patched into a ret
for unaffected CPUs. Ideally we do the patching at build time and only
patch the jump in when SKL is detected or paranoia requests it.

We could actually look into that for tracing as well. The only reason why
we don't do that is to select the ideal nop for the CPU the kernel runs on,
which obviously cannot be known at build time.

__x86_return_thunk would look like this:

__x86_return_thunk:
testl $0xf, PER_CPU_VAR(call_depth)
jnz 1f
stuff_rsb
1:
decl PER_CPU_VAR(call_depth)
ret

The call_depth variable would be reset on context switch.
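
The bookkeeping for that is small; roughly (just a sketch, reusing the
call_depth name from above):

        DEFINE_PER_CPU(unsigned int, call_depth);

        /* ... and on context switch: */
        this_cpu_write(call_depth, 0);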

Though that has another problem: tail calls. Tail calls will invoke the
__fentry__ call of the tail-called function, which makes the call_depth
counter unbalanced. Tail calls can be prevented by using
-fno-optimize-sibling-calls, but that probably sucks as well.

Yet another possibility is to avoid the function entry and accounting magic
and use the generic gcc return thunk:

__x86_return_thunk:
call L2
L1:
pause
lfence
jmp L1
L2:
lea 8(%rsp), %rsp|lea 4(%esp), %esp
ret

which basically refills the RSB on every return. That can be inline or
extern, but in both cases we should be able to patch it out.

I have no idea how that affects performance, but it might be worthwhile to
experiment with that.

If nobody beats me to it, I'll play around with that some more after
vacation.

Thanks,

tglx

2018-02-04 20:24:58

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Sun, 2018-02-04 at 19:43 +0100, Thomas Gleixner wrote:
>
> __x86_return_thunk would look like this:
>
> __x86_return_thunk:
>         testl   $0xf, PER_CPU_VAR(call_depth)
>         jnz     1f      
>         stuff_rsb
>    1:
>         decl    PER_CPU_VAR(call_depth)
>         ret
>
> The call_depth variable would be reset on context switch.

Note that the 'jnz' can be predicted taken there, allowing the CPU to
speculate all the way to the 'ret'... and beyond.


Attachments:
smime.p7s (5.09 kB)

2018-02-06 09:16:07

by David Woodhouse

[permalink] [raw]
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation



On Sun, 2018-02-04 at 19:43 +0100, Thomas Gleixner wrote:
> Yet another possibility is to avoid the function entry and accounting magic
> and use the generic gcc return thunk:
>
> __x86_return_thunk:
>         call L2
> L1:
>         pause
>         lfence
>         jmp L1
> L2:
>         lea 8(%rsp), %rsp|lea 4(%esp), %esp
>         ret
>
> which basically refills the RSB on every return. That can be inline or
> extern, but in both cases we should be able to patch it out.
>
> I have no idea how that affects performance, but it might be worthwhile to
> experiment with that.

That was what I had in mind when I asked HJ to add -mfunction-return.

I suspect the performance hit would be significant because it would
cause a prediction miss on *every* return.

But as I said, let's implement what we can without IBRS for Skylake,
then we can compare the two options for performance, security coverage
and general fugliness.


Attachments:
smime.p7s (5.09 kB)