2018-11-18 20:44:20

by Linus Torvalds

[permalink] [raw]
Subject: STIBP by default.. Revert?

This was marked for stable, and honestly, nowhere in the discussion
did I see any mention of just *how* bad the performance impact of this
was.

When performance goes down by 50% on some loads, people need to start
asking themselves whether it was worth it. It's apparently better to
just disable SMT entirely, which is what security-conscious people do
anyway.

So why do that STIBP slow-down by default when the people who *really*
care already disabled SMT?

I think we should use the same logic as for L1TF: we default to
something that doesn't kill performance. Warn once about it, and let
the crazy people say "I'd rather take a 50% performance hit than
worry about a theoretical issue".

Linus


2018-11-18 21:58:56

by Jiri Kosina

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, 18 Nov 2018, Linus Torvalds wrote:

> This was marked for stable, and honestly, nowhere in the discussion
> did I see any mention of just *how* bad the performance impact of this
> was.

Frankly, I ran some benchmarks myself, and am seeing very, very
varying/noisy results, which were rather inconclusive across the
individual workloads.

The article someone pointed me to at Phoronix yesterday also was talking
about something between 3% and 20% IIRC.

> When performance goes down by 50% on some loads, people need to start
> asking themselves whether it was worth it. It's apparently better to
> just disable SMT entirely, which is what security-conscious people do
> anyway.
>
> So why do that STIBP slow-down by default when the people who *really*
> care already disabled SMT?

BTW for them, there is no impact at all. And they probably did it because
of crypto libraries being implemented in a way that their observable
operation depends on value of the secrets (tlbleed etc), but this is being
fixed in the said crypto code.

STIBP is only activated on systems with HT on; plus odds are that people
who don't care about spectrev2 already have 'nospectre_v2' on their
command-line, so they are fine as well.

> I think we should use the same logic as for L1TF: we default to
> something that doesn't kill performance. Warn once about it, and let
> the crazy people say "I'd rather take a 50% performance hit than
> worry about a theoretical issue".

So, I think it's as theoretical as any other spectrev2 (only with the
extra "HT" condition added on top).

Namely, I think that scenarios such as "one browser tab's javascript is
attacking other browser's tab private data on a sibling" is rather a
practical one, I've seen such demos (not with the sibling condition being
in place, but I don't think that matters that much). The same likely holds
for server threads serving individual requests in separate threads, etc
(but yeah, you need to have proper gadgets there in place already in
server's .text, which makes it much less practical).

For L1TF, the basic argument AFAICS was, that the default situation is
"only special people care about strict isolation between VMs". But this is
isolation between individual processess.

Tim is currently working [1] on a patchset on top of my original
STIBP-enabling commit, that will make STIBP to be used in much smaller
number of cases (taking prctl()-based aproach as one of the possible ones,
similarly as we did for SSBD), and as I indicated in that thread, I think
it should be really considered for this -rc cycle still if it lands in tip
in a reasonable future.

To conclude -- I am quite happy that this finally started to move
(although it's sad that some of it is due to clickbaity article titles,
but whatever), as Intel didn't really provide any patch / guidance (*) in
past ~1 year to those who care about spectrev2 isolation on HT, which is
something I wasn't really feeling comfortable with, and that's why I
submitted the patch.

If we make it opt-in (on kernel cmdline) and issue big fat warning if not
mitigated, fine, but then we're bit incosistent how we deal with
cross-core (IBPB) and cross-thread (STIBP) spectrev2 security in the
kernel code.

If we take Tim's aproach, even better -- there would be means available to
make system secure, although not sacrifying performance by default.

I would not prefer going the plain revert path though, as that leaves no
other option to mitigate rather than turning SMT off, which might possibly
have even *worse* performance numbers depending on workload.

[1] https://lore.kernel.org/lkml/?q=cover.1542418936.git.tim.c.chen%40linux.intel.com

(*) no code to implement STIBP sanely, no recommendation about turning SMT
off at least for some workloads

Thanks,

--
Jiri Kosina
SUSE Labs


2018-11-18 22:02:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, Nov 18, 2018 at 1:49 PM Jiri Kosina <[email protected]> wrote:
>
> > So why do that STIBP slow-down by default when the people who *really*
> > care already disabled SMT?
>
> BTW for them, there is no impact at all.

Right. People who really care about security and are anal about it do
not see *any* advantage of the patch.

But people who aren't that worried suddenly see potentially huge slowdowns.

In other words, the behavior of the patch is basically essentially
exactly the reverse of what you'd want. You penalize the people who
don't even want it and don't care.

> STIBP is only activated on systems with HT on; plus odds are that people
> who don't care about spectrev2 already have 'nospectre_v2' on their
> command-line, so they are fine as well.

I'm talking about *normal* people. People who simply aren't all that
invested in this all. People who just want to get their work done.

> So, I think it's as theoretical as any other spectrev2 (only with the
> extra "HT" condition added on top).

What? No.

It's *way* more theoretical than something like meltdown, which could
be trivially used to get data from another protection domain.

Have you seen any actual realistic attacks for normal human users?
Things where the *kernel* should actually care?

The javascript thing is for the browser to fix up, not for the kernel
to say "now everything should run up to 50% slower".

Linus

2018-11-18 22:04:12

by Willy Tarreau

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, Nov 18, 2018 at 10:49:44PM +0100, Jiri Kosina wrote:
> odds are that people
> who don't care about spectrev2 already have 'nospectre_v2' on their
> command-line, so they are fine as well.

FWIW in our appliances, we never modify the boot loader's cmdline
in field, we only provide new kernel+rootfs images. We've however
disabled the config options for all this class of vulnerabilities.
As long as it remains possible to disable the new ones using config
options only, that's not an issue for me.

Cheers,
Willy

2018-11-18 22:19:38

by Jiri Kosina

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, 18 Nov 2018, Linus Torvalds wrote:

> > So, I think it's as theoretical as any other spectrev2 (only with the
> > extra "HT" condition added on top).
>
> What? No.
>
> It's *way* more theoretical than something like meltdown, which could
> be trivially used to get data from another protection domain.

Oh yeah, I absolutely agree that spectrev2 and Meltdown and completely
different beasts.

> Have you seen any actual realistic attacks for normal human users?
> Things where the *kernel* should actually care?
>
> The javascript thing is for the browser to fix up,

It's probably not just browsers, but anything running JITed sandboxed
code. So the most straightforward way might be the prctl() aproach, where
userspace would claim "I do care about this, please fix it up for me". So
prctl() + perhaps SECCOMP.

Which gets us back to Tim's fixup patch. Do you still prefer the revert,
given the existence of that? I think that if Tim's fixup makes it through
(it's currently missing SECCOMP handling, but that is trivial to add on
top), it might be the best compromise. We'd also have have to make IBPB
obey it to be consistent (and get even a few more % of performance back),
but that's easy as well.

Thanks,

--
Jiri Kosina
SUSE Labs


2018-11-18 22:44:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, Nov 18, 2018 at 2:17 PM Jiri Kosina <[email protected]> wrote:
> Which gets us back to Tim's fixup patch. Do you still prefer the revert,
> given the existence of that?

I don't think the code needs to be reverted, but the *behavior* of
just unconditionally enabling STIBP needs to be reverted.

Because it was clearly way more expensive than people were told.

Linus

2018-11-18 22:45:16

by Dave Hansen

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?


> On Nov 18, 2018, at 2:17 PM, Jiri Kosina <[email protected]> wrote:
>
> It's probably not just browsers, but anything running JITed sandboxed
> code. So the most straightforward way might be the prctl() aproach, where
> userspace would claim "I do care about this, please fix it up for me". So
> prctl() + perhaps SECCOMP.

Yeah, the prctl() shifts the pain to the right place: folks explicitly opting in. Always-on seemed way too draconian to me.

2018-11-18 22:58:52

by Tim Chen

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On 11/18/2018 02:36 PM, Linus Torvalds wrote:
> On Sun, Nov 18, 2018 at 2:17 PM Jiri Kosina <[email protected]> wrote:
>> Which gets us back to Tim's fixup patch. Do you still prefer the revert,
>> given the existence of that?
>
> I don't think the code needs to be reverted, but the *behavior* of
> just unconditionally enabling STIBP needs to be reverted.

Yes, in the patchset I proposed on top of Jiri's, the default will
be enabling STIBP only for tasks that ask for STIBP via prctl or
those that are non-dumpable, like sshd.

>
> Because it was clearly way more expensive than people were told.

I did realize the performance implications of making STIBP on for
all tasks. Here's to recap performance impact of STIBP when I posted
my patchset:

"...leaving STIBP on all the time is expensive for certain
applications that have frequent indirect branches. One such application
is perlbench in the SpecInt Rate 2006 test suite which shows a
21% reduction in throughput."


I think if we include Jiri's patchset for 4.20, we should have
my patchset also included to avoid the performance impact on
regular tasks but still allow admin and applications to turn on STIBP if needed.

Thanks.

Tim

2018-11-18 23:24:04

by Tim Chen

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On 11/18/2018 02:17 PM, Jiri Kosina wrote:
> On Sun, 18 Nov 2018, Linus Torvalds wrote:
>
>>> So, I think it's as theoretical as any other spectrev2 (only with the
>>> extra "HT" condition added on top).
>>
>> What? No.
>>
>> It's *way* more theoretical than something like meltdown, which could
>> be trivially used to get data from another protection domain.
>
> Oh yeah, I absolutely agree that spectrev2 and Meltdown and completely
> different beasts.
>
>> Have you seen any actual realistic attacks for normal human users?
>> Things where the *kernel* should actually care?
>>
>> The javascript thing is for the browser to fix up,
>
> It's probably not just browsers, but anything running JITed sandboxed
> code. So the most straightforward way might be the prctl() aproach, where
> userspace would claim "I do care about this, please fix it up for me". So
> prctl() + perhaps SECCOMP.
>
> Which gets us back to Tim's fixup patch. Do you still prefer the revert,
> given the existence of that? I think that if Tim's fixup makes it through
> (it's currently missing SECCOMP handling, but that is trivial to add on
> top), it might be the best compromise. We'd also have have to make IBPB
> obey it to be consistent (and get even a few more % of performance back),
> but that's easy as well.
>
> Thanks,
>

I think if Thomas can merge my patchset along with Jiri's, the default option will become
opt in for tasks that want the extra security and we won't lose performance.

Tasks that want extra security will enable that via prctl interface or
making themselves non-dumpable.

Admin that want security for all tasks will enable the strict option on boot to
enable STIBP for all tasks.

So my patchset and Jiri's patchset should probably be merged together, so the
users have a choice of the behavior.

Tim

2018-11-18 23:24:32

by Luck, Tony

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, Nov 18, 2018 at 2:19 PM Jiri Kosina <[email protected]> wrote:
> Which gets us back to Tim's fixup patch. Do you still prefer the revert,
> given the existence of that? I think that if Tim's fixup makes it through
> (it's currently missing SECCOMP handling, but that is trivial to add on
> top), it might be the best compromise. We'd also have have to make IBPB
> obey it to be consistent (and get even a few more % of performance back),
> but that's easy as well.

+1 for Tim's patch. That make us more consistent with how we handled
L1TF (giving the system owner a control knob to decide whether they
want this level of fixup, based on their own analysis of their vulnerability).

-Tony

2018-11-18 23:57:42

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Re: STIBP by default.. Revert?

On 11/19/2018 6:00 AM, Linus Torvalds wrote:
> On Sun, Nov 18, 2018 at 1:49 PM Jiri Kosina <[email protected]> wrote:
>>
>>> So why do that STIBP slow-down by default when the people who *really*
>>> care already disabled SMT?
>>
>> BTW for them, there is no impact at all.
>
> Right. People who really care about security and are anal about it do
> not see *any* advantage of the patch.

In the documentation, AMD officially recommends against this by default, and I can
speak for Intel that our position is that as well: this really must not be on by default.

STIBP and its friends are there as tools, and were created early on as big hammers because
that is all that one can add in a microcode update.. expensive big hammers.

In some ways it's analogous to the "disable caches" bit in CR0. sure it's there as a big hammer,
but you don't set that always just because caches could be used for a side channel

Using these tools much more surgically is fine, if a paranoid task wants it for example,
or when you know you are doing a hard core security transition. But always on? Yikes.


2018-11-19 00:02:03

by Andi Kleen

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

Linus Torvalds <[email protected]> writes:

> On Sun, Nov 18, 2018 at 2:17 PM Jiri Kosina <[email protected]> wrote:
>> Which gets us back to Tim's fixup patch. Do you still prefer the revert,
>> given the existence of that?
>
> I don't think the code needs to be reverted, but the *behavior* of
> just unconditionally enabling STIBP needs to be reverted.

Actually I think it should be reverted. Yes of course opt-in
is needed.

But also when you opt-in it doesn't make sense to set STIBP
when the sibling is running the same security context, which
is actually a common case. So to even use it properly you would
need some scheduler support to detect these cases and only
enable it then with opt-in. These patches didn't even try to tackle
this problem.

-Andi

2018-11-19 00:03:15

by Andi Kleen

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

> So my patchset and Jiri's patchset should probably be merged together, so the
> users have a choice of the behavior.

... or delay Jiri's patchkit until the scheduler can actually check
properly for the cases when it is really needed.

-Andi

2018-11-19 00:09:38

by Jiri Kosina

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, 18 Nov 2018, Jiri Kosina wrote:

> It's probably not just browsers, but anything running JITed sandboxed
> code. So the most straightforward way might be the prctl() aproach, where
> userspace would claim "I do care about this, please fix it up for me". So
> prctl() + perhaps SECCOMP.

I've just sent SECCOMP handling as a followup to Tim's set.

I still feel like we should make STIBP and IBPB behavior consistent (in a
sense they should always be used both, or none of them), but that might be
additional 4.21 optimization.

Thanks,

--
Jiri Kosina
SUSE Labs


2018-11-19 03:55:40

by Willy Tarreau

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, Nov 18, 2018 at 02:40:28PM -0800, Tim Chen wrote:
> Tasks that want extra security will enable that via prctl interface or
> making themselves non-dumpable.

Well, you need to be careful regarding the last part of your option
above, because a number of network daemons become non-dumpable by
executing setuid() at boot, and certainly don't want to suffer a
performance loss as a side effect of wanting to become "normally"
secure. I'd suggest to use the prctl only so that it doesn't
randomly hit innocent applications that would only have as a last
resort to turn off reasonable security features to avoid this impact.

Regards,
Willy

2018-11-19 08:40:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?


* Linus Torvalds <[email protected]> wrote:

> This was marked for stable, and honestly, nowhere in the discussion
> did I see any mention of just *how* bad the performance impact of this
> was.

Yeah. This was an oversight - we'll fix it!

> When performance goes down by 50% on some loads, people need to start
> asking themselves whether it was worth it. It's apparently better to
> just disable SMT entirely, which is what security-conscious people do
> anyway.
>
> So why do that STIBP slow-down by default when the people who *really*
> care already disabled SMT?
>
> I think we should use the same logic as for L1TF: we default to
> something that doesn't kill performance. Warn once about it, and let
> the crazy people say "I'd rather take a 50% performance hit than
> worry about a theoretical issue".

Yeah, absolutely.

We'll also require performance measurements in changelogs enabling any
sort of mitigation feature from now on - this requirement was implicit
but 53c613fe6349 flew in under the radar, so it's going to be explicit an
explicit requirement.

Thanks,

Ingo

2018-11-19 08:45:34

by Jiri Kosina

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Mon, 19 Nov 2018, Ingo Molnar wrote:

> > This was marked for stable, and honestly, nowhere in the discussion
> > did I see any mention of just *how* bad the performance impact of this
> > was.
>
> Yeah. This was an oversight - we'll fix it!
>
> > When performance goes down by 50% on some loads, people need to start
> > asking themselves whether it was worth it. It's apparently better to
> > just disable SMT entirely, which is what security-conscious people do
> > anyway.
> >
> > So why do that STIBP slow-down by default when the people who *really*
> > care already disabled SMT?
> >
> > I think we should use the same logic as for L1TF: we default to
> > something that doesn't kill performance. Warn once about it, and let
> > the crazy people say "I'd rather take a 50% performance hit than
> > worry about a theoretical issue".
>
> Yeah, absolutely.
>
> We'll also require performance measurements in changelogs enabling any
> sort of mitigation feature from now on - this requirement was implicit
> but 53c613fe6349 flew in under the radar, so it's going to be explicit an
> explicit requirement.

Do you already have an idea whether you'd proceed with Tim's patchset for
current cycle? If so, great as far as I am concerned. If not, I'll send a
patch that switches this to opt-in via kernel cmdline (+boot-time warning
if not mitigated).

Thanks,

--
Jiri Kosina
SUSE Labs


2018-11-19 12:51:03

by Thomas Gleixner

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, 18 Nov 2018, Tim Chen wrote:
> On 11/18/2018 02:17 PM, Jiri Kosina wrote:
> > On Sun, 18 Nov 2018, Linus Torvalds wrote:
> >
> >>> So, I think it's as theoretical as any other spectrev2 (only with the
> >>> extra "HT" condition added on top).
> >>
> >> What? No.
> >>
> >> It's *way* more theoretical than something like meltdown, which could
> >> be trivially used to get data from another protection domain.
> >
> > Oh yeah, I absolutely agree that spectrev2 and Meltdown and completely
> > different beasts.
> >
> >> Have you seen any actual realistic attacks for normal human users?
> >> Things where the *kernel* should actually care?
> >>
> >> The javascript thing is for the browser to fix up,
> >
> > It's probably not just browsers, but anything running JITed sandboxed
> > code. So the most straightforward way might be the prctl() aproach, where
> > userspace would claim "I do care about this, please fix it up for me". So
> > prctl() + perhaps SECCOMP.
> >
> > Which gets us back to Tim's fixup patch. Do you still prefer the revert,
> > given the existence of that? I think that if Tim's fixup makes it through
> > (it's currently missing SECCOMP handling, but that is trivial to add on
> > top), it might be the best compromise. We'd also have have to make IBPB
> > obey it to be consistent (and get even a few more % of performance back),
> > but that's easy as well.
> >
> I think if Thomas can merge my patchset along with Jiri's, the default
> option will become opt in for tasks that want the extra security and we
> won't lose performance.

If it would be in mergeable state ....

Thanks,

tglx

2018-11-20 16:06:34

by Jiri Kosina

[permalink] [raw]
Subject: Re: Re: STIBP by default.. Revert?

On Mon, 19 Nov 2018, Arjan van de Ven wrote:

> In the documentation, AMD officially recommends against this by default,
> and I can speak for Intel that our position is that as well: this really
> must not be on by default.

Thanks for pointing to the AMD doc, it's indeed clearly stated there.

Is there any chance this could perhaps be added to Intel documentation as
well, so that we avoid cases like this in the future?

The revision 3.0 of Intel doc from may 2018 [1] I am always looking into
doesn't say anything discouraging about STIBP to me.

[1] https://software.intel.com/security-software-guidance/api-app/sites/default/files/336996-Speculative-Execution-Side-Channel-Mitigations.pdf?_gclid=5b78f4d130faf8.22277271-5b78f4d130fb70.17467890&_utm_source=xakep&_utm_campaign=mention177777&_utm_medium=inline&_utm_content=lnk223716354570

Thanks,

--
Jiri Kosina
SUSE Labs


2018-11-20 17:53:27

by Jiri Kosina

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On Sun, 18 Nov 2018, Linus Torvalds wrote:

> This was marked for stable, and honestly, nowhere in the discussion
> did I see any mention of just *how* bad the performance impact of this
> was.
>
> When performance goes down by 50% on some loads, people need to start
> asking themselves whether it was worth it. It's apparently better to
> just disable SMT entirely, which is what security-conscious people do
> anyway.
>
> So why do that STIBP slow-down by default when the people who *really*
> care already disabled SMT?
>
> I think we should use the same logic as for L1TF: we default to
> something that doesn't kill performance. Warn once about it, and let
> the crazy people say "I'd rather take a 50% performance hit than
> worry about a theoretical issue".

Just to update status quo here -- Thomas is working on polishing Tim's set
into mergeable state, I've just sent him small addition on top that makes
IBPB also be controlled via the same toggle.

That should make the whole 'spectre v2 userspace-to-userspace' mitigation
control consistent and undestandable. And also give us even few more %
back that are lost due to IBPB as well.

--
Jiri Kosina
SUSE Labs


2018-11-20 23:51:26

by Arjan van de Ven

[permalink] [raw]
Subject: Re: STIBP by default.. Revert?

On 11/20/2018 11:27 PM, Jiri Kosina wrote:
> On Mon, 19 Nov 2018, Arjan van de Ven wrote:
>
>> In the documentation, AMD officially recommends against this by default,
>> and I can speak for Intel that our position is that as well: this really
>> must not be on by default.
>
> Thanks for pointing to the AMD doc, it's indeed clearly stated there.
>
> Is there any chance this could perhaps be added to Intel documentation as
> well, so that we avoid cases like this in the future?

absolutely that's now already in progress;
the doc publishing process is a bit on the long side unfortunately so it won't
be today ;)