Hi Eric,
On Thu, Aug 25, 2022 at 11:15:58PM +0000, Eric Biggers wrote:
> I'm wondering if people are aware of this issue, and whether anyone has any
> thoughts on whether/where the kernel should be setting these new CPU flags.
> There don't appear to have been any prior discussions about this. (Thanks to
Maybe it should be set unconditionally now, until we figure out how to
make it more granular.
In terms of granularity, I saw other folks suggesting making it per-task
(so, presumably, a prctl() knob), and others mentioning doing it just
for kernel crypto. For the latter, I guess the crypto API could set it
inside of its abstractions, and the various lib/crypto APIs could set it
at invocation time. I wonder, though, what's the cost of
enabling/disabling it? Would we in fact need a kind of lazy-deferred
disabling, like we have with kernel_fpu_end()? I also wonder what
crypto-adjacent code might wind up being missed if we're going function
by function. Like, obviously we'd set this for crypto_memneq, but what
about potential unprotected `==` of ID numbers that could leak some info
in various protocols? What other subtle nearby code should we be
thinking about that relies on constant-time logic but isn't neatly
folded inside a crypto_do_something() function?
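For the sake of discussion, here's roughly the shape I'd imagine for the
lazy-deferred option, modeled on kernel_fpu_begin()/kernel_fpu_end(). To
be clear, this is just a sketch: doitm_begin(), doitm_end(), and both
MSR defines are made-up names, with the layout taken from Intel's DOITM
documentation (IA32_UARCH_MISC_CTL at 0x1b01, DOITM as bit 0), not from
any existing kernel code:

#include <linux/bits.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <asm/msr.h>

#define MSR_IA32_UARCH_MISC_CTL 0x1b01  /* hypothetical define */
#define UARCH_MISC_CTL_DOITM    BIT(0)  /* hypothetical define */

static DEFINE_PER_CPU(bool, doitm_on);

static void doitm_begin(void)
{
        u64 ctl;

        preempt_disable();
        if (!this_cpu_read(doitm_on)) {
                rdmsrl(MSR_IA32_UARCH_MISC_CTL, ctl);
                wrmsrl(MSR_IA32_UARCH_MISC_CTL, ctl | UARCH_MISC_CTL_DOITM);
                this_cpu_write(doitm_on, true);
        }
}

static void doitm_end(void)
{
        /*
         * Deliberately not clearing the MSR here; as with
         * kernel_fpu_end(), the expensive undo would be deferred (say,
         * to context switch), so back-to-back crypto calls only pay
         * for one MSR write.
         */
        preempt_enable();
}

Whether even that one wrmsrl() per section is cheap enough is exactly
the cost question above.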
Jason
On Mon, Aug 29, 2022 at 12:39:53PM -0400, Jason A. Donenfeld wrote:
> Hi Eric,
>
> On Thu, Aug 25, 2022 at 11:15:58PM +0000, Eric Biggers wrote:
> > I'm wondering if people are aware of this issue, and whether anyone has any
> > thoughts on whether/where the kernel should be setting these new CPU flags.
> > There don't appear to have been any prior discussions about this. (Thanks to
>
> Maybe it should be set unconditionally now, until we figure out how to
> make it more granular.
>
> In terms of granularity, I saw other folks suggesting making it per-task
> (so, presumably, a prctl() knob), and others mentioning doing it just
> for kernel crypto. For the latter, I guess the crypto API could set it
> inside of its abstractions, and the various lib/crypto APIs could set it
> at invocation time. I wonder, though, what's the cost of
> enabling/disabling it? Would we in fact need a kind of lazy-deferred
> disabling, like we have with kernel_fpu_end()? I also wonder what
> crypto-adjacent code might wind up being missed if we're going function
> by function. Like, obviously we'd set this for crypto_memneq, but what
> about potential unprotected `==` of ID numbers that could leak some info
> in various protocols? What other subtle nearby code should we be
> thinking about that relies on constant-time logic but isn't neatly
> folded inside a crypto_do_something() function?
>
I'd much prefer it being set unconditionally by default as well, as making
everyone (both kernel and userspace) turn it on and off constantly would be a
nightmare.
Note that Intel's documentation says that CPUs before Ice Lake behave as if
DOITM is always set:
"For Intel® Core™ family processors based on microarchitectures before Ice
Lake and Intel Atom® family processors based on microarchitectures before
Gracemont that do not enumerate IA32_UARCH_MISC_CTL, developers may assume
that the instructions listed here operate as if DOITM is enabled."
(It's a bit ambiguous, as it leaves the door open to IA32_UARCH_MISC_CTL being
retroactively added to old CPUs. But I assume that hasn't actually happened.)
So I think the logical approach is to unconditionally set DOITM by default, to
fix this CPU bug in Ice Lake and later and just bring things back to the way
they were in CPUs before Ice Lake. With that as a baseline, we can then discuss
whether it's useful to provide ways to re-enable this CPU bug / "feature", for
people who want to get the performance boost (if one actually exists) of data
dependent timing after carefully assessing the risks.
The other way around, of making everything insecure by default, seems like a
really bad idea.
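To illustrate how small the "unconditional by default" option is, here
is a sketch of what it could look like at CPU bringup. The constants
reflect my reading of Intel's documentation (IA32_ARCH_CAPABILITIES bit
12 enumerating DOITM support, IA32_UARCH_MISC_CTL at 0x1b01 with DOITM
as bit 0), and the names are hypothetical, since none of this exists in
the kernel yet:

#include <linux/bits.h>
#include <asm/cpufeature.h>
#include <asm/msr.h>
#include <asm/msr-index.h>
#include <asm/processor.h>

#define ARCH_CAP_DOITM          BIT(12) /* hypothetical define */
#define MSR_IA32_UARCH_MISC_CTL 0x1b01  /* hypothetical define */
#define UARCH_MISC_CTL_DOITM    BIT(0)  /* hypothetical define */

/* Would be called on each CPU during bringup, e.g. from identify_cpu(). */
static void doitm_enable(struct cpuinfo_x86 *c)
{
        u64 caps, ctl;

        if (!cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
                return;
        rdmsrl(MSR_IA32_ARCH_CAPABILITIES, caps);
        if (!(caps & ARCH_CAP_DOITM))
                return; /* per Intel's docs, already behaves as if DOITM=1 */
        rdmsrl(MSR_IA32_UARCH_MISC_CTL, ctl);
        wrmsrl(MSR_IA32_UARCH_MISC_CTL, ctl | UARCH_MISC_CTL_DOITM);
}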
- Eric
On Mon, Aug 29, 2022 at 12:39:53PM -0400, Jason A. Donenfeld wrote:
> In terms of granularity, I saw other folks suggesting making it per-task
> (so, presumably, a prctl() knob), and others mentioning doing it just
> for kernel crypto. For the latter, I guess the crypto API could set it
> inside of its abstractions, and the various lib/crypto APIs could set it
> at invocation time. I wonder, though, what's the cost of
> enabling/disabling it? Would we in fact need a kind of lazy-deferred
> disabling, like we have with kernel_fpu_end()? I also wonder what
> crypto-adjacent code might wind up being missed if we're going function
> by function. Like, obviously we'd set this for crypto_memneq, but what
> about potential unprotected `==` of ID numbers that could leak some info
> in various protocols? What other subtle nearby code should we be
> thinking about that relies on constant-time logic but isn't neatly
> folded inside a crypto_do_something() function?
Another random note on this: I would hope that setting that MSR
represents a speculation barrier or general instruction stream barrier,
so that you can't do something naughty with the scheduler to toggle it
rapidly and measure crypto timings somehow.
On Mon, Aug 29, 2022 at 06:08:07PM +0000, Eric Biggers wrote:
> I'd much prefer it being set unconditionally by default as well, as making
> everyone (both kernel and userspace) turn it on and off constantly would be a
> nightmare.
>
> Note that Intel's documentation says that CPUs before Ice Lake behave as if
> DOITM is always set:
>
> "For Intel® Core™ family processors based on microarchitectures before Ice
> Lake and Intel Atom® family processors based on microarchitectures before
> Gracemont that do not enumerate IA32_UARCH_MISC_CTL, developers may assume
> that the instructions listed here operate as if DOITM is enabled."
>
> (It's a bit ambiguous, as it leaves the door open to IA32_UARCH_MISC_CTL being
> retroactively added to old CPUs. But I assume that hasn't actually happened.)
>
> So I think the logical approach is to unconditionally set DOITM by default, to
> fix this CPU bug in Ice Lake and later and just bring things back to the way
> they were in CPUs before Ice Lake. With that as a baseline, we can then discuss
> whether it's useful to provide ways to re-enable this CPU bug / "feature", for
> people who want to get the performance boost (if one actually exists) of data
> dependent timing after carefully assessing the risks.
>
> The other way around, of making everything insecure by default, seems like a
> really bad idea.
Right. It's actually kind of surprising that Intel didn't already do
this by default. Sure, maybe the Intel manual never explicitly
guaranteed constant time, but a heck of a lot of code relies on that
being the case.
Jason
On 8/29/22 09:39, Jason A. Donenfeld wrote:
> On Thu, Aug 25, 2022 at 11:15:58PM +0000, Eric Biggers wrote:
>> I'm wondering if people are aware of this issue, and whether anyone has any
>> thoughts on whether/where the kernel should be setting these new CPU flags.
>> There don't appear to have been any prior discussions about this. (Thanks to
> Maybe it should be set unconditionally now, until we figure out how to
> make it more granular.
Personally, I'm in this camp as well. Let's be safe and set it by
default. There's also this tidbit in the Intel docs (and chopping out a
bunch of the noise):
    (On) processors based on microarchitectures before Ice Lake ...
    the instructions listed here operate as if DOITM is enabled.
IOW, setting DOITM=1 isn't going back to the stone age. At worst, I'd
guess that you're giving up some optimization that only shows up in very
recent CPUs in the first place.
If folks want DOITM=0 on their snazzy new CPUs, then they can come with
performance data to demonstrate the gain they'll get from adding kernel
code to get DOITM=0. There are a range of ways we could handle it, all
the way from adding a command-line parameter to per-task management.
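The command-line end of that range is nearly free, something like this
(parameter and variable names invented for illustration):

#include <linux/cache.h>
#include <linux/init.h>
#include <linux/kstrtox.h>

static bool doitm_off __ro_after_init;  /* safe default: DOITM=1 */

static int __init doitm_off_setup(char *str)
{
        return kstrtobool(str, &doitm_off);
}
early_param("doitm_off", doitm_off_setup);

The per-task end is where the real design work (and the performance
data) would be needed.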
Anybody disagree?
On Tue, Aug 30, 2022 at 07:25:29AM -0700, Dave Hansen wrote:
> On 8/29/22 09:39, Jason A. Donenfeld wrote:
> > On Thu, Aug 25, 2022 at 11:15:58PM +0000, Eric Biggers wrote:
> >> I'm wondering if people are aware of this issue, and whether anyone has any
> >> thoughts on whether/where the kernel should be setting these new CPU flags.
> >> There don't appear to have been any prior discussions about this. (Thanks to
> > Maybe it should be set unconditionally now, until we figure out how to
> > make it more granular.
>
> Personally, I'm in this camp as well. Let's be safe and set it by
> default. There's also this tidbit in the Intel docs (and chopping out a
> bunch of the noise):
>
>     (On) processors based on microarchitectures before Ice Lake ...
>     the instructions listed here operate as if DOITM is enabled.
>
> IOW, setting DOITM=1 isn't going back to the stone age. At worst, I'd
> guess that you're giving up some optimization that only shows up in very
> recent CPUs in the first place.
>
> If folks want DOITM=0 on their snazzy new CPUs, then they can come with
> performance data to demonstrate the gain they'll get from adding kernel
> code to get DOITM=0. There are a range of ways we could handle it, all
> the way from adding a command-line parameter to per-task management.
>
> Anybody disagree?
It's not my preferred option for arm64 but I admit the same reasoning
could equally apply to us. If some existing crypto libraries relied on
data-independent timing for current CPUs but newer ones (with the DIT
feature) come up with more aggressive, data-dependent optimisations,
they may be caught off-guard. That said, the ARM architecture spec never
promised any particular timing; that's a micro-architectural detail, and
not all implementations are done by ARM Ltd. So I can't really tell
what's out there.
So I guess knobs for finer-grained control would do: at least a sysctl
(or cmdline) to turn it on/off globally, and maybe a prctl() for
userspace. We don't necessarily need this on arm64 but if x86 adds one,
we might as well wire it up.
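For the arm64 side, whatever the knob looks like, the mechanism
underneath is just PSTATE.DIT. Something like this sketch (assuming
ARMv8.4's DIT feature and an assembler that accepts the named register;
the register-form accessor places the DIT bit at position 24, mirroring
the SPSR layout):

#include <linux/types.h>

/* Hypothetical helper, not an existing kernel API. */
#define PSTATE_DIT_BIT  (1UL << 24)

static inline void set_dit(bool on)
{
        asm volatile("msr dit, %0" :: "r" (on ? PSTATE_DIT_BIT : 0UL));
}

The prctl()/sysctl plumbing on top would mostly be bookkeeping about
who gets the bit set by default.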
--
Catalin
On Thu, 15 Sept 2022 at 19:52, Catalin Marinas <[email protected]> wrote:
>
> On Tue, Aug 30, 2022 at 07:25:29AM -0700, Dave Hansen wrote:
> > On 8/29/22 09:39, Jason A. Donenfeld wrote:
> > > On Thu, Aug 25, 2022 at 11:15:58PM +0000, Eric Biggers wrote:
> > >> I'm wondering if people are aware of this issue, and whether anyone has any
> > >> thoughts on whether/where the kernel should be setting these new CPU flags.
> > >> There don't appear to have been any prior discussions about this. (Thanks to
> > > Maybe it should be set unconditionally now, until we figure out how to
> > > make it more granular.
> >
> > Personally, I'm in this camp as well. Let's be safe and set it by
> > default. There's also this tidbit in the Intel docs (and chopping out a
> > bunch of the noise):
> >
> >     (On) processors based on microarchitectures before Ice Lake ...
> >     the instructions listed here operate as if DOITM is enabled.
> >
> > IOW, setting DOITM=1 isn't going back to the stone age. At worst, I'd
> > guess that you're giving up some optimization that only shows up in very
> > recent CPUs in the first place.
> >
> > If folks want DOITM=0 on their snazzy new CPUs, then they can come with
> > performance data to demonstrate the gain they'll get from adding kernel
> > code to get DOITM=0. There are a range of ways we could handle it, all
> > the way from adding a command-line parameter to per-task management.
> >
> > Anybody disagree?
>
> It's not my preferred option for arm64 but I admit the same reasoning
> could equally apply to us. If some existing crypto libraries relied on
> data-independent timing for current CPUs but newer ones (with the DIT
> feature) come up with more aggressive, data-dependent optimisations,
> they may be caught off-guard. That said, the ARM architecture spec never
> promised any particular timing; that's a micro-architectural detail, and
> not all implementations are done by ARM Ltd. So I can't really tell
> what's out there.
>
> So I guess knobs for finer-grained control would do: at least a sysctl
> (or cmdline) to turn it on/off globally, and maybe a prctl() for
> userspace. We don't necessarily need this on arm64 but if x86 adds one,
> we might as well wire it up.
>
With all the effort spent on plugging timing leaks in the kernel over
the past couple of years, not enabling this at EL1 seems silly, no?
Why would we ever permit privileged code to exhibit data-dependent
timing variances?
As for a prctl() for user space - wouldn't it make more sense to
enable this by default, and add a hwcap so user space can clear DIT
directly if it feels the need to do so?
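That division of labour keeps the kernel side trivial: something like
the sketch below is all userspace would need, assuming HWCAP_DIT (bit
24 in the current arm64 uapi header) and a toolchain that accepts the
named DIT register (the register-form accessor keeps the bit at
position 24, matching the SPSR layout):

#include <stdio.h>
#include <sys/auxv.h>

#ifndef HWCAP_DIT
#define HWCAP_DIT       (1UL << 24)
#endif

int main(void)
{
        if (!(getauxval(AT_HWCAP) & HWCAP_DIT)) {
                puts("DIT not implemented");
                return 1;
        }
        /* Assuming the kernel set DIT by default, opt this thread
         * back in to data-dependent timing by clearing it. */
        asm volatile("msr dit, %0" :: "r" (0UL));
        return 0;
}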