2022-10-27 11:45:00

by Ard Biesheuvel

[permalink] [raw]
Subject: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

The ARM architecture revision v8.4 introduces a data independent timing
control (DIT) which can be set at any exception level, and instructs the
CPU to avoid optimizations that may result in a correlation between the
execution time of certain instructions and the value of the data they
operate on.

The DIT bit is part of PSTATE, and is therefore context switched as
usual, given that it becomes part of the saved program state (SPSR) when
taking an exception. We have also defined a hwcap for DIT, and so user
space can discover already whether or nor DIT is available. This means
that, as far as user space is concerned, DIT is wired up and fully
functional.

In the kernel, however, we never bothered with DIT: we disable at it
boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we
might run with DIT enabled if user space happened to set it.

Given that running privileged code with DIT disabled on a CPU that
implements support for it may result in a side channel that exposes
privileged data to unprivileged user space processes, let's enable DIT
while running in the kernel if supported by all CPUs.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Adam Langley <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/include/asm/sysreg.h | 3 +++
arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++
arch/arm64/kernel/entry.S | 3 +++
arch/arm64/tools/cpucaps | 1 +
4 files changed, 23 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7d301700d1a9..18e065f5130c 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -94,15 +94,18 @@
#define PSTATE_PAN pstate_field(0, 4)
#define PSTATE_UAO pstate_field(0, 3)
#define PSTATE_SSBS pstate_field(3, 1)
+#define PSTATE_DIT pstate_field(3, 2)
#define PSTATE_TCO pstate_field(3, 4)

#define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))
#define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
#define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
+#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift))
#define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))

#define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x))
#define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x))
+#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x))
#define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x))

#define __SYS_BARRIER_INSN(CRm, op2, Rt) \
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6062454a9067..273a74df24fe 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2077,6 +2077,11 @@ static void cpu_trap_el0_impdef(const struct arm64_cpu_capabilities *__unused)
sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_TIDCP);
}

+static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused)
+{
+ set_pstate_dit(1);
+}
+
/* Internal helper functions to match cpu capability type */
static bool
cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
@@ -2640,6 +2645,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.matches = has_cpuid_feature,
.cpu_enable = cpu_trap_el0_impdef,
},
+ {
+ .desc = "Data independent timing control (DIT)",
+ .capability = ARM64_HAS_DIT,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = has_cpuid_feature,
+ .sys_reg = SYS_ID_AA64PFR0_EL1,
+ .field_pos = ID_AA64PFR0_EL1_DIT_SHIFT,
+ .field_width = 4,
+ .min_field_value = 1,
+ .cpu_enable = cpu_enable_dit,
+ },
{},
};

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index e28137d64b76..229b505e6366 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -197,6 +197,9 @@ alternative_cb_end
.endm

.macro kernel_entry, el, regsize = 64
+ .if \el == 0
+ ALTERNATIVE(nop, SET_PSTATE_DIT(1), ARM64_HAS_DIT)
+ .endif
.if \regsize == 32
mov w0, w0 // zero upper 32 bits of x0
.endif
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index f1c0347ec31a..a86ee376920a 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -20,6 +20,7 @@ HAS_CNP
HAS_CRC32
HAS_DCPODP
HAS_DCPOP
+HAS_DIT
HAS_E0PD
HAS_ECV
HAS_EPAN
--
2.35.1



2022-10-27 12:26:38

by Mark Rutland

[permalink] [raw]
Subject: Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote:
> The ARM architecture revision v8.4 introduces a data independent timing
> control (DIT) which can be set at any exception level, and instructs the
> CPU to avoid optimizations that may result in a correlation between the
> execution time of certain instructions and the value of the data they
> operate on.
>
> The DIT bit is part of PSTATE, and is therefore context switched as
> usual, given that it becomes part of the saved program state (SPSR) when
> taking an exception. We have also defined a hwcap for DIT, and so user
> space can discover already whether or nor DIT is available. This means
> that, as far as user space is concerned, DIT is wired up and fully
> functional.
>
> In the kernel, however, we never bothered with DIT: we disable at it
> boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we
> might run with DIT enabled if user space happened to set it.

FWIW, I did raise this with some architects a while back, and the thinking at
the time was that implementations were likely to have data independent timing
regardless of the value of the DIT bit, and so manipulating this at exception
entry was busy work with no actual gain (especially if we had to handle
mismatched big.LITTLE systems).

Since then, I have become aware of some possible implementation choices which
would consider the bit (though I have no idea if anyone is building such
implementations).

> Given that running privileged code with DIT disabled on a CPU that
> implements support for it may result in a side channel that exposes
> privileged data to unprivileged user space processes,

I appreciate this is a simple way to rule out issues of that sort, but I think
the "may" in that sentence is doing a lot of work, since:

* IIUC, we don't have a specific case in mind that we're concerned about. I can
believe that we think all the crypto code we intend to be constant time is
theoretically affected.

* IIUC we haven't gone an audited all constant-time code to check it doesn't
happen to use instructions with data-dependent-timing. So there might be more
work to do atop this to ensure theoretical correctness.

* AFAIK there are no contemporary implementations where the DIT is both
implemented and it being clear results in data-dependent-timing. i.e. we have
nothing to actually test on.

Given that, it would be nice if we could make this a bit clearer, e.g. state
that this is currently a belt-and-braces approach for theoretical cases, rather
than an extant side-channel today (as far as we're aware).

> let's enable DIT while running in the kernel if supported by all CPUs.

FWIW, I think it's reasonable to do this when all CPUs have DIT.

I have a slight fear that (as above) if there are future CPUs which do consider
DIT, there's presumably a noticeable performance difference (or the CPU would
just provide data-independent-timing regardless), but I'm not sure if that's
just something we have to live with or could punt on until we notice such
cases.

> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Eric Biggers <[email protected]>
> Cc: Jason A. Donenfeld <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
> Cc: Adam Langley <[email protected]>
> Link: https://lore.kernel.org/all/[email protected]/
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/arm64/include/asm/sysreg.h | 3 +++
> arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++
> arch/arm64/kernel/entry.S | 3 +++
> arch/arm64/tools/cpucaps | 1 +
> 4 files changed, 23 insertions(+)

Don't we also need to touch __cpu_suspend_exit() in arch/am64/kernel/suspend.c?
I'm assuming so given that has __uaccess_enable_hw_pan() today.

Presumably we'd also care about this in KVM hyp code?

> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 7d301700d1a9..18e065f5130c 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -94,15 +94,18 @@
> #define PSTATE_PAN pstate_field(0, 4)
> #define PSTATE_UAO pstate_field(0, 3)
> #define PSTATE_SSBS pstate_field(3, 1)
> +#define PSTATE_DIT pstate_field(3, 2)
> #define PSTATE_TCO pstate_field(3, 4)
>
> #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))
> #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
> #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
> +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift))
> #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
>
> #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x))
> #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x))
> +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x))
> #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x))
>
> #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 6062454a9067..273a74df24fe 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2077,6 +2077,11 @@ static void cpu_trap_el0_impdef(const struct arm64_cpu_capabilities *__unused)
> sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_TIDCP);
> }
>
> +static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused)
> +{
> + set_pstate_dit(1);
> +}

I think this wants the same treatment as PAN, i.e.
WARN_ON_ONCE(in_interrupt()); with a comment about PSTATE being discareded upon
ERET.

Thanks,
Mark.

> +
> /* Internal helper functions to match cpu capability type */
> static bool
> cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
> @@ -2640,6 +2645,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> .matches = has_cpuid_feature,
> .cpu_enable = cpu_trap_el0_impdef,
> },
> + {
> + .desc = "Data independent timing control (DIT)",
> + .capability = ARM64_HAS_DIT,
> + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> + .matches = has_cpuid_feature,
> + .sys_reg = SYS_ID_AA64PFR0_EL1,
> + .field_pos = ID_AA64PFR0_EL1_DIT_SHIFT,
> + .field_width = 4,
> + .min_field_value = 1,
> + .cpu_enable = cpu_enable_dit,
> + },
> {},
> };
>
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index e28137d64b76..229b505e6366 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -197,6 +197,9 @@ alternative_cb_end
> .endm
>
> .macro kernel_entry, el, regsize = 64
> + .if \el == 0
> + ALTERNATIVE(nop, SET_PSTATE_DIT(1), ARM64_HAS_DIT)
> + .endif
> .if \regsize == 32
> mov w0, w0 // zero upper 32 bits of x0
> .endif
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index f1c0347ec31a..a86ee376920a 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -20,6 +20,7 @@ HAS_CNP
> HAS_CRC32
> HAS_DCPODP
> HAS_DCPOP
> +HAS_DIT
> HAS_E0PD
> HAS_ECV
> HAS_EPAN
> --
> 2.35.1
>

2022-10-27 13:07:52

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

On Thu, Oct 27, 2022 at 01:12:16PM +0100, Mark Rutland wrote:
> I appreciate this is a simple way to rule out issues of that sort, but I think
> the "may" in that sentence is doing a lot of work, since:
>
> * IIUC, we don't have a specific case in mind that we're concerned about. I can
> believe that we think all the crypto code we intend to be constant time is
> theoretically affected.
>
> * IIUC we haven't gone an audited all constant-time code to check it doesn't
> happen to use instructions with data-dependent-timing. So there might be more
> work to do atop this to ensure theoretical correctness.
>
> * AFAIK there are no contemporary implementations where the DIT is both
> implemented and it being clear results in data-dependent-timing. i.e. we have
> nothing to actually test on.
>
> I have a slight fear that (as above) if there are future CPUs which do consider
> DIT, there's presumably a noticeable performance difference (or the CPU would
> just provide data-independent-timing regardless), but I'm not sure if that's
> just something we have to live with or could punt on until we notice such
> cases.

You're heading on a road to disaster reasoning like that.

You wrote, "we don't have a specific case in mind that we're concerned
about", but actually, all you can say here is that you're not personally
aware of a specific case in mind to be concerned about. As somebody who
actually works on this code, I do have specific cases I'm worried about,
and I know there are certain assumptions I've made about various coding
patterns being CT, resulting in various instructions that I assume to be
CT, which is something I tend to check by hand, while others have entire
frameworks to automatically verify this kind of thing. In other words,
one man's theory is another man's practice.

Then you write that there aren't any contemporary instructions where
this matters, but you fear they could come up in the future. Okay, good,
that's a perspective we can both share. The logical thing to do about
that would be Ard's patch here. However, you then conclude something
vague about performance and suggest punting this down the road. I guess
this makes sense to you because you don't think timing attacks are real
anyway. You're entitled to your opinion, of course, but I don't think
it's a popular one, and it certainly is contrary to that of most
implementers of the concerned code.

On the contrary, we should be proactive in ensuring the kernel remains
a suitable environment for CT code, preventing the problem *before* it
happens, rather than having to deal with vulnerability response down the
road, "punt[ing]" it, as you said. And perhaps if we handle this now,
CPU designers also won't feel like they can get away with silly
performance gains at the cost of CT instructions.

Jason

2022-10-27 13:58:13

by Mark Rutland

[permalink] [raw]
Subject: Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

Hi Jason,

I think my wording below might have been unclear -- I was not arguing to punt
on this patch. I've tried to clarify that below. Tone is always hard in text...

On Thu, Oct 27, 2022 at 02:40:14PM +0200, Jason A. Donenfeld wrote:
> On Thu, Oct 27, 2022 at 01:12:16PM +0100, Mark Rutland wrote:
> > I appreciate this is a simple way to rule out issues of that sort, but I think
> > the "may" in that sentence is doing a lot of work, since:
> >
> > * IIUC, we don't have a specific case in mind that we're concerned about. I can
> > believe that we think all the crypto code we intend to be constant time is
> > theoretically affected.
> >
> > * IIUC we haven't gone an audited all constant-time code to check it doesn't
> > happen to use instructions with data-dependent-timing. So there might be more
> > work to do atop this to ensure theoretical correctness.
> >
> > * AFAIK there are no contemporary implementations where the DIT is both
> > implemented and it being clear results in data-dependent-timing. i.e. we have
> > nothing to actually test on.
> >
> > I have a slight fear that (as above) if there are future CPUs which do consider
> > DIT, there's presumably a noticeable performance difference (or the CPU would
> > just provide data-independent-timing regardless), but I'm not sure if that's
> > just something we have to live with or could punt on until we notice such
> > cases.
>
> You're heading on a road to disaster reasoning like that.
>
> You wrote, "we don't have a specific case in mind that we're concerned
> about", but actually, all you can say here is that you're not personally
> aware of a specific case in mind to be concerned about. As somebody who
> actually works on this code, I do have specific cases I'm worried about,
> and I know there are certain assumptions I've made about various coding
> patterns being CT, resulting in various instructions that I assume to be
> CT, which is something I tend to check by hand, while others have entire
> frameworks to automatically verify this kind of thing. In other words,
> one man's theory is another man's practice.

What I'm trying to say here is that as-written the "may result in a side
channel" wording is sufficiently vague that it can easily be read both more
strongly and wore weakly than may have been intended (and can easily be
editotorialised to "there's a new side-channel bug" style reporting that isn't
helpful). Hence I'm calling that out explicitly for the benefit of others.

If we say something like "in the absence of DIT we can't ensure that
instructions sequences have data independent timing, and so where this matters
there could in theory be a side channel", or something like that, I'd be
happier.

As you say, people have different levels of riguour here, and (unfortunately)
there's plenty of extant code with informal assumptions that are hard to reason
about. If you have a specific example to contribute to the commit messge, that
would be great.

> Then you write that there aren't any contemporary instructions where
> this matters, but you fear they could come up in the future. Okay, good,
> that's a perspective we can both share. The logical thing to do about
> that would be Ard's patch here. However, you then conclude something
> vague about performance and suggest punting this down the road.

Sorry, I think my wording was unclear here: I meant punting on the performance
concern (i.e. take Ard's patch now, and if we see a performance issue in
future, address it then). I think we're actually agreed on that?

My reason for noting the performance concern was to lampshade that for anyone
aware of any extant / in-progress implementations, both to get them to tell
their HW folk that they might not get the practical gains they expect, and
since if those do exist we could consider following up with mechanisms to
opt-out of data-independent-timing where we are both satisifed that doing so is
safe and where there's a noticeable performance impact.

> I guess this makes sense to you because you don't think timing attacks are
> real anyway. You're entitled to your opinion, of course, but I don't think
> it's a popular one, and it certainly is contrary to that of most implementers
> of the concerned code.

That is not at all what I was trying to say here, but I can see how it's
possible to read that, so apologies for not being clear.

I am well aware that timing attacks are real, having specnt quite a while on
spectre its acquaintances, and having argued for stronger architectural
guarantees / better mechanisms to deal with this sort of thing.

> On the contrary, we should be proactive in ensuring the kernel remains
> a suitable environment for CT code, preventing the problem *before* it
> happens,

I agree. I'm just trying to make sure that we're clear that we're doing that
and this isn't a fix for an extant seurity bug which is actively exploitable
today.

> rather than having to deal with vulnerability response down the road,
> "punt[ing]" it, as you said. And perhaps if we handle this now, CPU designers
> also won't feel like they can get away with silly performance gains at the
> cost of CT instructions.

Sure.

Thanks,
Mark.

2022-10-27 13:59:55

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

On Thu, 27 Oct 2022 at 14:12, Mark Rutland <[email protected]> wrote:
>
> On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote:
> > The ARM architecture revision v8.4 introduces a data independent timing
> > control (DIT) which can be set at any exception level, and instructs the
> > CPU to avoid optimizations that may result in a correlation between the
> > execution time of certain instructions and the value of the data they
> > operate on.
> >
> > The DIT bit is part of PSTATE, and is therefore context switched as
> > usual, given that it becomes part of the saved program state (SPSR) when
> > taking an exception. We have also defined a hwcap for DIT, and so user
> > space can discover already whether or nor DIT is available. This means
> > that, as far as user space is concerned, DIT is wired up and fully
> > functional.
> >
> > In the kernel, however, we never bothered with DIT: we disable at it
> > boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we
> > might run with DIT enabled if user space happened to set it.
>
> FWIW, I did raise this with some architects a while back, and the thinking at
> the time was that implementations were likely to have data independent timing
> regardless of the value of the DIT bit, and so manipulating this at exception
> entry was busy work with no actual gain (especially if we had to handle
> mismatched big.LITTLE systems).
>
> Since then, I have become aware of some possible implementation choices which
> would consider the bit (though I have no idea if anyone is building such
> implementations).
>

It's a bit unfortunate that FEAT_DIT is mandatory even if the uarch in
question behaves the same regardless. And the fact that it is
documented as resetting to 0x0, and already being exposed to user
space doesn't help either. But that doesn't justify keeping this
disabled in the kernel.

The architecture does not permit us to distinguish between the cases
where this is just busywork and where it does make a difference. So we
have to assume it is the latter.

> > Given that running privileged code with DIT disabled on a CPU that
> > implements support for it may result in a side channel that exposes
> > privileged data to unprivileged user space processes,
>
> I appreciate this is a simple way to rule out issues of that sort, but I think
> the "may" in that sentence is doing a lot of work, since:
>
> * IIUC, we don't have a specific case in mind that we're concerned about. I can
> believe that we think all the crypto code we intend to be constant time is
> theoretically affected.
>

I think this reasoning is backwards. I don't think it is necessary to
go and identify where we are relying on timing invariance. Crypto
keeps coming up, and of course, it is a well known example of where
timing variances may leak the plaintext or the key. But crypto is just
one way to keep data confidential, and another method we rely on
heavily is the privilege boundary between kernel and user space. So
just like all crypto should be constant time, all privileged execution
should [ideally] be constant time as well.

> * IIUC we haven't gone an audited all constant-time code to check it doesn't
> happen to use instructions with data-dependent-timing. So there might be more
> work to do atop this to ensure theoretical correctness.
>

Sure.

> * AFAIK there are no contemporary implementations where the DIT is both
> implemented and it being clear results in data-dependent-timing. i.e. we have
> nothing to actually test on.
>

Then why on earth are such implementations required to implement FEAT_DIT??

> Given that, it would be nice if we could make this a bit clearer, e.g. state
> that this is currently a belt-and-braces approach for theoretical cases, rather
> than an extant side-channel today (as far as we're aware).
>

Sure - I'll add some extra prose to the commit log to capture the above.

> > let's enable DIT while running in the kernel if supported by all CPUs.
>
> FWIW, I think it's reasonable to do this when all CPUs have DIT.
>
> I have a slight fear that (as above) if there are future CPUs which do consider
> DIT, there's presumably a noticeable performance difference (or the CPU would
> just provide data-independent-timing regardless), but I'm not sure if that's
> just something we have to live with or could punt on until we notice such
> cases.
>

Sure. Another concern might be the overhead of toggling the bit, so
getting this change in sooner than later might actually help direct
the development towards implementations where the performance uplift
substantially outweighs the cycles spent on managing the DIT state.

To me, it seems unlikely that timing dependent optimizations in
privileged code would benefit actual real-world workloads while not
resulting in exploitable timing leajs, but user space should be able
to manage this however it wants.

IOW, yes. Let's pick a safe default, and when use cases turn up where
disabling DIT makes a substantial difference, we can revisit this.

> > Cc: Catalin Marinas <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: Marc Zyngier <[email protected]>
> > Cc: Eric Biggers <[email protected]>
> > Cc: Jason A. Donenfeld <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Suzuki K Poulose <[email protected]>
> > Cc: Adam Langley <[email protected]>
> > Link: https://lore.kernel.org/all/[email protected]/
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/arm64/include/asm/sysreg.h | 3 +++
> > arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++
> > arch/arm64/kernel/entry.S | 3 +++
> > arch/arm64/tools/cpucaps | 1 +
> > 4 files changed, 23 insertions(+)
>
> Don't we also need to touch __cpu_suspend_exit() in arch/am64/kernel/suspend.c?
> I'm assuming so given that has __uaccess_enable_hw_pan() today.
>

Indeed, I missed that.

> Presumably we'd also care about this in KVM hyp code?
>

Presumably, yes but I didn't investigate thoroughly what that would
entail. I think this can be managed separately, and I'll look into it
once the discussion here converges.

> > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > index 7d301700d1a9..18e065f5130c 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -94,15 +94,18 @@
> > #define PSTATE_PAN pstate_field(0, 4)
> > #define PSTATE_UAO pstate_field(0, 3)
> > #define PSTATE_SSBS pstate_field(3, 1)
> > +#define PSTATE_DIT pstate_field(3, 2)
> > #define PSTATE_TCO pstate_field(3, 4)
> >
> > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))
> > #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
> > #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
> > +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift))
> > #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
> >
> > #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x))
> > #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x))
> > +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x))
> > #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x))
> >
> > #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 6062454a9067..273a74df24fe 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -2077,6 +2077,11 @@ static void cpu_trap_el0_impdef(const struct arm64_cpu_capabilities *__unused)
> > sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_TIDCP);
> > }
> >
> > +static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused)
> > +{
> > + set_pstate_dit(1);
> > +}
>
> I think this wants the same treatment as PAN, i.e.
> WARN_ON_ONCE(in_interrupt()); with a comment about PSTATE being discareded upon
> ERET.
>

This is only called from the CPU feature code, which calls it under
stop_machine(), so I decided it was not needed. I don't mind adding
it, though, just seems unnecessary to me.

2022-10-27 15:06:20

by Mark Kettenis

[permalink] [raw]
Subject: Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

> From: Ard Biesheuvel <[email protected]>
> Date: Thu, 27 Oct 2022 15:17:56 +0200
>
> On Thu, 27 Oct 2022 at 14:12, Mark Rutland <[email protected]> wrote:
> >
> > On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote:
> > > The ARM architecture revision v8.4 introduces a data independent timing
> > > control (DIT) which can be set at any exception level, and instructs the
> > > CPU to avoid optimizations that may result in a correlation between the
> > > execution time of certain instructions and the value of the data they
> > > operate on.
> > >
> > > The DIT bit is part of PSTATE, and is therefore context switched as
> > > usual, given that it becomes part of the saved program state (SPSR) when
> > > taking an exception. We have also defined a hwcap for DIT, and so user
> > > space can discover already whether or nor DIT is available. This means
> > > that, as far as user space is concerned, DIT is wired up and fully
> > > functional.
> > >
> > > In the kernel, however, we never bothered with DIT: we disable at it
> > > boot (i.e., INIT_PSTATE_EL1 has DIT cleared) and ignore the fact that we
> > > might run with DIT enabled if user space happened to set it.
> >
> > FWIW, I did raise this with some architects a while back, and the thinking at
> > the time was that implementations were likely to have data independent timing
> > regardless of the value of the DIT bit, and so manipulating this at exception
> > entry was busy work with no actual gain (especially if we had to handle
> > mismatched big.LITTLE systems).
> >
> > Since then, I have become aware of some possible implementation choices which
> > would consider the bit (though I have no idea if anyone is building such
> > implementations).
> >
>
> It's a bit unfortunate that FEAT_DIT is mandatory even if the uarch in
> question behaves the same regardless. And the fact that it is
> documented as resetting to 0x0, and already being exposed to user
> space doesn't help either. But that doesn't justify keeping this
> disabled in the kernel.
>
> The architecture does not permit us to distinguish between the cases
> where this is just busywork and where it does make a difference. So we
> have to assume it is the latter.
>
> > > Given that running privileged code with DIT disabled on a CPU that
> > > implements support for it may result in a side channel that exposes
> > > privileged data to unprivileged user space processes,
> >
> > I appreciate this is a simple way to rule out issues of that sort, but I think
> > the "may" in that sentence is doing a lot of work, since:
> >
> > * IIUC, we don't have a specific case in mind that we're concerned about. I can
> > believe that we think all the crypto code we intend to be constant time is
> > theoretically affected.
> >
>
> I think this reasoning is backwards. I don't think it is necessary to
> go and identify where we are relying on timing invariance. Crypto
> keeps coming up, and of course, it is a well known example of where
> timing variances may leak the plaintext or the key. But crypto is just
> one way to keep data confidential, and another method we rely on
> heavily is the privilege boundary between kernel and user space. So
> just like all crypto should be constant time, all privileged execution
> should [ideally] be constant time as well.
>
> > * IIUC we haven't gone an audited all constant-time code to check it doesn't
> > happen to use instructions with data-dependent-timing. So there might be more
> > work to do atop this to ensure theoretical correctness.
> >
>
> Sure.
>
> > * AFAIK there are no contemporary implementations where the DIT is both
> > implemented and it being clear results in data-dependent-timing. i.e. we have
> > nothing to actually test on.
> >
>
> Then why on earth are such implementations required to implement FEAT_DIT??
>
> > Given that, it would be nice if we could make this a bit clearer, e.g. state
> > that this is currently a belt-and-braces approach for theoretical cases, rather
> > than an extant side-channel today (as far as we're aware).
> >
>
> Sure - I'll add some extra prose to the commit log to capture the above.
>
> > > let's enable DIT while running in the kernel if supported by all CPUs.
> >
> > FWIW, I think it's reasonable to do this when all CPUs have DIT.
> >
> > I have a slight fear that (as above) if there are future CPUs which do consider
> > DIT, there's presumably a noticeable performance difference (or the CPU would
> > just provide data-independent-timing regardless), but I'm not sure if that's
> > just something we have to live with or could punt on until we notice such
> > cases.
> >
>
> Sure. Another concern might be the overhead of toggling the bit, so
> getting this change in sooner than later might actually help direct
> the development towards implementations where the performance uplift
> substantially outweighs the cycles spent on managing the DIT state.
>
> To me, it seems unlikely that timing dependent optimizations in
> privileged code would benefit actual real-world workloads while not
> resulting in exploitable timing leajs, but user space should be able
> to manage this however it wants.
>
> IOW, yes. Let's pick a safe default, and when use cases turn up where
> disabling DIT makes a substantial difference, we can revisit this.

FWIW, I came to the same conclusion and that's what I implemented for
OpenBSD.

> > > Cc: Catalin Marinas <[email protected]>
> > > Cc: Will Deacon <[email protected]>
> > > Cc: Mark Rutland <[email protected]>
> > > Cc: Marc Zyngier <[email protected]>
> > > Cc: Eric Biggers <[email protected]>
> > > Cc: Jason A. Donenfeld <[email protected]>
> > > Cc: Kees Cook <[email protected]>
> > > Cc: Suzuki K Poulose <[email protected]>
> > > Cc: Adam Langley <[email protected]>
> > > Link: https://lore.kernel.org/all/[email protected]/
> > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > ---
> > > arch/arm64/include/asm/sysreg.h | 3 +++
> > > arch/arm64/kernel/cpufeature.c | 16 ++++++++++++++++
> > > arch/arm64/kernel/entry.S | 3 +++
> > > arch/arm64/tools/cpucaps | 1 +
> > > 4 files changed, 23 insertions(+)
> >
> > Don't we also need to touch __cpu_suspend_exit() in arch/am64/kernel/suspend.c?
> > I'm assuming so given that has __uaccess_enable_hw_pan() today.
> >
>
> Indeed, I missed that.
>
> > Presumably we'd also care about this in KVM hyp code?
> >
>
> Presumably, yes but I didn't investigate thoroughly what that would
> entail. I think this can be managed separately, and I'll look into it
> once the discussion here converges.
>
> > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > > index 7d301700d1a9..18e065f5130c 100644
> > > --- a/arch/arm64/include/asm/sysreg.h
> > > +++ b/arch/arm64/include/asm/sysreg.h
> > > @@ -94,15 +94,18 @@
> > > #define PSTATE_PAN pstate_field(0, 4)
> > > #define PSTATE_UAO pstate_field(0, 3)
> > > #define PSTATE_SSBS pstate_field(3, 1)
> > > +#define PSTATE_DIT pstate_field(3, 2)
> > > #define PSTATE_TCO pstate_field(3, 4)
> > >
> > > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))
> > > #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
> > > #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
> > > +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift))
> > > #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
> > >
> > > #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x))
> > > #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x))
> > > +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x))
> > > #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x))
> > >
> > > #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
> > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > index 6062454a9067..273a74df24fe 100644
> > > --- a/arch/arm64/kernel/cpufeature.c
> > > +++ b/arch/arm64/kernel/cpufeature.c
> > > @@ -2077,6 +2077,11 @@ static void cpu_trap_el0_impdef(const struct arm64_cpu_capabilities *__unused)
> > > sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_TIDCP);
> > > }
> > >
> > > +static void cpu_enable_dit(const struct arm64_cpu_capabilities *__unused)
> > > +{
> > > + set_pstate_dit(1);
> > > +}
> >
> > I think this wants the same treatment as PAN, i.e.
> > WARN_ON_ONCE(in_interrupt()); with a comment about PSTATE being discareded upon
> > ERET.
> >
>
> This is only called from the CPU feature code, which calls it under
> stop_machine(), so I decided it was not needed. I don't mind adding
> it, though, just seems unnecessary to me.
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

2022-11-04 08:21:34

by Eric Biggers

[permalink] [raw]
Subject: Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

Hi Ard,

On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote:
> Given that running privileged code with DIT disabled on a CPU that
> implements support for it may result in a side channel that exposes
> privileged data to unprivileged user space processes, let's enable DIT
> while running in the kernel if supported by all CPUs.

This patch looks good to me, though I'm not an expert in low-level arm64 stuff.
It's a bit unfortunate that we have to manually create the .inst to enable DIT
instead of just using the assembler. But it looks like there's a reason for it
(it's done for PAN et al. too), and based on the manual it looks correct.

Two nits below:

> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 7d301700d1a9..18e065f5130c 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -94,15 +94,18 @@
> #define PSTATE_PAN pstate_field(0, 4)
> #define PSTATE_UAO pstate_field(0, 3)
> #define PSTATE_SSBS pstate_field(3, 1)
> +#define PSTATE_DIT pstate_field(3, 2)
> #define PSTATE_TCO pstate_field(3, 4)
>
> #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))
> #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
> #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
> +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift))
> #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
>
> #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x))
> #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x))
> +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x))
> #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x))

To keep the order consistent, set_pstate_dit() should be defined after
set_pstate_ssbs(), not before.

> /* Internal helper functions to match cpu capability type */
> static bool
> cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
> @@ -2640,6 +2645,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> .matches = has_cpuid_feature,
> .cpu_enable = cpu_trap_el0_impdef,
> },
> + {
> + .desc = "Data independent timing control (DIT)",
> + .capability = ARM64_HAS_DIT,
> + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> + .matches = has_cpuid_feature,
> + .sys_reg = SYS_ID_AA64PFR0_EL1,
> + .field_pos = ID_AA64PFR0_EL1_DIT_SHIFT,
> + .field_width = 4,
> + .min_field_value = 1,
> + .cpu_enable = cpu_enable_dit,
> + },

The other entries in this array are explicit about '.sign = FTR_UNSIGNED'
(even though FTR_UNSIGNED is defined to false, so it's the default value).

- Eric

2022-11-04 10:47:30

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

On Fri, 4 Nov 2022 at 09:09, Eric Biggers <[email protected]> wrote:
>
> Hi Ard,
>
> On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote:
> > Given that running privileged code with DIT disabled on a CPU that
> > implements support for it may result in a side channel that exposes
> > privileged data to unprivileged user space processes, let's enable DIT
> > while running in the kernel if supported by all CPUs.
>
> This patch looks good to me, though I'm not an expert in low-level arm64 stuff.
> It's a bit unfortunate that we have to manually create the .inst to enable DIT
> instead of just using the assembler. But it looks like there's a reason for it
> (it's done for PAN et al. too), and based on the manual it looks correct.
>

Yes. The reason is that the assembler requires -march=armv8.2-a to be
passed when using the DIT register (and similar requirements apply to
the other registers). However, doing so may result in object code that
can no longer run on pre-v8.2 cores, whereas the DIT accesses
themselves are only emitted in a carefully controlled manner anyway,
so keeping the arch baseline to v8.0 and using .inst is the cleanest
way around this.


> Two nits below:
>
> > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > index 7d301700d1a9..18e065f5130c 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -94,15 +94,18 @@
> > #define PSTATE_PAN pstate_field(0, 4)
> > #define PSTATE_UAO pstate_field(0, 3)
> > #define PSTATE_SSBS pstate_field(3, 1)
> > +#define PSTATE_DIT pstate_field(3, 2)
> > #define PSTATE_TCO pstate_field(3, 4)
> >
> > #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))
> > #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
> > #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
> > +#define SET_PSTATE_DIT(x) __emit_inst(0xd500401f | PSTATE_DIT | ((!!x) << PSTATE_Imm_shift))
> > #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
> >
> > #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x))
> > #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x))
> > +#define set_pstate_dit(x) asm volatile(SET_PSTATE_DIT(x))
> > #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x))
>
> To keep the order consistent, set_pstate_dit() should be defined after
> set_pstate_ssbs(), not before.
>

Ack. Seems I just inserted it one from the bottom without actually reading :-)

> > /* Internal helper functions to match cpu capability type */
> > static bool
> > cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
> > @@ -2640,6 +2645,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> > .matches = has_cpuid_feature,
> > .cpu_enable = cpu_trap_el0_impdef,
> > },
> > + {
> > + .desc = "Data independent timing control (DIT)",
> > + .capability = ARM64_HAS_DIT,
> > + .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> > + .matches = has_cpuid_feature,
> > + .sys_reg = SYS_ID_AA64PFR0_EL1,
> > + .field_pos = ID_AA64PFR0_EL1_DIT_SHIFT,
> > + .field_width = 4,
> > + .min_field_value = 1,
> > + .cpu_enable = cpu_enable_dit,
> > + },
>
> The other entries in this array are explicit about '.sign = FTR_UNSIGNED'
> (even though FTR_UNSIGNED is defined to false, so it's the default value).
>

Ack.

2022-11-04 16:23:06

by Catalin Marinas

[permalink] [raw]
Subject: Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

On Fri, Nov 04, 2022 at 10:29:10AM +0100, Ard Biesheuvel wrote:
> On Fri, 4 Nov 2022 at 09:09, Eric Biggers <[email protected]> wrote:
> > On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote:
> > > Given that running privileged code with DIT disabled on a CPU that
> > > implements support for it may result in a side channel that exposes
> > > privileged data to unprivileged user space processes, let's enable DIT
> > > while running in the kernel if supported by all CPUs.
> >
> > This patch looks good to me, though I'm not an expert in low-level arm64 stuff.
> > It's a bit unfortunate that we have to manually create the .inst to enable DIT
> > instead of just using the assembler. But it looks like there's a reason for it
> > (it's done for PAN et al. too), and based on the manual it looks correct.
>
> Yes. The reason is that the assembler requires -march=armv8.2-a to be
> passed when using the DIT register (and similar requirements apply to
> the other registers). However, doing so may result in object code that
> can no longer run on pre-v8.2 cores, whereas the DIT accesses
> themselves are only emitted in a carefully controlled manner anyway,
> so keeping the arch baseline to v8.0 and using .inst is the cleanest
> way around this.

We worked around this already by defining asm-arch in
arch/arm64/Makefile to the latest that the assembler supports while
keeping the C compiler on armv8.0. Unlike the C compiler, the assembler
shouldn't generate new instructions unless specifically asked through
inline asm or .S files. We use this trick for MTE already (and LSE
atomics), though we needed another __MTE_PREAMBLE as armv8.5-a wasn't
sufficient for these instructions.

I think we ended up with .inst initially as binutils did not support
some of those instructions. We could try to clean them up but it's a bit
of a hassle to check which versions your binutils supports.

--
Catalin

2022-11-04 17:01:57

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC PATCH] arm64: Enable data independent timing (DIT) in the kernel

On Fri, 4 Nov 2022 at 17:19, Catalin Marinas <[email protected]> wrote:
>
> On Fri, Nov 04, 2022 at 10:29:10AM +0100, Ard Biesheuvel wrote:
> > On Fri, 4 Nov 2022 at 09:09, Eric Biggers <[email protected]> wrote:
> > > On Thu, Oct 27, 2022 at 01:27:41PM +0200, Ard Biesheuvel wrote:
> > > > Given that running privileged code with DIT disabled on a CPU that
> > > > implements support for it may result in a side channel that exposes
> > > > privileged data to unprivileged user space processes, let's enable DIT
> > > > while running in the kernel if supported by all CPUs.
> > >
> > > This patch looks good to me, though I'm not an expert in low-level arm64 stuff.
> > > It's a bit unfortunate that we have to manually create the .inst to enable DIT
> > > instead of just using the assembler. But it looks like there's a reason for it
> > > (it's done for PAN et al. too), and based on the manual it looks correct.
> >
> > Yes. The reason is that the assembler requires -march=armv8.2-a to be
> > passed when using the DIT register (and similar requirements apply to
> > the other registers). However, doing so may result in object code that
> > can no longer run on pre-v8.2 cores, whereas the DIT accesses
> > themselves are only emitted in a carefully controlled manner anyway,
> > so keeping the arch baseline to v8.0 and using .inst is the cleanest
> > way around this.
>
> We worked around this already by defining asm-arch in
> arch/arm64/Makefile to the latest that the assembler supports while
> keeping the C compiler on armv8.0. Unlike the C compiler, the assembler
> shouldn't generate new instructions unless specifically asked through
> inline asm or .S files. We use this trick for MTE already (and LSE
> atomics), though we needed another __MTE_PREAMBLE as armv8.5-a wasn't
> sufficient for these instructions.
>
> I think we ended up with .inst initially as binutils did not support
> some of those instructions. We could try to clean them up but it's a bit
> of a hassle to check which versions your binutils supports.
>

OK, good to know.

However, I double checked, and DIT needs v8.4-a (not v8.2 as i
mentioned above), and my ubuntu 16.04 toolchain, which has GCC 5.3,
only goes up to v8.2

So I guess we should be able to fix this at /some/ point, but for now,
I'll need to stick with the __inst()