2023-01-10 06:03:41

by Chen, Yian

Subject: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

Most of the kernel is mapped at virtual addresses
in the upper half of the address range. But kernel
deliberately initialized a temporary mm area
within the lower half of the address range
for text poking, see commit 4fc19708b165
("x86/alternatives: Initialize temporary mm
for patching").

LASS stops access to a lower half address in kernel,
and this can be deactivated if AC bit in EFLAGS
register is set. Hence use stac and clac instructions
around access to the address to avoid triggering a
LASS #GP fault.

Kernel objtool validation warns if the binary calls
to a non-whitelisted function that exists outside of
the stac/clac guard, or references any function with a
dynamic function pointer inside the guard; see section
9 in the document tools/objtool/Documentation/objtool.txt.

For these reasons, also considering text poking size is
usually small, simple modifications have been done
in function text_poke_memcpy() and text_poke_memset() to
avoid non-whitelisted function calls inside the stac/clac
guard.

Gcc may detect and replace the target with its built-in
functions. However, the replacement would break the
objtool validation criteria. Hence, add compiler option
-fno-builtin for the file.

Co-developed-by: Tony Luck <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Yian Chen <[email protected]>
---
arch/x86/include/asm/smap.h | 13 +++++++++++++
arch/x86/kernel/Makefile | 2 ++
arch/x86/kernel/alternative.c | 21 +++++++++++++++++++--
tools/objtool/arch/x86/special.c | 2 ++
4 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index bab490379c65..6f7ac0839b10 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -39,6 +39,19 @@ static __always_inline void stac(void)
alternative("", __ASM_STAC, X86_FEATURE_SMAP);
}

+/* Deactivate/activate LASS via AC bit in EFLAGS register */
+static __always_inline void low_addr_access_begin(void)
+{
+ /* Note: a barrier is implicit in alternative() */
+ alternative("", __ASM_STAC, X86_FEATURE_LASS);
+}
+
+static __always_inline void low_addr_access_end(void)
+{
+ /* Note: a barrier is implicit in alternative() */
+ alternative("", __ASM_CLAC, X86_FEATURE_LASS);
+}
+
static __always_inline unsigned long smap_save(void)
{
unsigned long flags;
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 96d51bbc2bd4..f8a455fc56a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -7,6 +7,8 @@ extra-y += vmlinux.lds

CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)

+CFLAGS_alternative.o += -fno-builtin
+
ifdef CONFIG_FUNCTION_TRACER
# Do not profile debug and lowlevel utilities
CFLAGS_REMOVE_tsc.o = -pg
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 7d8c3cbde368..4de8b54fb5f2 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1530,14 +1530,31 @@ __ro_after_init unsigned long poking_addr;

static void text_poke_memcpy(void *dst, const void *src, size_t len)
{
- memcpy(dst, src, len);
+ const char *s = src;
+ char *d = dst;
+
+ /* The parameter dst ends up referencing to the global variable
+ * poking_addr, which is mapped to the low half address space.
+ * In kernel, accessing the low half address range is prevented
+ * by LASS. So relax LASS prevention while accessing the memory
+ * range.
+ */
+ low_addr_access_begin();
+ while (len-- > 0)
+ *d++ = *s++;
+ low_addr_access_end();
}

static void text_poke_memset(void *dst, const void *src, size_t len)
{
int c = *(const int *)src;
+ char *d = dst;

- memset(dst, c, len);
+ /* The same comment as it is in function text_poke_memcpy */
+ low_addr_access_begin();
+ while (len-- > 0)
+ *d++ = c;
+ low_addr_access_end();
}

typedef void text_poke_f(void *dst, const void *src, size_t len);
diff --git a/tools/objtool/arch/x86/special.c b/tools/objtool/arch/x86/special.c
index 7c97b7391279..3a34ebe3966a 100644
--- a/tools/objtool/arch/x86/special.c
+++ b/tools/objtool/arch/x86/special.c
@@ -6,11 +6,13 @@

#define X86_FEATURE_POPCNT (4 * 32 + 23)
#define X86_FEATURE_SMAP (9 * 32 + 20)
+#define X86_FEATURE_LASS (12 * 32 + 6)

void arch_handle_alternative(unsigned short feature, struct special_alt *alt)
{
switch (feature) {
case X86_FEATURE_SMAP:
+ case X86_FEATURE_LASS:
/*
* If UACCESS validation is enabled; force that alternative;
* otherwise force it the other way.
--
2.34.1
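The modified text_poke_memcpy()/text_poke_memset() above can be sketched in plain userspace C, with hypothetical no-op stubs standing in for the privileged STAC/CLAC alternatives (which cannot run outside the kernel):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch of the patch's text poking helpers. The real
 * low_addr_access_begin()/end() patch in STAC/CLAC via alternative();
 * here they are no-op stand-ins so the pattern is testable.
 */
static void low_addr_access_begin(void) { /* would be STAC under LASS */ }
static void low_addr_access_end(void)   { /* would be CLAC under LASS */ }

static void text_poke_memcpy_sketch(void *dst, const void *src, size_t len)
{
	const char *s = src;
	char *d = dst;

	/* Open-coded byte copy: no out-of-line call inside the guard */
	low_addr_access_begin();
	while (len-- > 0)
		*d++ = *s++;
	low_addr_access_end();
}

static void text_poke_memset_sketch(void *dst, const void *src, size_t len)
{
	int c = *(const int *)src;
	char *d = dst;

	low_addr_access_begin();
	while (len-- > 0)
		*d++ = (char)c;
	low_addr_access_end();
}
```

This mirrors why the patch open-codes the loops: any call to a non-whitelisted function between STAC and CLAC would trip objtool's UACCESS validation.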


2023-01-10 21:20:52

by Peter Zijlstra

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

On Mon, Jan 09, 2023 at 09:52:00PM -0800, Yian Chen wrote:
> Most of the kernel is mapped at virtual addresses
> in the upper half of the address range. But kernel
> deliberately initialized a temporary mm area
> within the lower half of the address range
> for text poking, see commit 4fc19708b165
> ("x86/alternatives: Initialize temporary mm
> for patching").
>
> LASS stops access to a lower half address in kernel,
> and this can be deactivated if AC bit in EFLAGS
> register is set. Hence use stac and clac instructions
> around access to the address to avoid triggering a
> LASS #GP fault.
>
> Kernel objtool validation warns if the binary calls
> to a non-whitelisted function that exists outside of
> the stac/clac guard, or references any function with a
> dynamic function pointer inside the guard; see section
> 9 in the document tools/objtool/Documentation/objtool.txt.
>
> For these reasons, also considering text poking size is
> usually small, simple modifications have been done
> in function text_poke_memcpy() and text_poke_memset() to
> avoid non-whitelisted function calls inside the stac/clac
> guard.
>
> Gcc may detect and replace the target with its built-in
> functions. However, the replacement would break the
> objtool validation criteria. Hence, add compiler option
> -fno-builtin for the file.

Please reflow to 72 characters consistently, this is silly.

> Co-developed-by: Tony Luck <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> Signed-off-by: Yian Chen <[email protected]>
> ---
> arch/x86/include/asm/smap.h | 13 +++++++++++++
> arch/x86/kernel/Makefile | 2 ++
> arch/x86/kernel/alternative.c | 21 +++++++++++++++++++--
> tools/objtool/arch/x86/special.c | 2 ++
> 4 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
> index bab490379c65..6f7ac0839b10 100644
> --- a/arch/x86/include/asm/smap.h
> +++ b/arch/x86/include/asm/smap.h
> @@ -39,6 +39,19 @@ static __always_inline void stac(void)
> alternative("", __ASM_STAC, X86_FEATURE_SMAP);
> }
>
> +/* Deactivate/activate LASS via AC bit in EFLAGS register */
> +static __always_inline void low_addr_access_begin(void)
> +{
> + /* Note: a barrier is implicit in alternative() */
> + alternative("", __ASM_STAC, X86_FEATURE_LASS);
> +}
> +
> +static __always_inline void low_addr_access_end(void)
> +{
> + /* Note: a barrier is implicit in alternative() */
> + alternative("", __ASM_CLAC, X86_FEATURE_LASS);
> +}

Can't say I like the name. Also if you look at bit 63 as a sign bit,
it's actively wrong since -1 is lower than 0.

> +
> static __always_inline unsigned long smap_save(void)
> {
> unsigned long flags;
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 96d51bbc2bd4..f8a455fc56a2 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -7,6 +7,8 @@ extra-y += vmlinux.lds
>
> CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
>
> +CFLAGS_alternative.o += -fno-builtin
> +
> ifdef CONFIG_FUNCTION_TRACER
> # Do not profile debug and lowlevel utilities
> CFLAGS_REMOVE_tsc.o = -pg
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 7d8c3cbde368..4de8b54fb5f2 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1530,14 +1530,31 @@ __ro_after_init unsigned long poking_addr;
>
> static void text_poke_memcpy(void *dst, const void *src, size_t len)
> {
> - memcpy(dst, src, len);
> + const char *s = src;
> + char *d = dst;
> +
> + /* The parameter dst ends up referencing to the global variable
> + * poking_addr, which is mapped to the low half address space.
> + * In kernel, accessing the low half address range is prevented
> + * by LASS. So relax LASS prevention while accessing the memory
> + * range.
> + */
> + low_addr_access_begin();
> + while (len-- > 0)
> + *d++ = *s++;
> + low_addr_access_end();
> }
>
> static void text_poke_memset(void *dst, const void *src, size_t len)
> {
> int c = *(const int *)src;
> + char *d = dst;
>
> - memset(dst, c, len);
> + /* The same comment as it is in function text_poke_memcpy */
> + low_addr_access_begin();
> + while (len-- > 0)
> + *d++ = c;
> + low_addr_access_end();
> }

This is horrific tinkering :/

Also, what about the EFI mm? IIRC EFI also lives in the user address
space.

2023-01-10 22:59:08

by Sohil Mehta

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

On 1/9/2023 9:52 PM, Yian Chen wrote:

> LASS stops access to a lower half address in kernel,
> and this can be deactivated if AC bit in EFLAGS
> register is set. Hence use stac and clac instructions
> around access to the address to avoid triggering a
> LASS #GP fault.
>

It seems we are implicitly relying on the stac() and clac() calls
that are added for SMAP. Have you tried running with SMAP disabled,
i.e. "clearcpuid=smap"?

I believe there needs to be a dependency between LASS and SMAP.

diff --git a/arch/x86/kernel/cpu/cpuid-deps.c
b/arch/x86/kernel/cpu/cpuid-deps.c
index d95221117129..00bc7e4a65d2 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -79,6 +79,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_XFD, X86_FEATURE_XSAVES },
{ X86_FEATURE_XFD, X86_FEATURE_XGETBV1 },
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
+ { X86_FEATURE_LASS, X86_FEATURE_SMAP },
{}
};
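The proposed cpuid_deps[] entry means that clearing SMAP must also clear LASS. A minimal sketch of how such a dependency table propagates clears (simplified from the kernel's recursive capability clearing; the feature numbering here is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature ids, for illustration only */
enum { FEAT_XSAVES, FEAT_XGETBV1, FEAT_XFD, FEAT_AMX_TILE,
       FEAT_SMAP, FEAT_LASS, FEAT_MAX };

struct cpuid_dep { int feature, depends; };

/* Same shape as cpuid_deps[]: feature -> what it depends on */
static const struct cpuid_dep deps[] = {
	{ FEAT_XFD,      FEAT_XSAVES  },
	{ FEAT_XFD,      FEAT_XGETBV1 },
	{ FEAT_AMX_TILE, FEAT_XFD     },
	{ FEAT_LASS,     FEAT_SMAP    },	/* the addition proposed above */
	{ -1, -1 }
};

/* Clearing a feature also clears everything depending on it, transitively */
static void clear_feature(bool caps[], int feat)
{
	caps[feat] = false;
	for (const struct cpuid_dep *d = deps; d->feature >= 0; d++)
		if (d->depends == feat && caps[d->feature])
			clear_feature(caps, d->feature);
}
```

With the LASS->SMAP entry in place, "clearcpuid=smap" would drop LASS as well, instead of leaving LASS enabled without working STAC/CLAC.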

2023-01-11 01:36:05

by Chen, Yian

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives



On 1/10/2023 1:04 PM, Peter Zijlstra wrote:
> On Mon, Jan 09, 2023 at 09:52:00PM -0800, Yian Chen wrote:
>> Most of the kernel is mapped at virtual addresses
>> in the upper half of the address range. But kernel
>> deliberately initialized a temporary mm area
>> within the lower half of the address range
>> for text poking, see commit 4fc19708b165
>> ("x86/alternatives: Initialize temporary mm
>> for patching").
>>
>> LASS stops access to a lower half address in kernel,
>> and this can be deactivated if AC bit in EFLAGS
>> register is set. Hence use stac and clac instructions
>> around access to the address to avoid triggering a
>> LASS #GP fault.
>>
>> Kernel objtool validation warns if the binary calls
>> to a non-whitelisted function that exists outside of
>> the stac/clac guard, or references any function with a
>> dynamic function pointer inside the guard; see section
>> 9 in the document tools/objtool/Documentation/objtool.txt.
>>
>> For these reasons, also considering text poking size is
>> usually small, simple modifications have been done
>> in function text_poke_memcpy() and text_poke_memset() to
>> avoid non-whitelisted function calls inside the stac/clac
>> guard.
>>
>> Gcc may detect and replace the target with its built-in
>> functions. However, the replacement would break the
>> objtool validation criteria. Hence, add compiler option
>> -fno-builtin for the file.
>
> Please reflow to 72 characters consistently, this is silly.
>
Sure. I will reflow the commit msg per the guideline.

>> Co-developed-by: Tony Luck <[email protected]>
>> Signed-off-by: Tony Luck <[email protected]>
>> Signed-off-by: Yian Chen <[email protected]>
>> ---
>> arch/x86/include/asm/smap.h | 13 +++++++++++++
>> arch/x86/kernel/Makefile | 2 ++
>> arch/x86/kernel/alternative.c | 21 +++++++++++++++++++--
>> tools/objtool/arch/x86/special.c | 2 ++
>> 4 files changed, 36 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
>> index bab490379c65..6f7ac0839b10 100644
>> --- a/arch/x86/include/asm/smap.h
>> +++ b/arch/x86/include/asm/smap.h
>> @@ -39,6 +39,19 @@ static __always_inline void stac(void)
>> alternative("", __ASM_STAC, X86_FEATURE_SMAP);
>> }
>>
>> +/* Deactivate/activate LASS via AC bit in EFLAGS register */
>> +static __always_inline void low_addr_access_begin(void)
>> +{
>> + /* Note: a barrier is implicit in alternative() */
>> + alternative("", __ASM_STAC, X86_FEATURE_LASS);
>> +}
>> +
>> +static __always_inline void low_addr_access_end(void)
>> +{
>> + /* Note: a barrier is implicit in alternative() */
>> + alternative("", __ASM_CLAC, X86_FEATURE_LASS);
>> +}
>
> Can't say I like the name.
Indeed, there are alternative ways to name the functions, for example
enable_kernel_lass()/disable_kernel_lass(), or simply keeping
stac()/clac() unchanged.

I chose this name because it is straightforward about the purpose and
helps in understanding when to use the functions.

> Also if you look at bit 63 as a sign bit,
> it's actively wrong since -1 is lower than 0.
This could be a trade-off choice. While considering address manipulation
and calculation, an address is likely treated as unsigned. I would be
happy to get input for better naming.

>> +
>> static __always_inline unsigned long smap_save(void)
>> {
>> unsigned long flags;
>> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> index 96d51bbc2bd4..f8a455fc56a2 100644
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -7,6 +7,8 @@ extra-y += vmlinux.lds
>>
>> CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
>>
>> +CFLAGS_alternative.o += -fno-builtin
>> +
>> ifdef CONFIG_FUNCTION_TRACER
>> # Do not profile debug and lowlevel utilities
>> CFLAGS_REMOVE_tsc.o = -pg
>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>> index 7d8c3cbde368..4de8b54fb5f2 100644
>> --- a/arch/x86/kernel/alternative.c
>> +++ b/arch/x86/kernel/alternative.c
>> @@ -1530,14 +1530,31 @@ __ro_after_init unsigned long poking_addr;
>>
>> static void text_poke_memcpy(void *dst, const void *src, size_t len)
>> {
>> - memcpy(dst, src, len);
>> + const char *s = src;
>> + char *d = dst;
>> +
>> + /* The parameter dst ends up referencing to the global variable
>> + * poking_addr, which is mapped to the low half address space.
>> + * In kernel, accessing the low half address range is prevented
>> + * by LASS. So relax LASS prevention while accessing the memory
>> + * range.
>> + */
>> + low_addr_access_begin();
>> + while (len-- > 0)
>> + *d++ = *s++;
>> + low_addr_access_end();
>> }
>>
>> static void text_poke_memset(void *dst, const void *src, size_t len)
>> {
>> int c = *(const int *)src;
>> + char *d = dst;
>>
>> - memset(dst, c, len);
>> + /* The same comment as it is in function text_poke_memcpy */
>> + low_addr_access_begin();
>> + while (len-- > 0)
>> + *d++ = c;
>> + low_addr_access_end();
>> }
>
> This is horrific tinkering :/
>
This part seems difficult to solve perfectly, since a function call or
function pointer inside the stac/clac guard will trigger an objtool
warning (the reasons are stated in the commit msg).

To avoid the warning, I considered this acceptable since the poked text
is usually only a few bytes.

> Also, what about the EFI mm? IIRC EFI also lives in the user address
> space.

I didn't encounter any EFI-mm-related problems while testing the
implementation. I will update you after I investigate the EFI mm
further.

Thanks,
Yian

2023-01-11 09:26:10

by Peter Zijlstra

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

On Tue, Jan 10, 2023 at 05:01:59PM -0800, Chen, Yian wrote:

> > > +/* Deactivate/activate LASS via AC bit in EFLAGS register */
> > > +static __always_inline void low_addr_access_begin(void)
> > > +{
> > > + /* Note: a barrier is implicit in alternative() */
> > > + alternative("", __ASM_STAC, X86_FEATURE_LASS);
> > > +}
> > > +
> > > +static __always_inline void low_addr_access_end(void)
> > > +{
> > > + /* Note: a barrier is implicit in alternative() */
> > > + alternative("", __ASM_CLAC, X86_FEATURE_LASS);
> > > +}
> >
> > Can't say I like the name.
> Indeed, there are alternative ways to name the functions, for example
> enable_kernel_lass()/disable_kernel_lass(), or simply keeping
> stac()/clac() unchanged.
>
> I chose this name because it is straightforward about the purpose and helps
> in understanding when to use the functions.

Given we normally refer to the kernel address space as negative, it is
somewhat confusing.

lass_access_{begin,end}()

or something might be better names.

> > Also if you look at bit 63 as a sign bit,
> > it's actively wrong since -1 is lower than 0.
> This could be a trade-off choice. While considering address manipulation
> and calculation, an address is likely treated as unsigned. I would be
> happy to get input for better naming.

Note that Documentation/x86/x86_64/mm.rst likes to refer to the kernel
range as negative.

Also, things like making an address canonical use sign-extension.
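The sign-extension point can be illustrated with the usual canonical-address computation (a sketch assuming 48-bit virtual addresses, i.e. 4-level paging; 5-level paging would sign-extend bit 56 instead):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Make a 48-bit virtual address canonical by sign-extending bit 47
 * into bits 63:48. Kernel addresses come out with bit 63 set, i.e.
 * negative when viewed as signed -- which is why "low address" naming
 * reads oddly for the user half.
 */
static uint64_t make_canonical48(uint64_t va)
{
	return (uint64_t)(((int64_t)(va << 16)) >> 16);
}
```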

> > This is horrific tinkering :/
> >
> This part seems difficult to solve perfectly, since a function call or
> function pointer inside the stac/clac guard will trigger an objtool
> warning (the reasons are stated in the commit msg).

Yeah, I'm familiar with that objtool warning, I wrote that particular check :-)

Still, this is a horrific solution. Adding something like
__inline_mem{set,cpy}() is a much saner option.

Something a little like the completely untested below.

---
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 888731ccf1f6..f43fc2d9b182 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -23,6 +23,16 @@ extern void *memcpy(void *to, const void *from, size_t len);
#endif
extern void *__memcpy(void *to, const void *from, size_t len);

+static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
+{
+ void *ret = to;
+
+ asm volatile ("rep movsb"
+ : "+D" (to), "+S" (from), "+c" (len)
+ : : "memory");
+ return ret;
+}
+
#define __HAVE_ARCH_MEMSET
#if defined(__SANITIZE_MEMORY__) && defined(__NO_FORTIFY)
extern void *__msan_memset(void *s, int c, size_t n);
@@ -33,6 +43,17 @@ void *memset(void *s, int c, size_t n);
#endif
void *__memset(void *s, int c, size_t n);

+static __always_inline void *__inline_memset(void *s, int v, size_t n)
+{
+ void *ret = s;
+
+ asm volatile("rep stosb"
+ : "+D" (s), "+c" (n)
+ : "a" ((uint8_t)v)
+ : "memory");
+ return ret;
+}
+
#define __HAVE_ARCH_MEMSET16
static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
{

2023-01-12 01:02:43

by Chen, Yian

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives



On 1/10/2023 2:41 PM, Sohil Mehta wrote:
> On 1/9/2023 9:52 PM, Yian Chen wrote:
>
>> LASS stops access to a lower half address in kernel,
>> and this can be deactivated if AC bit in EFLAGS
>> register is set. Hence use stac and clac instructions
>> around access to the address to avoid triggering a
>> LASS #GP fault.
>>
>
> It seems we are implicitly relying on the stac() and clac() calls
> that are added for SMAP. Have you tried running with SMAP disabled,
> i.e. "clearcpuid=smap"?
>
Yes, I tested with clearcpuid=smap.

> I believe there needs to be a dependency between LASS and SMAP.
>
Yes, in kernel mode LASS depends on SMAP to work, but in user mode it
doesn't, so the dependency described below may not capture the
user-space effect.

> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c
> b/arch/x86/kernel/cpu/cpuid-deps.c
> index d95221117129..00bc7e4a65d2 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -79,6 +79,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>      { X86_FEATURE_XFD,            X86_FEATURE_XSAVES    },
>      { X86_FEATURE_XFD,            X86_FEATURE_XGETBV1   },
>      { X86_FEATURE_AMX_TILE,            X86_FEATURE_XFD       },
> +    { X86_FEATURE_LASS,            X86_FEATURE_SMAP      },
>      {}
>  };

Thanks,
Yian

2023-01-12 01:08:39

by Dave Hansen

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

On 1/11/23 16:27, Chen, Yian wrote:
>> It seems we are implicitly relying on the stac() and clac()
>> calls that are added for SMAP. Have you tried running with SMAP
>> disabled, i.e. "clearcpuid=smap"?
>>
> Yes, I tested with clearcpuid=smap.
It works by accident, then.

clearcpuid=smap means that the kernel should be running as if
CPUID.(EAX=07H,ECX=0H):EBX.SMAP[bit 20]==0. STAC/CLAC should #UD in
that case.

The only reason that it happens to work is that STAC/CLAC apparently
actually continue to work even if CR4.SMAP==0.

I'm actually a _bit_ surprised by this, but I bet there's a good reason
for it.

In any case, please just make LASS dependent on SMAP. It's the right
thing to do on several levels.

2023-01-12 19:28:36

by Dave Hansen

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

On 1/12/23 10:36, Chen, Yian wrote:
> On 1/11/2023 4:37 PM, Dave Hansen wrote:
>> On 1/11/23 16:27, Chen, Yian wrote:
>>>> It seems we are implicitly relying on the stac() and clac()
>>>> calls that are added for SMAP. Have you tried running with SMAP
>>>> disabled, i.e. "clearcpuid=smap"?
>>>>
>>> Yes, I tested with clearcpuid=smap.
>> It works by accident, then.
>>
>> clearcpuid=smap means that the kernel should be running as if
>> CPUID.(EAX=07H,ECX=0H):EBX.SMAP[bit 20]==0.  STAC/CLAC should #UD in
>> that case.
>>
> It could be something wrong in my Simics simulation environment.

Also, Andy Cooper made a very good point: when the kernel enables
paging, it's running with a low address so that the instruction pointer
stays valid as paging becomes enabled.

But, if LASS were enabled and enforced at that point, you'd get a LASS
fault and go kablooey. Can you see what simics does in that case, and
also make sure we're clearing CR4.LASS when enabling paging? That would
obviate the need to do it later in C code too.

2023-01-12 19:41:55

by Chen, Yian

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives



On 1/11/2023 4:37 PM, Dave Hansen wrote:
> On 1/11/23 16:27, Chen, Yian wrote:
>>> It seems we are implicitly relying on the stac() and clac()
>>> calls that are added for SMAP. Have you tried running with SMAP
>>> disabled, i.e. "clearcpuid=smap"?
>>>
>> Yes, I tested with clearcpuid=smap.
> It works by accident, then.
>
> clearcpuid=smap means that the kernel should be running as if
> CPUID.(EAX=07H,ECX=0H):EBX.SMAP[bit 20]==0. STAC/CLAC should #UD in
> that case.
>
It could be something wrong in my Simics simulation environment.

> The only reason that it happens to work is that STAC/CLAC apparently
> actually continue to work even if CR4.SMAP==0.
>
> I'm actually a _bit_ surprised by this, but I bet there's a good reason
> for it.
>
> In any case, please just make LASS dependent on SMAP. It's the right
> thing to do on several levels.
Sure, I will add the dependency.

2023-02-01 02:11:41

by Sohil Mehta

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

On 1/11/2023 4:37 PM, Dave Hansen wrote:
> clearcpuid=smap means that the kernel should be running as if
> CPUID.(EAX=07H,ECX=0H):EBX.SMAP[bit 20]==0. STAC/CLAC should #UD in
> that case.
>
> The only reason that it happens to work is that STAC/CLAC apparently
> actually continue to work even if CR4.SMAP==0.
>

It seems this is by design. I was trying to experiment with
clearcpuid=smap on an older platform (KBL). I noticed that even if
CR4.SMAP==0 the STAC/CLAC instructions do not fault.

The STAC instruction operation in the SDM also suggests that it may be
intentional. It does *not* list CR4.SMAP=0 as a reason for #UD.

#UD If the LOCK prefix is used.
If the CPL > 0.
If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0.


> I'm actually a _bit_ surprised by this, but I bet there's a good reason
> for it.
>

I would love to find out the actual reasoning behind this. I'll try to
poke some of the architects internally.

> In any case, please just make LASS dependent on SMAP. It's the right
> thing to do on several levels.

Anyway, the end result still remains the same. We should still make LASS
dependent on SMAP since:

1) The spec says that LASS enforcement only happens when SMAP is enabled.
"A supervisor-mode data access causes a LASS violation only if
supervisor-mode access protection is enabled (because CR4.SMAP = 1)"

2) In the extremely improbable case that LASS is available but SMAP is
not available on the hardware, the STAC and CLAC instructions will fault
due to the missing SMAP CPUID bit.

-Sohil
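The enforcement rule in point 1 can be modeled as a small predicate (a simplified sketch covering only supervisor-mode data accesses; the real architecture has more cases, and the EFLAGS.AC escape is exactly what the patch's STAC/CLAC guard relies on):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the quoted rule: a supervisor-mode data access
 * to a user-half (bit 63 clear) address trips LASS only when both
 * CR4.LASS and CR4.SMAP are set and EFLAGS.AC is clear.
 */
static bool lass_data_violation(bool cr4_lass, bool cr4_smap,
				bool eflags_ac, uint64_t vaddr)
{
	bool user_half = !(vaddr >> 63);	/* bit 63 clear => low half */

	return cr4_lass && cr4_smap && !eflags_ac && user_half;
}
```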

2023-02-01 02:25:16

by Sohil Mehta

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

On 1/12/2023 10:48 AM, Dave Hansen wrote:
> Also, Andy Cooper made a very good point: when the kernel enables
> paging, it's running with a low address so that the instruction pointer
> stays valid as paging becomes enabled.
>
> But, if LASS were enabled and enforced at that point, you'd get a LASS
> fault and go kablooey. Can you see what simics does in that case, and
> also make sure we're clearing CR4.LASS when enabling paging? That would
> obviate the need to do it later in C code too.

CR4 and CR0 are always restored to a known state during kexec. So,
running with LASS enabled should not happen during early boot.

machine_kexec()
-> relocate_kernel()
-> identity_mapped()

> /*
> * Set cr0 to a known state:
> * - Paging enabled
> * - Alignment check disabled
> * - Write protect disabled
> * - No task switch
> * - Don't do FP software emulation.
> * - Protected mode enabled
> */
> movq %cr0, %rax
> andq $~(X86_CR0_AM | X86_CR0_WP | X86_CR0_TS | X86_CR0_EM), %rax
> orl $(X86_CR0_PG | X86_CR0_PE), %eax
> movq %rax, %cr0
>
> /*
> * Set cr4 to a known state:
> * - physical address extension enabled
> * - 5-level paging, if it was enabled before
> */
> movl $X86_CR4_PAE, %eax
> testq $X86_CR4_LA57, %r13
> jz 1f
> orl $X86_CR4_LA57, %eax
> 1:
> movq %rax, %cr4
>
> jmp 1f
> 1:

Dave, does this address your concern or were you looking for something
else? Is there some path other than kexec that should also be audited
for this scenario?
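The CR4 reset in the quoted identity_mapped() sequence amounts to simple bit arithmetic (CR4.PAE and CR4.LA57 are the real bit positions; SMAP is used below merely as a stand-in for any stale enable bit, since the thread does not give the LASS bit number):

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR4_PAE  (1UL << 5)
#define X86_CR4_LA57 (1UL << 12)
#define X86_CR4_SMAP (1UL << 21)

/* Plain-C version of the asm above: keep only PAE, plus LA57 if it
 * was set before. Everything else -- including any LASS enable --
 * is dropped on the kexec path. */
static uint64_t kexec_known_cr4(uint64_t old_cr4)
{
	uint64_t cr4 = X86_CR4_PAE;

	if (old_cr4 & X86_CR4_LA57)
		cr4 |= X86_CR4_LA57;
	return cr4;
}
```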



2023-02-01 18:28:08

by Dave Hansen

Subject: Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

On 1/31/23 18:25, Sohil Mehta wrote:
>> /*
>> * Set cr4 to a known state:
>> * - physical address extension enabled
>> * - 5-level paging, if it was enabled before
>> */
>> movl $X86_CR4_PAE, %eax
>> testq $X86_CR4_LA57, %r13
>> jz 1f
>> orl $X86_CR4_LA57, %eax
>> 1:
>> movq %rax, %cr4
>>
>> jmp 1f
>> 1:
> Dave, does this address your concern or were you looking for something
> else? Is there some path other than kexec that should also be audited
> for this scenario?

Yep, that addresses it. I don't know of any other path that would
matter. Couldn't hurt to poke around and look for other CR4
manipulation that might need to be LASS-aware, though.