Due to some historical confusion, arm64's current_top_of_stack() isn't
what the stackleak code expects. This could in theory result in a number
of problems, and practically results in an unnecessary performance hit.
We can avoid this by aligning the arm64 implementation with the x86
implementation.
The arm64 implementation of current_top_of_stack() was added
specifically for stackleak in commit:
0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
This was intended to be equivalent to the x86 implementation, but the
implementation, semantics, and performance characteristics differ
wildly:
* On x86, current_top_of_stack() returns the top of the current task's
task stack, regardless of which stack is in active use.
The implementation accesses a percpu variable which the x86 entry code
maintains, and returns the location immediately above the pt_regs on
the task stack (above which x86 has some padding).
* On arm64 current_top_of_stack() returns the top of the stack in active
use (i.e. the one which is currently being used).
The implementation checks the SP against a number of
potentially-accessible stacks, and will BUG() if no stack is found.
The core stackleak_erase() code determines the upper bound of stack to
erase with:
| if (on_thread_stack())
| boundary = current_stack_pointer;
| else
| boundary = current_top_of_stack();
On arm64 stackleak_erase() is always called on a task stack, and
on_thread_stack() should always be true. On x86, stackleak_erase() is
mostly called on a trampoline stack, and is sometimes called on a task
stack.
Currently, this results in a lot of unnecessary code being generated for
arm64 for the impossible !on_thread_stack() case. Some of this is
inlined, bloating stackleak_erase(), while portions of this are left
out-of-line and permitted to be instrumented (which would be a
functional problem if that code were reachable).
As a first step towards improving this, this patch aligns arm64's
implementation of current_top_of_stack() with x86's, always returning
the top of the current task's stack. With GCC 11.1.0 this results in the
bulk of the unnecessary code being removed, including all of the
out-of-line instrumentable code.
While I don't believe there's a functional problem in practice I've
marked this as a fix since the semantic was clearly wrong, the fix
itself is simple, and other code might rely upon this in future.
Fixes: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
Signed-off-by: Mark Rutland <[email protected]>
Cc: Alexander Popov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/processor.h | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 73e38d9a540ce..6b1a12c23fe77 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -381,12 +381,10 @@ long get_tagged_addr_ctrl(struct task_struct *task);
* of header definitions for the use of task_stack_page.
*/
-#define current_top_of_stack() \
-({ \
- struct stack_info _info; \
- BUG_ON(!on_accessible_stack(current, current_stack_pointer, 1, &_info)); \
- _info.high; \
-})
+/*
+ * The top of the current task's task stack
+ */
+#define current_top_of_stack() ((unsigned long)current->stack + THREAD_SIZE)
#define on_thread_stack() (on_task_stack(current, current_stack_pointer, 1, NULL))
#endif /* __ASSEMBLY__ */
--
2.30.2
On Wed, May 04, 2022 at 05:41:32PM +0100, Catalin Marinas wrote:
> On Wed, Apr 27, 2022 at 06:31:16PM +0100, Mark Rutland wrote:
> > [...]
> > Fixes: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
> > Signed-off-by: Mark Rutland <[email protected]>
> > Cc: Alexander Popov <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Catalin Marinas <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Will Deacon <[email protected]>
>
> I thought this was queued already but I couldn't find it in -next. So:
>
> Acked-by: Catalin Marinas <[email protected]>
Should this patch go via the arm64 tree for -rc6, or should I just carry
it as part of the overall stackleak series?
--
Kees Cook
On Wed, May 04, 2022 at 08:55:39PM +0100, Catalin Marinas wrote:
> On Wed, May 04, 2022 at 12:01:11PM -0700, Kees Cook wrote:
> > On Wed, May 04, 2022 at 05:41:32PM +0100, Catalin Marinas wrote:
> > > On Wed, Apr 27, 2022 at 06:31:16PM +0100, Mark Rutland wrote:
> > > > [...]
> > > > Fixes: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
> > > > Signed-off-by: Mark Rutland <[email protected]>
> > > > Cc: Alexander Popov <[email protected]>
> > > > Cc: Andrew Morton <[email protected]>
> > > > Cc: Andy Lutomirski <[email protected]>
> > > > Cc: Catalin Marinas <[email protected]>
> > > > Cc: Kees Cook <[email protected]>
> > > > Cc: Will Deacon <[email protected]>
> > >
> > > I thought this was queued already but I couldn't find it in -next. So:
> > >
> > > Acked-by: Catalin Marinas <[email protected]>
> >
> > Should this patch go via the arm64 tree for -rc6, or should I just carry
> > it as part of the overall stackleak series?
>
> I'll leave this up to Will (we take turns in managing the kernel
> releases) but it doesn't look urgent at all to me since it fixes a
> commit in 4.19.
Agreed, and nobody has actually experienced a problem with the current code
afaict, so I'd prefer to leave this with the rest of the series rather than
run the risk of a late regression.
Will
Hi Mark!
On 27.04.2022 20:31, Mark Rutland wrote:
> Due to some historical confusion, arm64's current_top_of_stack() isn't
> what the stackleak code expects. This could in theory result in a number
> of problems, and practically results in an unnecessary performance hit.
> We can avoid this by aligning the arm64 implementation with the x86
> implementation.
>
> The arm64 implementation of current_top_of_stack() was added
> specifically for stackleak in commit:
>
> 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
>
> This was intended to be equivalent to the x86 implementation, but the
> implementation, semantics, and performance characteristics differ
> wildly:
>
> * On x86, current_top_of_stack() returns the top of the current task's
> task stack, regardless of which stack is in active use.
>
> The implementation accesses a percpu variable which the x86 entry code
> maintains, and returns the location immediately above the pt_regs on
> the task stack (above which x86 has some padding).
>
> * On arm64 current_top_of_stack() returns the top of the stack in active
> use (i.e. the one which is currently being used).
>
> The implementation checks the SP against a number of
> potentially-accessible stacks, and will BUG() if no stack is found.
As I could understand, for arm64, calling stackleak_erase() not from the thread
stack would bring troubles because current_top_of_stack() would return an
unexpected address from a foreign stack. Is this correct?
But this bug doesn't happen because arm64 always calls stackleak_erase() from
the current thread stack. Right?
> The core stackleak_erase() code determines the upper bound of stack to
> erase with:
>
> | if (on_thread_stack())
> | boundary = current_stack_pointer;
> | else
> | boundary = current_top_of_stack();
>
> On arm64 stackleak_erase() is always called on a task stack, and
> on_thread_stack() should always be true. On x86, stackleak_erase() is
> mostly called on a trampoline stack, and is sometimes called on a task
> stack.
>
> Currently, this results in a lot of unnecessary code being generated for
> arm64 for the impossible !on_thread_stack() case. Some of this is
> inlined, bloating stackleak_erase(), while portions of this are left
> out-of-line and permitted to be instrumented (which would be a
> functional problem if that code were reachable).
Sorry, I didn't understand this part about instrumentation. Could you elaborate
please?
> As a first step towards improving this, this patch aligns arm64's
> implementation of current_top_of_stack() with x86's, always returning
> the top of the current task's stack. With GCC 11.1.0 this results in the
> bulk of the unnecessary code being removed, including all of the
> out-of-line instrumentable code.
>
> While I don't believe there's a functional problem in practice I've
> marked this as a fix since the semantic was clearly wrong, the fix
> itself is simple, and other code might rely upon this in future.
>
> Fixes: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
> Signed-off-by: Mark Rutland <[email protected]>
> Cc: Alexander Popov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm64/include/asm/processor.h | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 73e38d9a540ce..6b1a12c23fe77 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -381,12 +381,10 @@ long get_tagged_addr_ctrl(struct task_struct *task);
> * of header definitions for the use of task_stack_page.
> */
>
> -#define current_top_of_stack() \
> -({ \
> - struct stack_info _info; \
> - BUG_ON(!on_accessible_stack(current, current_stack_pointer, 1, &_info)); \
> - _info.high; \
> -})
> +/*
> + * The top of the current task's task stack
> + */
> +#define current_top_of_stack() ((unsigned long)current->stack + THREAD_SIZE)
> #define on_thread_stack() (on_task_stack(current, current_stack_pointer, 1, NULL))
>
> #endif /* __ASSEMBLY__ */
On Wed, Apr 27, 2022 at 06:31:16PM +0100, Mark Rutland wrote:
> Due to some historical confusion, arm64's current_top_of_stack() isn't
> what the stackleak code expects. This could in theory result in a number
> of problems, and practically results in an unnecessary performance hit.
> We can avoid this by aligning the arm64 implementation with the x86
> implementation.
>
> The arm64 implementation of current_top_of_stack() was added
> specifically for stackleak in commit:
>
> 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
>
> This was intended to be equivalent to the x86 implementation, but the
> implementation, semantics, and performance characteristics differ
> wildly:
>
> * On x86, current_top_of_stack() returns the top of the current task's
> task stack, regardless of which stack is in active use.
>
> The implementation accesses a percpu variable which the x86 entry code
> maintains, and returns the location immediately above the pt_regs on
> the task stack (above which x86 has some padding).
>
> * On arm64 current_top_of_stack() returns the top of the stack in active
> use (i.e. the one which is currently being used).
>
> The implementation checks the SP against a number of
> potentially-accessible stacks, and will BUG() if no stack is found.
>
> The core stackleak_erase() code determines the upper bound of stack to
> erase with:
>
> | if (on_thread_stack())
> | boundary = current_stack_pointer;
> | else
> | boundary = current_top_of_stack();
>
> On arm64 stackleak_erase() is always called on a task stack, and
> on_thread_stack() should always be true. On x86, stackleak_erase() is
> mostly called on a trampoline stack, and is sometimes called on a task
> stack.
>
> Currently, this results in a lot of unnecessary code being generated for
> arm64 for the impossible !on_thread_stack() case. Some of this is
> inlined, bloating stackleak_erase(), while portions of this are left
> out-of-line and permitted to be instrumented (which would be a
> functional problem if that code were reachable).
>
> As a first step towards improving this, this patch aligns arm64's
> implementation of current_top_of_stack() with x86's, always returning
> the top of the current task's stack. With GCC 11.1.0 this results in the
> bulk of the unnecessary code being removed, including all of the
> out-of-line instrumentable code.
>
> While I don't believe there's a functional problem in practice I've
> marked this as a fix since the semantic was clearly wrong, the fix
> itself is simple, and other code might rely upon this in future.
>
> Fixes: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
> Signed-off-by: Mark Rutland <[email protected]>
> Cc: Alexander Popov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Will Deacon <[email protected]>
I thought this was queued already but I couldn't find it in -next. So:
Acked-by: Catalin Marinas <[email protected]>
On Wed, May 04, 2022 at 12:01:11PM -0700, Kees Cook wrote:
> On Wed, May 04, 2022 at 05:41:32PM +0100, Catalin Marinas wrote:
> > On Wed, Apr 27, 2022 at 06:31:16PM +0100, Mark Rutland wrote:
> > > [...]
> > > Fixes: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
> > > Signed-off-by: Mark Rutland <[email protected]>
> > > Cc: Alexander Popov <[email protected]>
> > > Cc: Andrew Morton <[email protected]>
> > > Cc: Andy Lutomirski <[email protected]>
> > > Cc: Catalin Marinas <[email protected]>
> > > Cc: Kees Cook <[email protected]>
> > > Cc: Will Deacon <[email protected]>
> >
> > I thought this was queued already but I couldn't find it in -next. So:
> >
> > Acked-by: Catalin Marinas <[email protected]>
>
> Should this patch go via the arm64 tree for -rc6, or should I just carry
> it as part of the overall stackleak series?
I'll leave this up to Will (we take turns in managing the kernel
releases) but it doesn't look urgent at all to me since it fixes a
commit in 4.19.
--
Catalin
On Sun, May 08, 2022 at 08:24:38PM +0300, Alexander Popov wrote:
> Hi Mark!
>
> On 27.04.2022 20:31, Mark Rutland wrote:
> > Due to some historical confusion, arm64's current_top_of_stack() isn't
> > what the stackleak code expects. This could in theory result in a number
> > of problems, and practically results in an unnecessary performance hit.
> > We can avoid this by aligning the arm64 implementation with the x86
> > implementation.
> >
> > The arm64 implementation of current_top_of_stack() was added
> > specifically for stackleak in commit:
> >
> > 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
> >
> > This was intended to be equivalent to the x86 implementation, but the
> > implementation, semantics, and performance characteristics differ
> > wildly:
> >
> > * On x86, current_top_of_stack() returns the top of the current task's
> > task stack, regardless of which stack is in active use.
> >
> > The implementation accesses a percpu variable which the x86 entry code
> > maintains, and returns the location immediately above the pt_regs on
> > the task stack (above which x86 has some padding).
> >
> > * On arm64 current_top_of_stack() returns the top of the stack in active
> > use (i.e. the one which is currently being used).
> >
> > The implementation checks the SP against a number of
> > potentially-accessible stacks, and will BUG() if no stack is found.
>
> As I could understand, for arm64, calling stackleak_erase() not from the
> thread stack would bring troubles because current_top_of_stack() would
> return an unexpected address from a foreign stack. Is this correct?
Yes.
> But this bug doesn't happen because arm64 always calls stackleak_erase()
> from the current thread stack. Right?
Yes.
> > The core stackleak_erase() code determines the upper bound of stack to
> > erase with:
> >
> > | if (on_thread_stack())
> > | boundary = current_stack_pointer;
> > | else
> > | boundary = current_top_of_stack();
> >
> > On arm64 stackleak_erase() is always called on a task stack, and
> > on_thread_stack() should always be true. On x86, stackleak_erase() is
> > mostly called on a trampoline stack, and is sometimes called on a task
> > stack.
> >
> > Currently, this results in a lot of unnecessary code being generated for
> > arm64 for the impossible !on_thread_stack() case. Some of this is
> > inlined, bloating stackleak_erase(), while portions of this are left
> > out-of-line and permitted to be instrumented (which would be a
> > functional problem if that code were reachable).
>
> Sorry, I didn't understand this part about instrumentation. Could you
> elaborate please?
Portions of the code are regular .text, and are subject to instrumentation by
KASAN/UBSAN/KCOV, ftrace, etc, where the compiler will (implicitly) insert
calls to out-of-line instrumentation callbacks. Some (but not all) of those are
disabled in the Makefile. For example, ftrace instrumentation is possible.
Generally, the instrumentation callbacks expect to run with a full kernel
environment (e.g. with RCU watching, IRQ tracing state correct), but at the
time stackleak_erase() is called, this is not the case. so those could go wrong
and corrupt state.
Additionally, since those calls are added implicitly by the compiler, they can
manipulate state at arbitrary points throughout the function where we might not
expect it (e.g. changing current->lowest_stack).
The general stance is that we should use noinstr to disable instrumentation,
and anything that needs to be inlined into noinstr needs to be marked with
__always_inline (which is guaranteed to either inline or cause a compile-time
error if it is not possible to inline).
This patch reworks things to avoid the potential problems; as per the commit
message I don't beleive anything goes wrong in practice today.
Thanks,
Mark.
> > As a first step towards improving this, this patch aligns arm64's
> > implementation of current_top_of_stack() with x86's, always returning
> > the top of the current task's stack. With GCC 11.1.0 this results in the
> > bulk of the unnecessary code being removed, including all of the
> > out-of-line instrumentable code.
> >
> > While I don't believe there's a functional problem in practice I've
> > marked this as a fix since the semantic was clearly wrong, the fix
> > itself is simple, and other code might rely upon this in future.
> >
> > Fixes: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
> > Signed-off-by: Mark Rutland <[email protected]>
> > Cc: Alexander Popov <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Catalin Marinas <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > ---
> > arch/arm64/include/asm/processor.h | 10 ++++------
> > 1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> > index 73e38d9a540ce..6b1a12c23fe77 100644
> > --- a/arch/arm64/include/asm/processor.h
> > +++ b/arch/arm64/include/asm/processor.h
> > @@ -381,12 +381,10 @@ long get_tagged_addr_ctrl(struct task_struct *task);
> > * of header definitions for the use of task_stack_page.
> > */
> > -#define current_top_of_stack() \
> > -({ \
> > - struct stack_info _info; \
> > - BUG_ON(!on_accessible_stack(current, current_stack_pointer, 1, &_info)); \
> > - _info.high; \
> > -})
> > +/*
> > + * The top of the current task's task stack
> > + */
> > +#define current_top_of_stack() ((unsigned long)current->stack + THREAD_SIZE)
> > #define on_thread_stack() (on_task_stack(current, current_stack_pointer, 1, NULL))
> > #endif /* __ASSEMBLY__ */
>