2022-04-25 17:26:16

by Mark Rutland

[permalink] [raw]
Subject: [PATCH 0/8] stackleak: fixes and rework

This series reworks the stackleak code. The first patch fixes some
latent issues on arm64, and the subsequent patches improve the code to
improve clarity and permit better code generation.

I started working on this as a tangent from rework to arm64's stacktrace
code. Looking at users of the `on_*_stack()` helpers I noticed that the
assembly generated for stackleak was particularly awful as it performed
a lot of redundant work and also called instrumentable code, which isn't
sound.

The first patch fixes the major issues on arm64, and is Cc'd to stable
for backporting.

The second patch is a trivial optimization for when stackleak is
dynamically disabled.

The subsequent patches rework the way stackleak manipulates the stack
boundary values. This is partically for clarity (e.g. with separate
'low' and 'high' boundary variables), and also permits the compiler to
generate more optimal assembly by generating the high and low bounds
from the same base.

Patch 5 changes the way that `current->lowest_stack` is reset prior to
return to userspace. The existing code uses an undocumented offset
relative to the top of the stack which doesn't make much sense (as thie
sometimes falls within the task's pt_regs, or sometimes adds 600+ bytes
to erase upon the next exit to userspace). For now I've removed the
offset entirely.

Patch 7 adds stackleak_erase_on_task_stack() and
stackleak_erase_off_task_stack() that can be used when a caller knows
they're always on or off the task stack respectively, avoiding redundant
logic to check this and generate the high boundary value. On arm64 we
always call stackleak_erase() while on the task stack, so this is used
in patch 8.

Testing the series on arm64 with a QEMU HVF VM on an M1 Macbook Pro with
a few microbenchmarks shows a small but measureable improvement when
stackleak is enabled (relative to v5.18-rc1):

* Calling getpid 1^22 times in a loop (avg 50 runs)

Before: 0.652099387 seconds ( +- 0.13% )
After: 0.641005661 seconds ( +- 0.13% )

~1.7% time decrease

* perf bench sched pipe (single run)

Before: 2.138 seconds total
After: 2.118 seconds total

~0.93% time decrease

I also tested "perf bench sched messaging" but the noise outweighed the
difference.

While the improvement is small, I think the improvement to clarity and
code generation is a win regardless.

Thanks,
Mark.

Mark Rutland (8):
arm64: stackleak: fix current_top_of_stack()
stackleak: move skip_erasing() check earlier
stackleak: rework stack low bound handling
stackleak: clarify variable names
stackleak: rework stack high bound handling
stackleak: remove redundant check
stackleak: add on/off stack variants
arm64: entry: use stackleak_erase_on_task_stack()

arch/arm64/include/asm/processor.h | 10 ++-
arch/arm64/kernel/entry.S | 2 +-
include/linux/stackleak.h | 29 ++++++++-
kernel/stackleak.c | 99 ++++++++++++++++++++----------
4 files changed, 98 insertions(+), 42 deletions(-)

--
2.30.2


2022-04-25 18:55:44

by Mark Rutland

[permalink] [raw]
Subject: [PATCH 5/8] stackleak: rework stack high bound handling

Prior to returning to userpace, we reset current->lowest_stack to a
reasonable high bound. Currently we do this by subtracting the arbitrary
value `THREAD_SIZE/64` from the top of the stack, for reasons lost to
history.

Looking at configurations today:

* On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The
pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of
padding above), and so this covers an additional portion of 44 to 60
bytes.

* On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the
bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs
at the top of the stack is 168 bytes, and so this cover an additional
88 bytes of stack (up to 344 with KASAN).

* On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages
and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with
KASAN). The pt_regs at the top of the stack is 336 bytes, so this can
fall within the pt_regs, or can cover an additional 688 bytes of
stack.

Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the
worst case, this will cause more than 600 bytes of stack to be erased
for every syscall, even if actual stack usage were substantially
smaller.

This patches makes this slightly less nonsensical by consistently
resetting current->lowest_stack to the base of the task pt_regs. For
clarity and for consistency with the handling of the low bound, the
generation of the high bound is split into a helper with commentary
explaining why.

Since the pt_regs at the top of the stack will be clobbered upon the
next exception entry, we don't need to poison these at exception exit.
By using task_pt_regs() as the high stack boundary instead of
current_top_of_stack() we avoid some redundant poisoning, and the
compiler can share the address generation between the poisoning and
restting of `current->lowest_stack`, making the generated code more
optimal.

It's not clear to me whether the existing `THREAD_SIZE/64` offset was a
dodgy heuristic to skip the pt_regs, or whether it was attempting to
minimize the number of times stackleak_check_stack() would have to
update `current->lowest_stack` when stack usage was shallow at the cost
of unconditionally poisoning a small portion of the stack for every exit
to userspace.

For now I've simply removed the offset, and if we need/want to minimize
updates for shallow stack usage it should be easy to add a better
heuristic atop, with appropriate commentary so we know what's going on.

Signed-off-by: Mark Rutland <[email protected]>
Cc: Alexander Popov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Kees Cook <[email protected]>
---
include/linux/stackleak.h | 14 ++++++++++++++
kernel/stackleak.c | 19 ++++++++++++++-----
2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h
index 67430faa5c518..467661aeb4136 100644
--- a/include/linux/stackleak.h
+++ b/include/linux/stackleak.h
@@ -28,6 +28,20 @@ stackleak_task_low_bound(const struct task_struct *tsk)
return (unsigned long)end_of_stack(tsk) + sizeof(unsigned long);
}

+/*
+ * The address immediately after the highest address on tsk's stack which we
+ * can plausibly erase.
+ */
+static __always_inline unsigned long
+stackleak_task_high_bound(const struct task_struct *tsk)
+{
+ /*
+ * The task's pt_regs lives at the top of the task stack and will be
+ * overwritten by exception entry, so there's no need to erase them.
+ */
+ return (unsigned long)task_pt_regs(tsk);
+}
+
static inline void stackleak_task_init(struct task_struct *t)
{
t->lowest_stack = stackleak_task_low_bound(t);
diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index 8fbc1e4c21435..f597d3323729d 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -73,6 +73,7 @@ late_initcall(stackleak_sysctls_init);
static __always_inline void __stackleak_erase(void)
{
const unsigned long task_stack_low = stackleak_task_low_bound(current);
+ const unsigned long task_stack_high = stackleak_task_high_bound(current);
unsigned long erase_low = current->lowest_stack;
unsigned long erase_high;
unsigned int poison_count = 0;
@@ -97,14 +98,22 @@ static __always_inline void __stackleak_erase(void)
#endif

/*
- * Now write the poison value to the kernel stack between 'erase_low'
- * and 'erase_high'. We assume that the stack pointer doesn't change
- * when we write poison.
+ * Write poison to the task's stack between 'erase_low' and
+ * 'erase_high'.
+ *
+ * If we're running on a different stack (e.g. an entry trampoline
+ * stack) we can erase everything below the pt_regs at the top of the
+ * task stack.
+ *
+ * If we're running on the task stack itself, we must not clobber any
+ * stack used by this function and its caller. We assume that this
+ * function has a fixed-size stack frame, and the current stack pointer
+ * doesn't change while we write poison.
*/
if (on_thread_stack())
erase_high = current_stack_pointer;
else
- erase_high = current_top_of_stack();
+ erase_high = task_stack_high;

while (erase_low < erase_high) {
*(unsigned long *)erase_low = STACKLEAK_POISON;
@@ -112,7 +121,7 @@ static __always_inline void __stackleak_erase(void)
}

/* Reset the 'lowest_stack' value for the next syscall */
- current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64;
+ current->lowest_stack = task_stack_high;
}

asmlinkage void noinstr stackleak_erase(void)
--
2.30.2

2022-04-25 19:54:59

by Mark Rutland

[permalink] [raw]
Subject: [PATCH 1/8] arm64: stackleak: fix current_top_of_stack()

Due to some historical confusion, arm64's current_top_of_stack() isn't
what the stackleak code expects. This could in theory result in a number
of problems, and practically results in an unnecessary performance hit.
We can avoid this by aligning the arm64 implementation with the x86
implementation.

The arm64 implementation of current_top_of_stack() was added
specifically for stackleak in commit:

0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")

This was intended to be equivalent to the x86 implementation, but the
implementation, semantics, and performance characteristics differ
wildly:

* On x86, current_top_of_stack() returns the top of the current task's
task stack, regardless of which stack is in active use.

The implementation accesses a percpu variable which the x86 entry code
maintains, and returns the location immediately above the pt_regs on
the task stack (above which x86 has some padding).

* On arm64 current_top_of_stack() returns the top of the stack in active
use (i.e. the one which is currently being used).

The implementation checks the SP against a number of
potentially-accessible stacks, and will BUG() if no stack is found.

The core stackleak_erase() code determines the upper bound of stack to
erase with:

| if (on_thread_stack())
| boundary = current_stack_pointer;
| else
| boundary = current_top_of_stack();

On arm64 stackleak_erase() is always called on a task stack, and
on_thread_stack() should always be true. On x86, stackleak_erase() is
mostly called on a trampoline stack, and is sometimes called on a task
stack.

Currently, this results in a lot of unnecessary code being generated for
arm64 for the impossible !on_thread_stack() case. Some of this is
inlined, bloating stackleak_erase(), while portions of this are left
out-of-line and permitted to be instrumented (which would be a
functional problem if that code were reachable).

As a first step towards improving this, this patch aligns arm64's
implementation of current_top_of_stack() with x86's, always returning
the top of the current task's stack. With GCC 11.1.0 this results in the
bulk of the unnecessary code being removed, including all of the
out-of-line instrumentable code.

While I don't believe there's a functional problem in practice I've
marked this as a fix since the semantic was clearly wrong, the fix
itself is simple, and other code might rely upon this in future.

Fixes: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin")
Signed-off-by: Mark Rutland <[email protected]>
Cc: Alexander Popov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/processor.h | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 73e38d9a540ce..6b1a12c23fe77 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -381,12 +381,10 @@ long get_tagged_addr_ctrl(struct task_struct *task);
* of header definitions for the use of task_stack_page.
*/

-#define current_top_of_stack() \
-({ \
- struct stack_info _info; \
- BUG_ON(!on_accessible_stack(current, current_stack_pointer, 1, &_info)); \
- _info.high; \
-})
+/*
+ * The top of the current task's task stack
+ */
+#define current_top_of_stack() ((unsigned long)current->stack + THREAD_SIZE)
#define on_thread_stack() (on_task_stack(current, current_stack_pointer, 1, NULL))

#endif /* __ASSEMBLY__ */
--
2.30.2

2022-04-25 20:25:42

by Mark Rutland

[permalink] [raw]
Subject: [PATCH 4/8] stackleak: clarify variable names

The logic within __stackleak_erase() can be a little hard to follow, as
`boundary` switches from being the low bound to the high bound mid way
through the function, and `kstack_ptr` is used to represent the start of
the region to erase while `boundary` represents the end of the region to
erase.

Make this a little clearer by consistently using clearer variable names.
The `boundary` variable is removed, the bounds of the region to erase
are described by `erase_low` and `erase_high`, and bounds of the task
stack are described by `task_stack_low` and `task_stck_high`.

As the same time, remove the comment above the variables, since it is
unclear whether it's intended as rationale, a complaint, or a TODO, and
is more confusing than helpful.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <[email protected]>
Cc: Alexander Popov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Kees Cook <[email protected]>
---
kernel/stackleak.c | 34 ++++++++++++++++------------------
1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index 0472956d9a2ce..8fbc1e4c21435 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -73,44 +73,42 @@ late_initcall(stackleak_sysctls_init);
static __always_inline void __stackleak_erase(void)
{
const unsigned long task_stack_low = stackleak_task_low_bound(current);
-
- /* It would be nice not to have 'kstack_ptr' and 'boundary' on stack */
- unsigned long kstack_ptr = current->lowest_stack;
- unsigned long boundary = task_stack_low;
+ unsigned long erase_low = current->lowest_stack;
+ unsigned long erase_high;
unsigned int poison_count = 0;
const unsigned int depth = STACKLEAK_SEARCH_DEPTH / sizeof(unsigned long);

/* Check that 'lowest_stack' value is sane */
- if (unlikely(kstack_ptr - boundary >= THREAD_SIZE))
- kstack_ptr = boundary;
+ if (unlikely(erase_low - task_stack_low >= THREAD_SIZE))
+ erase_low = task_stack_low;

/* Search for the poison value in the kernel stack */
- while (kstack_ptr > boundary && poison_count <= depth) {
- if (*(unsigned long *)kstack_ptr == STACKLEAK_POISON)
+ while (erase_low > task_stack_low && poison_count <= depth) {
+ if (*(unsigned long *)erase_low == STACKLEAK_POISON)
poison_count++;
else
poison_count = 0;

- kstack_ptr -= sizeof(unsigned long);
+ erase_low -= sizeof(unsigned long);
}

#ifdef CONFIG_STACKLEAK_METRICS
- current->prev_lowest_stack = kstack_ptr;
+ current->prev_lowest_stack = erase_low;
#endif

/*
- * Now write the poison value to the kernel stack. Start from
- * 'kstack_ptr' and move up till the new 'boundary'. We assume that
- * the stack pointer doesn't change when we write poison.
+ * Now write the poison value to the kernel stack between 'erase_low'
+ * and 'erase_high'. We assume that the stack pointer doesn't change
+ * when we write poison.
*/
if (on_thread_stack())
- boundary = current_stack_pointer;
+ erase_high = current_stack_pointer;
else
- boundary = current_top_of_stack();
+ erase_high = current_top_of_stack();

- while (kstack_ptr < boundary) {
- *(unsigned long *)kstack_ptr = STACKLEAK_POISON;
- kstack_ptr += sizeof(unsigned long);
+ while (erase_low < erase_high) {
+ *(unsigned long *)erase_low = STACKLEAK_POISON;
+ erase_low += sizeof(unsigned long);
}

/* Reset the 'lowest_stack' value for the next syscall */
--
2.30.2

2022-04-25 22:24:34

by Mark Rutland

[permalink] [raw]
Subject: [PATCH 8/8] arm64: entry: use stackleak_erase_on_task_stack()

On arm64 we always call stackleak_erase() on a task stack, and never
call it on another stack. We can avoid some redundant work by using
stackleak_erase_on_task_stack(), telling the stackleak code that it's
being called on a task stack.

Signed-off-by: Mark Rutland <[email protected]>
Cc: Alexander Popov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/kernel/entry.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index ede028dee81b0..5b82b92924005 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -596,7 +596,7 @@ SYM_CODE_START_LOCAL(ret_to_user)
ldr x19, [tsk, #TSK_TI_FLAGS] // re-check for single-step
enable_step_tsk x19, x2
#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
- bl stackleak_erase
+ bl stackleak_erase_on_task_stack
#endif
kernel_exit 0
SYM_CODE_END(ret_to_user)
--
2.30.2

2022-04-26 00:13:49

by Mark Rutland

[permalink] [raw]
Subject: [PATCH 6/8] stackleak: remove redundant check

In __stackleak_erase() we check that the `erase_low` value derived from
`current->lowest_stack` is above the lowest legitimate stack pointer
value, but this is already enforced by stackleak_track_stack() when
recording the lowest stack value.

Remove the redundant check.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <[email protected]>
Cc: Alexander Popov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Kees Cook <[email protected]>
---
kernel/stackleak.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index f597d3323729d..ba346d46218f5 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -79,10 +79,6 @@ static __always_inline void __stackleak_erase(void)
unsigned int poison_count = 0;
const unsigned int depth = STACKLEAK_SEARCH_DEPTH / sizeof(unsigned long);

- /* Check that 'lowest_stack' value is sane */
- if (unlikely(erase_low - task_stack_low >= THREAD_SIZE))
- erase_low = task_stack_low;
-
/* Search for the poison value in the kernel stack */
while (erase_low > task_stack_low && poison_count <= depth) {
if (*(unsigned long *)erase_low == STACKLEAK_POISON)
--
2.30.2

2022-04-26 09:09:00

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 0/8] stackleak: fixes and rework

On Mon, Apr 25, 2022 at 12:55:55PM +0100, Mark Rutland wrote:
> This series reworks the stackleak code. The first patch fixes some
> latent issues on arm64, and the subsequent patches improve the code to
> improve clarity and permit better code generation.

This looks nice; thanks! I'll put this through build testing and get it
applied shortly...

> While the improvement is small, I think the improvement to clarity and
> code generation is a win regardless.

Agreed. I also want to manually inspect the resulting memory just to
make sure things didn't accidentally regress. There's also an LKDTM test
for basic functionality.

-Kees

--
Kees Cook

2022-04-26 12:44:43

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 0/8] stackleak: fixes and rework

On Mon, Apr 25, 2022 at 03:54:00PM -0700, Kees Cook wrote:
> On Mon, Apr 25, 2022 at 12:55:55PM +0100, Mark Rutland wrote:
> > This series reworks the stackleak code. The first patch fixes some
> > latent issues on arm64, and the subsequent patches improve the code to
> > improve clarity and permit better code generation.
>
> This looks nice; thanks! I'll put this through build testing and get it
> applied shortly...

Thanks!

Patch 1 is liable to conflict with come other stacktrace bits that may go in
for v5.19, so it'd be good if either that could be queued as a fix for
v5.1-rc4, or we'll have to figure out how to deal with conflicts later.

> > While the improvement is small, I think the improvement to clarity and
> > code generation is a win regardless.
>
> Agreed. I also want to manually inspect the resulting memory just to
> make sure things didn't accidentally regress. There's also an LKDTM test
> for basic functionality.

I assume that's the STACKLEAK_ERASING test?

I gave that a spin, but on arm64 that test is flaky even on baseline v5.18-rc1.
On x86_64 it seems consistent after 100s of runs. I'll go dig into that now.

FWIW, I'm testing with defconfig + STACKLEAK + STACKLEAK_RUNTIME_DISABLE +
LKDTM, using GCC 11.1.0 from the kernel.org crosstool pages. When running the
test it passes a few times, then fails dramatically:

| # uname -a
| Linux buildroot 5.18.0-rc1 #1 SMP PREEMPT Tue Apr 26 10:38:12 BST 2022 aarch64 GNU/Linux
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 21.899596] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 21.900102] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 21.900752] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 21.901318] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 22.551022] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 22.551625] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 22.552314] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 22.552915] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 23.137457] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 23.138521] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 23.139173] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 23.139787] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 23.601729] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 23.603159] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 23.603982] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 23.604565] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 24.046171] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 24.046525] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 24.046965] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 24.047562] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 24.481889] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 24.482682] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 24.483361] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 24.483994] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 24.930625] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 24.931168] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 24.931914] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 24.932404] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 25.351606] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 25.352181] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 25.352827] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 25.353496] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 25.762500] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 25.762970] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 25.763396] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 25.763789] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 26.157349] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 26.157880] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 26.158381] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 26.158859] lkdtm: OK: the rest of the thread stack is properly erased
| # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
| [ 26.527798] lkdtm: Performing direct entry STACKLEAK_ERASING
| [ 26.528621] lkdtm: checking unused part of the thread stack (15480 bytes)...
| [ 26.529451] lkdtm: the erased part begins after 1440 not poisoned bytes
| [ 26.530654] lkdtm: FAIL: bad value number 197 in the erased part: 0xffff8000083d3670
| [ 26.531246] lkdtm: FAIL: bad value number 198 in the erased part: 0xaea4d638c4322298
| [ 26.531760] lkdtm: FAIL: bad value number 199 in the erased part: 0xffff8000083d3670
| [ 26.532219] lkdtm: FAIL: bad value number 200 in the erased part: 0xdead000000000122
| [ 26.532640] lkdtm: FAIL: bad value number 201 in the erased part: 0x0
| [ 26.532991] lkdtm: FAIL: bad value number 202 in the erased part: 0xdead000000000122
| [ 26.533412] lkdtm: FAIL: bad value number 203 in the erased part: 0x101
| [ 26.533773] lkdtm: FAIL: bad value number 204 in the erased part: 0xffff2f22033d0000
| [ 26.535385] lkdtm: FAIL: bad value number 205 in the erased part: 0xffff8000083d3650
| [ 26.536150] lkdtm: FAIL: bad value number 206 in the erased part: 0x2fc3d638c4321e2c
| [ 26.536798] lkdtm: FAIL: bad value number 207 in the erased part: 0xffffd638c61c3880
| [ 26.537487] lkdtm: FAIL: bad value number 208 in the erased part: 0xffff2f227fbd4878
| [ 26.538444] lkdtm: FAIL: bad value number 209 in the erased part: 0xffff8000083d3600
| [ 26.539094] lkdtm: FAIL: bad value number 210 in the erased part: 0xfd5d638c4311244
| [ 26.539736] lkdtm: FAIL: bad value number 211 in the erased part: 0xffffd638c6139a38
| [ 26.540383] lkdtm: FAIL: bad value number 212 in the erased part: 0x0
| [ 26.540919] lkdtm: FAIL: bad value number 213 in the erased part: 0x0
| [ 26.541458] lkdtm: FAIL: bad value number 214 in the erased part: 0x3eb4d638c43111dc
| [ 26.542399] lkdtm: FAIL: bad value number 215 in the erased part: 0xfffffcbc880fa8c0
| [ 26.543051] lkdtm: FAIL: bad value number 216 in the erased part: 0xffff2f2203ea3100
| [ 26.543698] lkdtm: FAIL: bad value number 217 in the erased part: 0xffff2f2202817500
| [ 26.544353] lkdtm: FAIL: bad value number 218 in the erased part: 0xe184d638c447df3c
| [ 26.545004] lkdtm: FAIL: bad value number 219 in the erased part: 0xffff8000083d3600
| [ 26.545652] lkdtm: FAIL: bad value number 220 in the erased part: 0xffff2f22033d0000
| [ 26.546571] lkdtm: FAIL: bad value number 221 in the erased part: 0xffff2f227fbd3b80
| [ 26.547110] lkdtm: FAIL: bad value number 222 in the erased part: 0xc5d1d638c42cb164
| [ 26.547643] lkdtm: FAIL: bad value number 223 in the erased part: 0xffff8000083d35a0
| [ 26.548180] lkdtm: FAIL: bad value number 224 in the erased part: 0x2f94d638c42cb150
| [ 26.548716] lkdtm: FAIL: bad value number 225 in the erased part: 0xffff8000083d35a0
| [ 26.549263] lkdtm: FAIL: bad value number 226 in the erased part: 0xffff2f22033d0000
| [ 26.549798] lkdtm: FAIL: bad value number 227 in the erased part: 0xffffd638c61c3880
| [ 26.550684] lkdtm: FAIL: bad value number 228 in the erased part: 0x96ccd638c4477ac8
| [ 26.551216] lkdtm: FAIL: bad value number 229 in the erased part: 0xffff8000083d35a0
| [ 26.551754] lkdtm: FAIL: bad value number 230 in the erased part: 0x99abd638c4499888
| [ 26.552289] lkdtm: FAIL: bad value number 231 in the erased part: 0xffff8000083d3580
| [ 26.552821] lkdtm: FAIL: bad value number 232 in the erased part: 0x6ccd638c447e0e0
| [ 26.553348] lkdtm: FAIL: bad value number 233 in the erased part: 0xffff8000083d3600
| [ 26.554135] lkdtm: FAIL: bad value number 234 in the erased part: 0xffff2f227fbd3b00
| [ 26.554674] lkdtm: FAIL: bad value number 235 in the erased part: 0xffff2f220288ba00
| [ 26.555210] lkdtm: FAIL: bad value number 236 in the erased part: 0x3da6d638c42c1e34
| [ 26.555739] lkdtm: FAIL: bad value number 237 in the erased part: 0xffff8000083d3540
| [ 26.556271] lkdtm: FAIL: bad value number 238 in the erased part: 0xc0
| [ 26.556723] lkdtm: FAIL: bad value number 239 in the erased part: 0x0
| [ 26.557182] lkdtm: FAIL: bad value number 240 in the erased part: 0xffff2f220288ba00
| [ 26.557719] lkdtm: FAIL: bad value number 241 in the erased part: 0xffff2f227fbd3b00
| [ 26.558478] lkdtm: FAIL: bad value number 242 in the erased part: 0xf882d638c447a134
| [ 26.558944] lkdtm: FAIL: bad value number 243 in the erased part: 0xffff8000083d3530
| [ 26.559407] lkdtm: FAIL: bad value number 244 in the erased part: 0x14bcd638c4494bf4
| [ 26.559862] lkdtm: FAIL: bad value number 245 in the erased part: 0xffff8000083d3510
| [ 26.560320] lkdtm: FAIL: bad value number 246 in the erased part: 0x33a7d638c44939c4
| [ 26.560774] lkdtm: FAIL: bad value number 247 in the erased part: 0xffff8000083d34e0
| [ 26.561227] lkdtm: FAIL: bad value number 248 in the erased part: 0x1
| [ 26.561606] lkdtm: FAIL: bad value number 249 in the erased part: 0xffff2f22028701b0
| [ 26.562293] lkdtm: FAIL: bad value number 250 in the erased part: 0xfff3d638c448fb6c
| [ 26.562753] lkdtm: FAIL: bad value number 251 in the erased part: 0xffff8000083d34c0
| [ 26.563206] lkdtm: FAIL: bad value number 252 in the erased part: 0x1
| [ 26.563596] lkdtm: FAIL: bad value number 253 in the erased part: 0xffff2f22028701b0
| [ 26.564055] lkdtm: FAIL: bad value number 254 in the erased part: 0xc3b1d638c448f978
| [ 26.564509] lkdtm: FAIL: bad value number 255 in the erased part: 0xffff8000083d34a0
| [ 26.564963] lkdtm: FAIL: bad value number 256 in the erased part: 0x4399d638c42c1cec
| [ 26.565420] lkdtm: FAIL: bad value number 257 in the erased part: 0xffff8000083d34b0
| [ 26.566045] lkdtm: FAIL: bad value number 258 in the erased part: 0xffff2f227fbd3b80
| [ 26.566507] lkdtm: FAIL: bad value number 259 in the erased part: 0xffff2f220288ba80
| [ 26.566965] lkdtm: FAIL: bad value number 260 in the erased part: 0xe9e9d638c42cca74
| [ 26.567421] lkdtm: FAIL: bad value number 261 in the erased part: 0xffff8000083d34b0
| [ 26.567821] lkdtm: FAIL: bad value number 262 in the erased part: 0xffff2f220288ba80
| [ 26.568221] lkdtm: FAIL: bad value number 263 in the erased part: 0xffff2f227fbd3b80
| [ 26.568620] lkdtm: FAIL: bad value number 264 in the erased part: 0xf697d638c42cb164
| [ 26.569015] lkdtm: FAIL: bad value number 265 in the erased part: 0xffff8000083d3450
| [ 26.569410] lkdtm: FAIL: bad value number 266 in the erased part: 0x47e5d638c42cb150
| [ 26.569947] lkdtm: FAIL: bad value number 267 in the erased part: 0xffff8000083d3450
| [ 26.570391] lkdtm: FAIL: bad value number 268 in the erased part: 0xf0b1d638c447ad28
| [ 26.570788] lkdtm: FAIL: bad value number 269 in the erased part: 0xffff8000083d3430
| [ 26.571189] lkdtm: FAIL: the thread stack is NOT properly erased!

Thanks,
Mark.

2022-04-26 16:29:23

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 0/8] stackleak: fixes and rework

On Tue, Apr 26, 2022 at 11:37:47AM +0100, Mark Rutland wrote:
> On Tue, Apr 26, 2022 at 11:10:52AM +0100, Mark Rutland wrote:
> > On Mon, Apr 25, 2022 at 03:54:00PM -0700, Kees Cook wrote:
> > > On Mon, Apr 25, 2022 at 12:55:55PM +0100, Mark Rutland wrote:
> > > > This series reworks the stackleak code. The first patch fixes some
> > > > latent issues on arm64, and the subsequent patches improve the code to
> > > > improve clarity and permit better code generation.
> > >
> > > This looks nice; thanks! I'll put this through build testing and get it
> > > applied shortly...
> >
> > Thanks!
> >
> > Patch 1 is liable to conflict with come other stacktrace bits that may go in
> > for v5.19, so it'd be good if either that could be queued as a fix for
> > v5.1-rc4, or we'll have to figure out how to deal with conflicts later.
> >
> > > > While the improvement is small, I think the improvement to clarity and
> > > > code generation is a win regardless.
> > >
> > > Agreed. I also want to manually inspect the resulting memory just to
> > > make sure things didn't accidentally regress. There's also an LKDTM test
> > > for basic functionality.
> >
> > I assume that's the STACKLEAK_ERASING test?
> >
> > I gave that a spin, but on arm64 that test is flaky even on baseline v5.18-rc1.
> > On x86_64 it seems consistent after 100s of runs. I'll go dig into that now.
>
> I hacked in some debug, and it looks like the sp used in the test is far above
> the current lowest_sp. The test is slightly wrong since it grabs the address of
> a local variable rather than using current_stack_pointer, but the offset I see
> is much larger:
>
> # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
> [ 27.665221] lkdtm: Performing direct entry STACKLEAK_ERASING
> [ 27.665986] lkdtm: FAIL: lowest_stack 0xffff8000083a39e0 is lower than test sp 0xffff8000083a3c80
> [ 27.667530] lkdtm: FAIL: the thread stack is NOT properly erased!
>
> That's off by 0x2a0 (AKA 672) bytes, and it seems to be consistent from run to
> run.
>
> I note that an interrupt occuring could cause similar (since on arm64 those are
> taken/triaged on the task stack before moving to the irq stack, and the irq
> regs alone will take 300+ bytes), but that doesn't seem to be the problem here
> given this is consistent, and it appears some prior function consumed a lot of
> stack.
>
> I *think* the same irq problem would apply to x86, but maybe that initial
> triage happens on a trampoline stack.
>
> I'll dig a bit more into the arm64 side...

That offset above seems to be due to the earlier logic in direct_entry(), which
I guess is running out-of-line. With that hacked to:

----------------
diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index e2228b6fc09bb..53f3027e8202d 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -378,8 +378,9 @@ static ssize_t direct_entry(struct file *f, const char __user *user_buf,
size_t count, loff_t *off)
{
const struct crashtype *crashtype;
- char *buf;
+ char *buf = "STACKLEAK_ERASING";

+#if 0
if (count >= PAGE_SIZE)
return -EINVAL;
if (count < 1)
@@ -395,13 +396,17 @@ static ssize_t direct_entry(struct file *f, const char __user *user_buf,
/* NULL-terminate and remove enter */
buf[count] = '\0';
strim(buf);
+#endif

crashtype = find_crashtype(buf);
+
+#if 0
free_page((unsigned long) buf);
if (!crashtype)
return -EINVAL;
+#endif

- pr_info("Performing direct entry %s\n", crashtype->name);
+ // pr_info("Performing direct entry %s\n", crashtype->name);
lkdtm_do_action(crashtype);
*off += count;

----------------

... the SP check doesn't fail, but I still see intermittent bad value failures.
Those might be due to interrupt frames.

Thanks,
Mark.

2022-04-26 18:19:22

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 0/8] stackleak: fixes and rework

On Mon, Apr 25, 2022 at 03:54:00PM -0700, Kees Cook wrote:
> On Mon, Apr 25, 2022 at 12:55:55PM +0100, Mark Rutland wrote:
> > This series reworks the stackleak code. The first patch fixes some
> > latent issues on arm64, and the subsequent patches improve the code to
> > improve clarity and permit better code generation.
>
> This looks nice; thanks! I'll put this through build testing and get it
> applied shortly...

FWIW, looking at testing I've spotted a fencepost error, so I'll spin a
v2 with that fixed (and with updates to the LKDTM test).

Thanks,
Mark.

2022-04-26 21:51:17

by Alexander Popov

[permalink] [raw]
Subject: Re: [PATCH 0/8] stackleak: fixes and rework

On 26.04.2022 01:54, Kees Cook wrote:
> On Mon, Apr 25, 2022 at 12:55:55PM +0100, Mark Rutland wrote:
>> This series reworks the stackleak code. The first patch fixes some
>> latent issues on arm64, and the subsequent patches improve the code to
>> improve clarity and permit better code generation.
>
> This looks nice; thanks! I'll put this through build testing and get it
> applied shortly...
>
>> While the improvement is small, I think the improvement to clarity and
>> code generation is a win regardless.
>
> Agreed. I also want to manually inspect the resulting memory just to
> make sure things didn't accidentally regress. There's also an LKDTM test
> for basic functionality.

Hi Mark and Kees!

Glad to see this patch series.

I've looked at it briefly. Mark, I see your questions in the patches that I can
answer.

Please give me some time, I'm going to work on your patch series next week. I'll
return with review and testing.

Thanks!
Alexander

2022-04-27 03:23:16

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 0/8] stackleak: fixes and rework

On Tue, Apr 26, 2022 at 11:10:52AM +0100, Mark Rutland wrote:
> On Mon, Apr 25, 2022 at 03:54:00PM -0700, Kees Cook wrote:
> > On Mon, Apr 25, 2022 at 12:55:55PM +0100, Mark Rutland wrote:
> > > This series reworks the stackleak code. The first patch fixes some
> > > latent issues on arm64, and the subsequent patches improve the code to
> > > improve clarity and permit better code generation.
> >
> > This looks nice; thanks! I'll put this through build testing and get it
> > applied shortly...
>
> Thanks!
>
> Patch 1 is liable to conflict with come other stacktrace bits that may go in
> for v5.19, so it'd be good if either that could be queued as a fix for
> v5.1-rc4, or we'll have to figure out how to deal with conflicts later.
>
> > > While the improvement is small, I think the improvement to clarity and
> > > code generation is a win regardless.
> >
> > Agreed. I also want to manually inspect the resulting memory just to
> > make sure things didn't accidentally regress. There's also an LKDTM test
> > for basic functionality.
>
> I assume that's the STACKLEAK_ERASING test?
>
> I gave that a spin, but on arm64 that test is flaky even on baseline v5.18-rc1.
> On x86_64 it seems consistent after 100s of runs. I'll go dig into that now.

I hacked in some debug, and it looks like the sp used in the test is far above
the current lowest_sp. The test is slightly wrong since it grabs the address of
a local variable rather than using current_stack_pointer, but the offset I see
is much larger:

# echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT
[ 27.665221] lkdtm: Performing direct entry STACKLEAK_ERASING
[ 27.665986] lkdtm: FAIL: lowest_stack 0xffff8000083a39e0 is lower than test sp 0xffff8000083a3c80
[ 27.667530] lkdtm: FAIL: the thread stack is NOT properly erased!

That's off by 0x2a0 (AKA 672) bytes, and it seems to be consistent from run to
run.

I note that an interrupt occuring could cause similar (since on arm64 those are
taken/triaged on the task stack before moving to the irq stack, and the irq
regs alone will take 300+ bytes), but that doesn't seem to be the problem here
given this is consistent, and it appears some prior function consumed a lot of
stack.

I *think* the same irq problem would apply to x86, but maybe that initial
triage happens on a trampoline stack.

I'll dig a bit more into the arm64 side...

Thanks,
Mark.

2022-04-27 08:56:36

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 0/8] stackleak: fixes and rework

On Tue, Apr 26, 2022 at 06:51:04PM +0300, Alexander Popov wrote:
> On 26.04.2022 01:54, Kees Cook wrote:
> > On Mon, Apr 25, 2022 at 12:55:55PM +0100, Mark Rutland wrote:
> > > This series reworks the stackleak code. The first patch fixes some
> > > latent issues on arm64, and the subsequent patches improve the code to
> > > improve clarity and permit better code generation.
> >
> > This looks nice; thanks! I'll put this through build testing and get it
> > applied shortly...
> >
> > > While the improvement is small, I think the improvement to clarity and
> > > code generation is a win regardless.
> >
> > Agreed. I also want to manually inspect the resulting memory just to
> > make sure things didn't accidentally regress. There's also an LKDTM test
> > for basic functionality.
>
> Hi Mark and Kees!
>
> Glad to see this patch series.
>
> I've looked at it briefly. Mark, I see your questions in the patches that I
> can answer.
>
> Please give me some time, I'm going to work on your patch series next week.
> I'll return with review and testing.

Sure thing, thanks!

FWIW, I spotted a couple of issues in my patches today while testing,
and if you're happy I can post a v2 later this week with those fixed, so
you don't need to waste time with those.

Thanks,
Mark.

2022-04-27 10:23:10

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 0/8] stackleak: fixes and rework

On Mon, 25 Apr 2022 12:55:55 +0100, Mark Rutland wrote:
> This series reworks the stackleak code. The first patch fixes some
> latent issues on arm64, and the subsequent patches improve the code to
> improve clarity and permit better code generation.
>
> I started working on this as a tangent from rework to arm64's stacktrace
> code. Looking at users of the `on_*_stack()` helpers I noticed that the
> assembly generated for stackleak was particularly awful as it performed
> a lot of redundant work and also called instrumentable code, which isn't
> sound.
>
> [...]

Applied to for-next/hardening, thanks!

[1/8] arm64: stackleak: fix current_top_of_stack()
https://git.kernel.org/kees/c/b9f8167d08e9
[2/8] stackleak: move skip_erasing() check earlier
https://git.kernel.org/kees/c/b7d6315d1d7c
[3/8] stackleak: rework stack low bound handling
https://git.kernel.org/kees/c/1f4f72d1d99e
[4/8] stackleak: clarify variable names
https://git.kernel.org/kees/c/52a2aa794e0a
[5/8] stackleak: rework stack high bound handling
https://git.kernel.org/kees/c/83301ac044c9
[6/8] stackleak: remove redundant check
https://git.kernel.org/kees/c/0cd7ee6880c7
[7/8] stackleak: add on/off stack variants
https://git.kernel.org/kees/c/9bb0b174fd2b
[8/8] arm64: entry: use stackleak_erase_on_task_stack()
https://git.kernel.org/kees/c/6a5927e73497

--
Kees Cook

2022-04-27 11:12:08

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 0/8] stackleak: fixes and rework

On Tue, Apr 26, 2022 at 05:01:27PM +0100, Mark Rutland wrote:
> On Mon, Apr 25, 2022 at 03:54:00PM -0700, Kees Cook wrote:
> > On Mon, Apr 25, 2022 at 12:55:55PM +0100, Mark Rutland wrote:
> > > This series reworks the stackleak code. The first patch fixes some
> > > latent issues on arm64, and the subsequent patches improve the code to
> > > improve clarity and permit better code generation.
> >
> > This looks nice; thanks! I'll put this through build testing and get it
> > applied shortly...
>
> FWIW, looking at testing I've spotted a fencepost error, so I'll spin a
> v2 with that fixed (and with updates to the LKDTM test).

Ah! Oops, ok, I'll unpush the series, I missed this. :)

--
Kees Cook