2018-01-08 21:29:56

by Andi Kleen

Subject: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

From: Andi Kleen <[email protected]>

[This is on top of David's retpoline branch, as of 08-01 this morning]

This patch further hardens retpoline.

CPUs have return buffers which store the return address for
RET to predict function returns. Some CPUs (Skylake, some Broadwells)
can fall back to indirect branch prediction on return buffer underflow.

With retpoline we want to avoid uncontrolled indirect branches,
which could be poisoned by ring 3, so we need to avoid uncontrolled
return buffer underflows in the kernel.

This can happen when we're context switching from a shallower to a
deeper kernel stack. The deeper kernel stack would eventually underflow
the return buffer, at which point the CPU would again fall back to the indirect branch predictor.

To guard against this fill the return buffer with controlled
content during context switch. This prevents any underflows.

We always fill the buffer with 30 entries: 32 minus 2 for at
least one call from entry_{64,32}.S to C code and another into
the function doing the filling.

That's pessimistic because we likely did more controlled kernel calls.
So in principle we could do less. However it's hard to maintain such an
invariant, and it may be broken with more aggressive compilers.
So err on the side of safety and always fill 30.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/entry/entry_32.S | 15 +++++++++++++++
arch/x86/entry/entry_64.S | 15 +++++++++++++++
arch/x86/include/asm/nospec-branch.h | 29 +++++++++++++++++++++++++++++
3 files changed, 59 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index cf9ef33d299b..5404a9b2197c 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -250,6 +250,21 @@ ENTRY(__switch_to_asm)
popl %ebx
popl %ebp

+ /*
+ * When we switch from a shallower to a deeper call stack
+ * the return buffer will underflow in the kernel in the next task.
+ * This could cause the CPU to fall back to indirect branch
+ * prediction, which may be poisoned.
+ *
+ * To guard against that always fill the return stack with
+ * known values.
+ *
+ * We do this in assembler because it needs to be before
+ * any calls on the new stack, and this can be difficult to
+ * ensure in a complex C function like __switch_to.
+ */
+ ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+ FILL_RETURN_BUFFER
jmp __switch_to
END(__switch_to_asm)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9bce6ed03353..0f28d0ea57e8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -495,6 +495,21 @@ ENTRY(__switch_to_asm)
popq %rbx
popq %rbp

+ /*
+ * When we switch from a shallower to a deeper call stack
+ * the return buffer will underflow in the kernel in the next task.
+ * This could cause the CPU to fall back to indirect branch
+ * prediction, which may be poisoned.
+ *
+ * To guard against that always fill the return stack with
+ * known values.
+ *
+ * We do this in assembler because it needs to be before
+ * any calls on the new stack, and this can be difficult to
+ * ensure in a complex C function like __switch_to.
+ */
+ ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+ FILL_RETURN_BUFFER
jmp __switch_to
END(__switch_to_asm)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index b8c8eeacb4be..e84e231248c2 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -53,6 +53,35 @@
#endif
.endm

+/*
+ * We use 32-N: 32 is the max return buffer size,
+ * but there should have been at a minimum two
+ * controlled calls already: one into the kernel
+ * from entry*.S and another into the function
+ * containing this macro. So N=2, thus 30.
+ */
+#define NUM_BRANCHES_TO_FILL 30
+
+/*
+ * Fill the CPU return branch buffer to prevent
+ * indirect branch prediction on underflow.
+ * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
+ */
+.macro FILL_RETURN_BUFFER
+#ifdef CONFIG_RETPOLINE
+ .rept NUM_BRANCHES_TO_FILL
+ call 1221f
+ pause /* stop speculation */
+1221:
+ .endr
+#ifdef CONFIG_64BIT
+ addq $8*NUM_BRANCHES_TO_FILL, %rsp
+#else
+ addl $4*NUM_BRANCHES_TO_FILL, %esp
+#endif
+#endif
+.endm
+
#else /* __ASSEMBLY__ */

#if defined(CONFIG_X86_64) && defined(RETPOLINE)
--
2.14.3


2018-01-08 21:38:45

by David Woodhouse

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, 2018-01-08 at 12:15 -0800, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> [This is on top of David's retpoline branch, as of 08-01 this morning]
>
> This patch further hardens retpoline
>
> CPUs have return buffers which store the return address for
> RET to predict function returns. Some CPUs (Skylake, some Broadwells)
> can fall back to indirect branch prediction on return buffer underflow.
>
> With retpoline we want to avoid uncontrolled indirect branches,
> which could be poisoned by ring 3, so we need to avoid uncontrolled
> return buffer underflows in the kernel.
>
> This can happen when we're context switching from a shallower to a
> deeper kernel stack.  The deeper kernel stack would eventually underflow
> the return buffer, which again would fall back to the indirect branch predictor.
>
> To guard against this fill the return buffer with controlled
> content during context switch. This prevents any underflows.
>
> We always fill the buffer with 30 entries: 32 minus 2 for at
> least one call from entry_{64,32}.S to C code and another into
> the function doing the filling.
>
> That's pessimistic because we likely did more controlled kernel calls.
> So in principle we could do less.  However it's hard to maintain such an
> invariant, and it may be broken with more aggressive compilers.
> So err on the side of safety and always fill 30.
>
> Signed-off-by: Andi Kleen <[email protected]>

Thanks.

Acked-by: David Woodhouse <[email protected]>

We want this on vmexit too, right? And the IBRS/IBPB patch set is going
to want to do similar things. But picking the RSB stuffing out of that
patch set and putting it in with the retpoline support is absolutely
the right thing to do.



2018-01-08 22:11:32

by Peter Zijlstra

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, Jan 08, 2018 at 12:15:31PM -0800, Andi Kleen wrote:
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index b8c8eeacb4be..e84e231248c2 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -53,6 +53,35 @@
> #endif
> .endm
>
> +/*
> + * We use 32-N: 32 is the max return buffer size,
> + * but there should have been at a minimum two
> + * controlled calls already: one into the kernel
> + * from entry*.S and another into the function
> + * containing this macro. So N=2, thus 30.
> + */
> +#define NUM_BRANCHES_TO_FILL 30
> +
> +/*
> + * Fill the CPU return branch buffer to prevent
> + * indirect branch prediction on underflow.
> + * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
> + */
> +.macro FILL_RETURN_BUFFER
> +#ifdef CONFIG_RETPOLINE
> + .rept NUM_BRANCHES_TO_FILL
> + call 1221f
> + pause /* stop speculation */
> +1221:
> + .endr
> +#ifdef CONFIG_64BIT
> + addq $8*NUM_BRANCHES_TO_FILL, %rsp
> +#else
> + addl $4*NUM_BRANCHES_TO_FILL, %esp
> +#endif
> +#endif
> +.endm

So pjt did alignment, a single unroll and per discussion earlier today
(CET) or late last night (PST), he only does 16.

Why is none of that done here? Also, can we pretty please stop using
those retarded number labels, they make this stuff unreadable.

Also, pause is unlikely to stop speculation, that comment doesn't make
sense. Looking at PJT's version there used to be a speculation trap in
there, but I can't see that here.


2018-01-08 22:25:22

by Andi Kleen

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

> So pjt did alignment, a single unroll and per discussion earlier today
> (CET) or late last night (PST), he only does 16.

I used the Intel recommended sequence, which recommends 32.

Not sure if alignment makes a difference. I can check.

> Why is none of that done here? Also, can we pretty please stop using
> those retarded number labels, they make this stuff unreadable.

Personally I find the magic labels with strange ASCII characters
far less readable than a simple number.

But can change it if you insist.

> Also, pause is unlikely to stop speculation, that comment doesn't make
> sense. Looking at PJT's version there used to be a speculation trap in
> there, but I can't see that here.

My understanding is that it stops speculation. But could also
use LFENCE.

-Andi

2018-01-08 22:26:25

by Peter Zijlstra

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, Jan 08, 2018 at 10:17:19PM +0000, Woodhouse, David wrote:
> On Mon, 2018-01-08 at 23:11 +0100, Peter Zijlstra wrote:
> >
> > So pjt did alignment, a single unroll and per discussion earlier today
> > (CET) or late last night (PST), he only does 16.
>
> Hey Intel, please tell us precisely how many RSB entries there are, on
> each family of CPU... :)

Right, and we can always fall back to 32 for unknown models.

> > Also, pause is unlikely to stop speculation, that comment doesn't make
> > sense. Looking at PJT's version there used to be a speculation trap in
> > there, but I can't see that here.
>
> In this particular code we don't need a speculation trap; that's
> elsewhere. This one is *just* about the call stack. And the reason we
> don't just have...
>
>    call . + 5
>    call . + 5
>    call . + 5
>    ...
>
> is because that might get interpreted as a "push %rip" and not go on
> the RSB at all. Hence the 'pause' between each one.

OK, then make the comment say that.

2018-01-08 22:54:20

by Andi Kleen

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

> We want this on vmexit too, right?

Yes. KVM patches are done separately.

-Andi


2018-01-08 22:58:40

by Andi Kleen

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

> > Why is none of that done here? Also, can we pretty please stop using
> > those retarded number labels, they make this stuff unreadable.
>
> Personally I find the magic labels with strange ASCII characters
> far less readable than a simple number.

Tried it and \@ is incompatible with .rept.

-Andi
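
For context on why Andi's numeric labels survive here: GAS expands \@ to the macro invocation count, not the .rept iteration, so a \@ based label would be emitted with the same name on every iteration and the assembler would reject the duplicates, whereas a local numeric label may be redefined and each forward reference (1221f) binds to the next definition. A minimal sketch that assembles (illustrative only; hypothetical macro name, count of 4, 64-bit):

        .macro fill_calls_sketch
        .rept 4
        call    1221f           /* 1221f resolves to the next 1221: below */
        pause
1221:
        .endr
        addq    $8*4, %rsp      /* discard the four pushed return addresses */
        .endm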

2018-01-08 23:26:19

by Woodhouse, David

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, 2018-01-08 at 23:26 +0100, Peter Zijlstra wrote:
>
> > is because that might get interpreted as a "push %rip" and not go on
> > the RSB at all. Hence the 'pause' between each one.
>
> OK, then make the comment say that.

Fixed. I've also shifted the #ifdef CONFIG_RETPOLINE to the call sites
instead of inside the FILL_RETURN_BUFFER macro itself. This is going to
get used with IBRS code too, Real Soon Now™.

http://git.infradead.org/users/dwmw2/linux-retpoline.git/commitdiff/6e961b86558
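
For illustration, with the #ifdef hoisted out of the macro a call site ends up shaped roughly like the sketch below (hypothetical shape, not necessarily the exact code in the commit above):

#ifdef CONFIG_RETPOLINE
        /* Stuff the RSB only when retpoline support is compiled in and enabled. */
        ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
        FILL_RETURN_BUFFER
#endif
        jmp     __switch_to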



2018-01-09 00:16:04

by Paul Turner

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, Jan 8, 2018 at 2:11 PM, Peter Zijlstra <[email protected]> wrote:
> On Mon, Jan 08, 2018 at 12:15:31PM -0800, Andi Kleen wrote:
>> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
>> index b8c8eeacb4be..e84e231248c2 100644
>> --- a/arch/x86/include/asm/nospec-branch.h
>> +++ b/arch/x86/include/asm/nospec-branch.h
>> @@ -53,6 +53,35 @@
>> #endif
>> .endm
>>
>> +/*
>> + * We use 32-N: 32 is the max return buffer size,
>> + * but there should have been at a minimum two
>> + * controlled calls already: one into the kernel
>> + * from entry*.S and another into the function
>> + * containing this macro. So N=2, thus 30.
>> + */
>> +#define NUM_BRANCHES_TO_FILL 30
>> +
>> +/*
>> + * Fill the CPU return branch buffer to prevent
>> + * indirect branch prediction on underflow.
>> + * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
>> + */
>> +.macro FILL_RETURN_BUFFER
>> +#ifdef CONFIG_RETPOLINE
>> + .rept NUM_BRANCHES_TO_FILL
>> + call 1221f
>> + pause /* stop speculation */
>> +1221:
>> + .endr
>> +#ifdef CONFIG_64BIT
>> + addq $8*NUM_BRANCHES_TO_FILL, %rsp
>> +#else
>> + addl $4*NUM_BRANCHES_TO_FILL, %esp
>> +#endif
>> +#endif
>> +.endm
>
> So pjt did alignment, a single unroll and per discussion earlier today
> (CET) or late last night (PST), he only does 16.
>
> Why is none of that done here? Also, can we pretty please stop using
> those retarded number labels, they make this stuff unreadable.
>
> Also, pause is unlikely to stop speculation, that comment doesn't make
> sense. Looking at PJT's version there used to be a speculation trap in
> there, but I can't see that here.
>

You definitely want the speculation traps: these entries are
potentially consumed.
Worse: The first entry that will be consumed is the last call in your
linear chain, meaning that it immediately gets to escape into
alternative execution.
(When I was experimenting with icache-minimizing constructions here I
actually used intentional backwards jumps in linear chains to avoid
this.)

The sequence I reported is what ended up seeming optimal.

>
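
For readers following the thread: the construction Paul describes, call pairs whose targets are only reached by falling through, with every potential return target guarded by a trap loop so a consumed RSB entry spins in place rather than escaping into other code, looks roughly like the sketch below. This is illustrative only; the label names, the lfence, and the parameterised count are assumptions, not Paul's posted sequence.

        /*
         * Sketch of an RSB-stuffing loop with speculation traps.
         * Each call pushes one RSB entry; a later RET that mispredicts
         * to one of the trap labels spins in pause/lfence/jmp instead
         * of speculating onward.  Clobbers %rcx; 64-bit as written.
         */
        .macro STUFF_RSB_SKETCH nr:req
        mov     $(\nr / 2), %ecx
.Lstuff_loop_\@:
        call    .Lfirst_\@
.Ltrap1_\@:
        pause
        lfence
        jmp     .Ltrap1_\@
.Lfirst_\@:
        call    .Lsecond_\@
.Ltrap2_\@:
        pause
        lfence
        jmp     .Ltrap2_\@
.Lsecond_\@:
        dec     %ecx
        jnz     .Lstuff_loop_\@
        addq    $(8 * \nr), %rsp        /* drop the \nr stacked return addresses */
        .endm

Looping over a call pair, rather than fully unrolling a linear chain, is one way to keep the icache footprint down, which is the concern Paul mentions above.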

2018-01-09 00:16:25

by Paul Turner

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, Jan 8, 2018 at 2:25 PM, Andi Kleen <[email protected]> wrote:
>> So pjt did alignment, a single unroll and per discussion earlier today
>> (CET) or late last night (PST), he only does 16.
>
> I used the Intel recommended sequence, which recommends 32.
>
> Not sure if alignment makes a difference. I can check.
>
>> Why is none of that done here? Also, can we pretty please stop using
>> those retarded number labels, they make this stuff unreadable.
>
> Personally I find the magic labels with strange ASCII characters
> far less readable than a simple number.
>
> But can change it if you insist.
>
>> Also, pause is unlikely to stop speculation, that comment doesn't make
>> sense. Looking at PJT's version there used to be a speculation trap in
>> there, but I can't see that here.
>
> My understanding is that it stops speculation. But could also
> use LFENCE.
>

Neither pause nor lfence stops speculation.

> -Andi

2018-01-09 00:45:33

by Woodhouse, David

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, 2018-01-08 at 16:15 -0800, Paul Turner wrote:
> On Mon, Jan 8, 2018 at 2:11 PM, Peter Zijlstra <[email protected]> wrote:
> > On Mon, Jan 08, 2018 at 12:15:31PM -0800, Andi Kleen wrote:
> >> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> >> index b8c8eeacb4be..e84e231248c2 100644
> >> --- a/arch/x86/include/asm/nospec-branch.h
> >> +++ b/arch/x86/include/asm/nospec-branch.h
> >> @@ -53,6 +53,35 @@
> >>  #endif
> >>  .endm
> >>
> >> +/*
> >> + * We use 32-N: 32 is the max return buffer size,
> >> + * but there should have been at a minimum two
> >> + * controlled calls already: one into the kernel
> >> + * from entry*.S and another into the function
> >> + * containing this macro. So N=2, thus 30.
> >> + */
> >> +#define NUM_BRANCHES_TO_FILL 30
> >> +
> >> +/*
> >> + * Fill the CPU return branch buffer to prevent
> >> + * indirect branch prediction on underflow.
> >> + * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
> >> + */
> >> +.macro FILL_RETURN_BUFFER
> >> +#ifdef CONFIG_RETPOLINE
> >> +     .rept   NUM_BRANCHES_TO_FILL
> >> +     call    1221f
> >> +     pause   /* stop speculation */
> >> +1221:
> >> +     .endr
> >> +#ifdef CONFIG_64BIT
> >> +     addq    $8*NUM_BRANCHES_TO_FILL, %rsp
> >> +#else
> >> +     addl    $4*NUM_BRANCHES_TO_FILL, %esp
> >> +#endif
> >> +#endif
> >> +.endm
> >
> > So pjt did alignment, a single unroll and per discussion earlier today
> > (CET) or late last night (PST), he only does 16.
> >
> > Why is none of that done here? Also, can we pretty please stop using
> > those retarded number labels, they make this stuff unreadable.
> >
> > Also, pause is unlikely to stop speculation, that comment doesn't make
> > sense. Looking at PJT's version there used to be a speculation trap in
> > there, but I can't see that here.
> >
>
> You definitely want the speculation traps.. these entries are
> potentially consumed.
> Worse: The first entry that will be consumed is the last call in your
> linear chain, meaning that it immediately gets to escape into
> alternative execution.
> (When I was experimenting with icache-minimizing constructions here I
> actually used intentional backwards jumps in linear chains to avoid
> this.)
>
> The sequence I reported is what ended up seeming optimal.

On IRC, Arjan assures me that 'pause' here really is sufficient as a
speculation trap. If we do end up returning back here as a
misprediction, that 'pause' will stop the speculative execution on
affected CPUs even though it isn't *architecturally* documented to do
so.

Arjan, can you confirm that in email please?



2018-01-09 00:48:11

by David Woodhouse

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
> On IRC, Arjan assures me that 'pause' here really is sufficient as a
> speculation trap. If we do end up returning back here as a
> misprediction, that 'pause' will stop the speculative execution on
> affected CPUs even though it isn't *architecturally* documented to do
> so.
>
> Arjan, can you confirm that in email please?


That actually doesn't make sense to me. If 'pause' alone is sufficient,
then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
retpoline itself?

Arjan?



2018-01-09 02:49:22

by Paul Turner

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <[email protected]> wrote:
> On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
>> On IRC, Arjan assures me that 'pause' here really is sufficient as a
>> speculation trap. If we do end up returning back here as a
>> misprediction, that 'pause' will stop the speculative execution on
>> affected CPUs even though it isn't *architecturally* documented to do
>> so.
>>
>> Arjan, can you confirm that in email please?
>
>
> That actually doesn't make sense to me. If 'pause' alone is sufficient,
> then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> retpoline itself?
>
> Arjan?

On further investigation, I don't understand any of the motivation for
the changes here:
- It micro-benchmarks several cycles slower than the suggested
implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
alignment]
- It's much larger in terms of .text size (120 bytes @ 16 calls, 218
bytes @ 30 calls) vs (61 bytes)
- I'm not sure it's universally correct in preventing speculation:

(1) I am able to observe a small timing difference between executing
"1: pause; jmp 1b;" and "pause" in the speculative path.
Given that alignment is otherwise identical, this should only
occur if execution is non-identical, which would require speculative
execution to proceed beyond the pause.
(2) When we proposed and reviewed the sequence, this was not cited by
architects as a way of preventing speculation. Indeed, as David
points out, we'd consider using this within the sequence without the
loop.


If the claim above is true -- which (1) actually appears to contradict
-- it seems to bear stronger validation. Particularly since in
the suggested sequences we can fit the jmps within the space we get
for free by aligning the call targets.

2018-01-09 03:05:46

by David Woodhouse

Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, 2018-01-08 at 18:48 -0800, Paul Turner wrote:
> On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <[email protected]> wrote:
> >
> > On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
> > >
> > > On IRC, Arjan assures me that 'pause' here really is sufficient as a
> > > speculation trap. If we do end up returning back here as a
> > > misprediction, that 'pause' will stop the speculative execution on
> > > affected CPUs even though it isn't *architecturally* documented to do
> > > so.
> > >
> > > Arjan, can you confirm that in email please?
> >
> > That actually doesn't make sense to me. If 'pause' alone is sufficient,
> > then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> > retpoline itself?
> >
> > Arjan?
> On further investigation, I don't understand any of the motivation for
> the changes here:
> - It micro-benchmarks several cycles slower than the suggested
> implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
> alignment]
> - It's much larger in terms of .text size (120 bytes @ 16 calls, 218
> bytes @ 30 calls) vs (61 bytes)
> - I'm not sure it's universally correct in preventing speculation:
>
> (1) I am able to observe a small timing difference between executing
> "1: pause; jmp 1b;" and "pause" in the speculative path.
>      Given that alignment is otherwise identical, this should only
> occur if execution is non-identical, which would require speculative
> execution to proceed beyond the pause.
> (2) When we proposed and reviewed the sequence.  This was not cited by
> architects as a way of presenting speculation.  Indeed, as David
> points out, we'd consider using this within the sequence without the
> loop.
>
>
> If the claim above is true -- which (1) actually appears to contradict
> -- it seems to bear stronger validation.  Particularly since that in
> the suggested sequences we can fit the jmps within the space we get
> for free by aligning the call targets.

Some of the discrepancies are because it's been filtered through Intel
and I may not have had your latest version.

I'm going to revert to your version from
https://support.google.com/faqs/answer/7625886 — for the retpoline
(i.e. adding back the alignment) and the RSB stuffing.

If Intel have a sequence which is simpler and guaranteed to work, I'll
let them post that with an authoritative statement from the CPU
architects. At this point, we really need to get on with rolling in the
other parts on top.



2018-01-12 11:14:49

by David Laight

Subject: RE: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

From: Andi Kleen
> Sent: 08 January 2018 20:16
>
> [This is on top of David's retpoline branch, as of 08-01 this morning]
>
> This patch further hardens retpoline
>
> CPUs have return buffers which store the return address for
> RET to predict function returns. Some CPUs (Skylake, some Broadwells)
> can fall back to indirect branch prediction on return buffer underflow.
>
> With retpoline we want to avoid uncontrolled indirect branches,
> which could be poisoned by ring 3, so we need to avoid uncontrolled
> return buffer underflows in the kernel.
>
> This can happen when we're context switching from a shallower to a
> deeper kernel stack. The deeper kernel stack would eventually underflow
> the return buffer, which again would fall back to the indirect branch predictor.
...

Is that really a usable attack vector?

Isn't it actually more likely to leak kernel addresses to userspace
in the return stack buffer - which might be usable to get around KASLR?

David