2018-08-13 22:24:13

by Kees Cook

[permalink] [raw]
Subject: [GIT PULL] gcc-plugin updates for v4.19-rc1

Hi Linus,

Please pull these gcc-plugin changes for v4.19-rc1. This has some Kconfig
and Makefile cleanups from Masahiro and myself, but the bulk of this
is the STACKLEAK plugin ported by Alexander Popov. As discussed in its
commit logs, it provides efficient stack content poisoning at syscall
exit. This creates a defense against several classes of flaws:

- uninitialized stack usage (while we continue to work on improving the
compiler to do this in other ways: e.g. unconditional zero init was
proposed to gcc and clang, and more plugin work has started too)

- stack content exposure (by greatly reducing the lifetime of valid stack
contents, exposures via either direct read bugs or unknown cache
side-channels become much more difficult to exploit. This complements
the existing buddy and heap poisoning options, but provides the coverage
for stacks)

- stack exhaustion/guard-page skipping (while we continue to work to
remove all VLAs in the kernel: of the ~115 cases found in v4.16, after
the v4.19 merge window we should be down to about 13 remaining, most of
them in crypto code, all of which have patches under review)

The x86 hooks are included in this series (which have been reviewed by
Ingo, Dave Hansen, and Thomas Gleixner), and have hopefully addressed
your concerns with regard to the size of assembly changes which are now
minimal. The arm64 hooks are expected to be coming through the arm64
tree during the v4.19 merge window as well (written by Laura Abbott and
reviewed by Mark Rutland and Will Deacon).

Thanks!

-Kees

The following changes since commit 7daf201d7fe8334e2d2364d4e8ed3394ec9af819:

Linux 4.18-rc2 (2018-06-24 20:54:29 +0800)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/gcc-plugins-v4.19-rc1

for you to fetch changes up to b1310d137bc578f0032b6b990628a366d5f0910e:

stackleak: Allow runtime disabling of kernel stack erasing (2018-07-26 09:04:15 -0700)

----------------------------------------------------------------
- Kconfig and Makefile clean ups (Masahiro Yamada, Kees Cook)
- Add STACKLEAK plugin, metrics, docs, knob and x86 hooks (Alexander Popov)

----------------------------------------------------------------
Alexander Popov (7):
gcc-plugins: Clean up the cgraph_create_edge* macros
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
lkdtm: Add a test for STACKLEAK
fs/proc: Show STACKLEAK metrics in the /proc file system
doc: self-protection: Add information about STACKLEAK feature
stackleak: Allow runtime disabling of kernel stack erasing

Kees Cook (1):
gcc-plugins: Regularize Makefile.gcc-plugins

Masahiro Yamada (2):
gcc-plugins: remove unused GCC_PLUGIN_SUBDIR
gcc-plugins: split out Kconfig entries to scripts/gcc-plugins/Kconfig

Documentation/security/self-protection.rst | 23 +-
Documentation/sysctl/kernel.txt | 18 ++
Documentation/x86/x86_64/mm.txt | 2 +
arch/Kconfig | 147 +--------
arch/x86/Kconfig | 1 +
arch/x86/entry/calling.h | 14 +
arch/x86/entry/entry_32.S | 7 +
arch/x86/entry/entry_64.S | 3 +
arch/x86/entry/entry_64_compat.S | 5 +
arch/x86/kernel/dumpstack.c | 31 ++
drivers/misc/lkdtm/Makefile | 3 +
drivers/misc/lkdtm/core.c | 3 +
drivers/misc/lkdtm/lkdtm.h | 5 +
drivers/misc/lkdtm/stackleak.c | 146 +++++++++
fs/proc/base.c | 18 ++
include/linux/sched.h | 5 +
include/linux/stackleak.h | 35 +++
kernel/Makefile | 4 +
kernel/fork.c | 3 +
kernel/stackleak.c | 132 ++++++++
kernel/sysctl.c | 15 +-
scripts/Makefile.gcc-plugins | 47 ++-
scripts/gcc-plugins/Kconfig | 196 ++++++++++++
scripts/gcc-plugins/Makefile | 5 -
scripts/gcc-plugins/gcc-common.h | 26 +-
scripts/gcc-plugins/stackleak_plugin.c | 480 +++++++++++++++++++++++++++++
26 files changed, 1195 insertions(+), 179 deletions(-)
create mode 100644 drivers/misc/lkdtm/stackleak.c
create mode 100644 include/linux/stackleak.h
create mode 100644 kernel/stackleak.c
create mode 100644 scripts/gcc-plugins/Kconfig
create mode 100644 scripts/gcc-plugins/stackleak_plugin.c

--
Kees Cook
Pixel Security


2018-08-15 16:43:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

On Mon, Aug 13, 2018 at 2:43 PM Kees Cook <[email protected]> wrote:
>
> Please pull these gcc-plugin changes for v4.19-rc1.

No.

It adds yet another BUG_ON() without having been merged.

I'm not pulling this. Dammit, have you learnt *nothing*?

I'm, disappointed in the whole feature, but I'm also tired of having
to go and even look for these things.

Then actually *finding* them makes me just pissed off.

Linus

2018-08-15 18:36:39

by Kees Cook

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

On Wed, Aug 15, 2018 at 9:41 AM, Linus Torvalds
<[email protected]> wrote:
> On Mon, Aug 13, 2018 at 2:43 PM Kees Cook <[email protected]> wrote:
>>
>> Please pull these gcc-plugin changes for v4.19-rc1.
>
> No.
>
> It adds yet another BUG_ON() without having been merged.
>
> I'm not pulling this. Dammit, have you learnt *nothing*?

I swear I'm doing my best. Are you speaking of
stackleak_check_alloca() or stackleak_erase()? These were both
discussed on the list, and we weren't able to come up with
alternatives: in both cases we're off the stack, and recovery is
seemingly impossible. What would you prefer in these cases? If I need
to take a hard line of "never BUG", how do I handle legitimate system
corruption? (i.e. I have interpreted this as different from narrowing
copy_*_user() usage: if we let execution continue, we'll just crash
somewhere else with likely less information on how to handle it.)

> I'm, disappointed in the whole feature, but I'm also tired of having
> to go and even look for these things.

I am trying to make these patches easier to review. I even made sure
to get Ingo's Ack and Alexander implemented additional features Ingo
suggested, before sending them your way, as Ingo has a very
conservative eye on.

> Then actually *finding* them makes me just pissed off.

I'm sorry we've disappointed you. I've been pushing back on patches
that use BUG (with, I think, good success), but there are cases where
our imagination fails us.

I'd really like to find a way for this plugin to be acceptable, given
the coverage is provides. Even if we solve stack initialization and
finish VLA removal, we still would benefit from something doing
post-syscall stack poisoning just to keep future cache attacks against
the stack minimized.

In the meantime, I will send the gcc-plugin cleanups separately...

-Kees

--
Kees Cook
Pixel Security

2018-08-15 19:05:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

On Wed, Aug 15, 2018 at 11:35 AM Kees Cook <[email protected]> wrote:
>
> I swear I'm doing my best. Are you speaking of
> stackleak_check_alloca() or stackleak_erase()? These were both
> discussed on the list, and we weren't able to come up with
> alternatives: in both cases we're off the stack, and recovery is
> seemingly impossible.

Why do you even *test* that thing? Why don't you just allocate stack
and clear it.

Dammit, the whole f*cking point of this patch-set is to clear the
stack used. It is *not* supposed to do anything else. If the process
runs out of stack, that's caught by the vmalloc'ed stack.

And if you don't have vmalloc'ed stack, then clearly you don't care.

I refuse to take this kind of code that does stupid things, and then
*because* it does those initial stupid things it does even more stupid
things to correct for it.

I hated the thing to begin with, told people that there are better
approaches that don't have the downsides, got ignored, and then I'm
pushed crap.

No.

Linus

2018-08-15 19:45:33

by Alexander Popov

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

Hello Linus,

On 15.08.2018 22:04, Linus Torvalds wrote:
> On Wed, Aug 15, 2018 at 11:35 AM Kees Cook <[email protected]> wrote:
>>
>> I swear I'm doing my best. Are you speaking of
>> stackleak_check_alloca() or stackleak_erase()? These were both
>> discussed on the list, and we weren't able to come up with
>> alternatives: in both cases we're off the stack, and recovery is
>> seemingly impossible.
>
> Why do you even *test* that thing? Why don't you just allocate stack
> and clear it.
>
> Dammit, the whole f*cking point of this patch-set is to clear the
> stack used. It is *not* supposed to do anything else. If the process
> runs out of stack, that's caught by the vmalloc'ed stack.
>
> And if you don't have vmalloc'ed stack, then clearly you don't care.
>
> I refuse to take this kind of code that does stupid things, and then
> *because* it does those initial stupid things it does even more stupid
> things to correct for it.

Could you please have a look at the commit messages (or at the code)? You are
really arguing with wrong things! Let me correct Kees and give you the details.
Please don't be angry.

Again, this plugin provides two features: kernel stack erasing and blocking
Stack Clash (ability to jump over the guard page provided by VMAP_STACK).

So:

1. stackleak_erase() erases the stack. It has a BUG_ON() to detect
'task_struct.lowest_stack' corruption. It's not a security violation BUG(),
which you hate. We just don't want to erase wrong memory. We have discussed that
with Ingo and others.

2. stackleak_check_alloca() detects 'Stack Clash' and it does absolutely similar
things with VMAP_STACK and SCHED_STACK_END_CHECK. Having VMAP_STACK + STACKLEAK
+ THREAD_INFO_IN_TASK together protects us from all known stack depth overflows.

Yes, one day we will remove all VLA's from the mainline kernel. But STACKLEAK
plugin protects un-upstreamed code as well.


I've put so much effort (1.5 years) to polish it and make you, Ingo and others
satisfied!

Best regards,
Alexander

2018-08-15 19:47:27

by Kees Cook

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

On Wed, Aug 15, 2018 at 12:04 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Aug 15, 2018 at 11:35 AM Kees Cook <[email protected]> wrote:
>>
>> I swear I'm doing my best. Are you speaking of
>> stackleak_check_alloca() or stackleak_erase()? These were both
>> discussed on the list, and we weren't able to come up with
>> alternatives: in both cases we're off the stack, and recovery is
>> seemingly impossible.
>
> Why do you even *test* that thing? Why don't you just allocate stack
> and clear it.

I feel like we're talking cross purposes. The BUG() cases were for
places where we detect that we're executing with an impossible stack
pointer. It seems like trying to recover from that would just hide the
corruption for a later time that would be much harder to debug. These
weren't left in here to upset you. :) I have tried to take your "make
it debuggable" declaration to heart.

> Dammit, the whole f*cking point of this patch-set is to clear the
> stack used. It is *not* supposed to do anything else. If the process
> runs out of stack, that's caught by the vmalloc'ed stack.

It also handles VLA abuse, since those could (and have in past
exploits) been used to jump over guard pages. If you're saying you
want to see VLAs entirely removed and this feature dropped from the
plugin before you'll accept it, that's what we can do. I was trying to
help things develop in parallel since we're now three releases into
removing VLAs and it continues to be slow work.

> And if you don't have vmalloc'ed stack, then clearly you don't care.

Agreed: this is why the plugin already does an "imply VMAP_STACK" for
Kconfig. Are you suggesting we should make it a hard "depends on
VMAP_STACK"?

> I refuse to take this kind of code that does stupid things, and then
> *because* it does those initial stupid things it does even more stupid
> things to correct for it.
>
> I hated the thing to begin with, told people that there are better
> approaches that don't have the downsides, got ignored, and then I'm
> pushed crap.

I tried to detail in the pull request how we absolutely did not ignore
you. Something like 15 people have been helping to remove VLAs, and
I've been testing both gcc's stack forced-initialization patch and
Clang's, and doing it via a plugin (none are really "there" yet).

-Kees

--
Kees Cook
Pixel Security

2018-08-15 20:21:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

On Wed, Aug 15, 2018 at 12:45 PM Kees Cook <[email protected]> wrote:
>
> I feel like we're talking cross purposes. The BUG() cases were for
> places where we detect that we're executing with an impossible stack
> pointer. It seems like trying to recover from that would just hide the
> corruption for a later time that would be much harder to debug. These
> weren't left in here to upset you. :) I have tried to take your "make
> it debuggable" declaration to heart.

The thing is, BUG() is not debuggable.

I absolutely refuse to take any hardening patches at all that have
BUG() or panic() or similar machine-killing in it.

I care not one whit about the reason for them. In fact, if the reason
is stated as "it makes debugging easiler", then I fart in your general
direction and call your mother a hamster.

Dammit, I suspect you guys are "testing" this by running things in a
VM, and then a BUG() looks like a good thing to do.

But if people run things on real machines, then BUG() is absolutely
the last thing you EVER want to do for "debugging".

This is why I scanned your pull request for BUG() and similar. Because
I simply will not take "hardening" that kills the machine. That's a
hard requirement. No excuses, and absolutely zero exceptions.

After a year or two, when the hardening has actually been in place,
and you can say "hey, look, none of the warnings happened", I may be
ok with turning them into BUG() calls.

> It also handles VLA abuse, since those could (and have in past
> exploits) been used to jump over guard pages.

I really don't think that's a valid model at all. We need to get rid
of the VLA abuse, not make complex compiler plugins for it.

I thought VLA's were mostly gone. I think we can at this point almost
just mark them broken, or disable any code that uses them when people
enable the stack options.

Adding a few

depends on !SAFE_STACK

to the drivers or code that still uses VLA's and then making the stack plugin do

select SAFE_STACK

and simply refuse to compile alloca sounds reasonable to me.

People who then want some stack validation or clearing can either
decide they don't care about those pieces, or fix them one by one.

Linus

2018-08-15 20:57:44

by Kees Cook

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

On Wed, Aug 15, 2018 at 1:18 PM, Linus Torvalds
<[email protected]> wrote:
> I absolutely refuse to take any hardening patches at all that have
> BUG() or panic() or similar machine-killing in it.

Okay, mental model adjusted. :) It was only "strong discouraged" until now.

> I thought VLA's were mostly gone.

Yes. Out of the ~115 instances we counted when we started with v4.16,
we've chipped away at them pretty steadily. Right now there are two
"one-off"s that haven't been picked up by maintainers:

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=vla/leftovers

and the remaining series against crypto, for which I am waiting on
further review for Herbert. All the really odd-ball crypto cases have
been handled (and are up for the merge window for v4.19), but there's
still some minor changes that Herbert is examining:

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=vla/crypto

And after that, there's a single patch to move -Wvla up into the
top-level Makefile:

https://patchwork.kernel.org/patch/10489873/

So, we're basically done, but the timing with the merge window wasn't
great since crypto continues to get tweaked and has taken much longer
than I had expected.

-Kees

--
Kees Cook
Pixel Security

2018-08-15 21:20:29

by Alexander Popov

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

On 15.08.2018 23:56, Kees Cook wrote:
> On Wed, Aug 15, 2018 at 1:18 PM, Linus Torvalds <[email protected]> wrote:
>> I absolutely refuse to take any hardening patches at all that have
>> BUG() or panic() or similar machine-killing in it.
>
> Okay, mental model adjusted. :) It was only "strong discouraged" until now.

I've just got the insight, how to avoid having BUG_ON() in stackleak_erase().
If 'task_struct.lowest_stack' is corrupted, we can erase once starting from the
stack bottom and reset the 'lowest_stack' value.

>> I care not one whit about the reason for them. In fact, if the reason
>> is stated as "it makes debugging easiler", then I fart in your general
>> direction and call your mother a hamster.

That is plain harassment. I ask to *stop* it!

--
Alexander

2018-08-15 21:34:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

On Wed, Aug 15, 2018 at 2:19 PM Alexander Popov <[email protected]> wrote:
>
> >> I fart in your general
> >> direction and call your mother a hamster.
>
> That is plain harassment. I ask to *stop* it!

The correct reply is

"Is there someone else up there we can talk to?"

just google for it if you haven't seen the Holy Grail.

(And I got the quote wrong too. I forgot about how your father smelt
of elderberries)

Linus

2018-08-16 18:07:05

by David Laight

[permalink] [raw]
Subject: RE: [GIT PULL] gcc-plugin updates for v4.19-rc1

From: Linus Torvalds
> Sent: 15 August 2018 21:19
...
> But if people run things on real machines, then BUG() is absolutely
> the last thing you EVER want to do for "debugging".

I'm not sure you want it on a live system either.
Live systems are where the 'hard' bugs show up.

I've just spent a couple of days pulling my hair out trying to work
out how to debug a customer system that was locking up solid when
running some new (and not completely testable by us) kernel code.

At 4am I suddenly realised that the distribution they are using
might be enabling 'panic_on_oops' by default.
Turning that off showed what was going wrong.

It wouldn't be as bad if Linux implemented 'dump to swap'.

For 'errors' that aren't completely fatal the system could
'fast shutdown' a lot of processes (maybe just refuse to schedule
them) while leaving enough running for fault diagnosis.
I'm not sure how you'd decide what to allow to run though.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2018-08-16 22:19:36

by Alexander Popov

[permalink] [raw]
Subject: Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

Hello,

On 15.08.2018 23:56, Kees Cook wrote:
> On Wed, Aug 15, 2018 at 1:18 PM, Linus Torvalds
> <[email protected]> wrote:
>> I absolutely refuse to take any hardening patches at all that have
>> BUG() or panic() or similar machine-killing in it.
>
> Okay, mental model adjusted. :) It was only "strong discouraged" until now.
>
>> I thought VLA's were mostly gone.
...
> And after that, there's a single patch to move -Wvla up into the
> top-level Makefile:
>
> https://patchwork.kernel.org/patch/10489873/
>
> So, we're basically done

I've just sent the 15th version of the series with changes according to the
feedback from Linus:

1. BUG_ON() in stackleak_erase() is safely eliminated;

2. Stack Clash detection (alloca() check) is completely dropped, since global
'-Wvla' should arrive soon. stackleak_check_alloca() for arm64 is dropped as
well in a separate commit.

This version is rebased onto Linus' tree.

Best regards,
Alexander