2015-11-24 21:39:31

by Kees Cook

[permalink] [raw]
Subject: [PATCH 0/2] introduce post-init read-only memory

One of the easiest ways to protect the kernel from attack is to reduce
the internal attack surface exposed when a "write" flaw is available. By
making as much of the kernel read-only as possible, we reduce the
attack surface.

Many things are written to only during __init, and never changed
again. These cannot be made "const" since the compiler will do the wrong
thing (we do actually need to write to them). Instead, move these items
into a memory region that will be made read-only during mark_rodata_ro()
which happens after all kernel __init code has finished.

This introduces __read_only as a way to mark such memory, and uses it on
the x86 vDSO to kill an extant kernel exploitation method.

-Kees


2015-11-24 21:39:35

by Kees Cook

[permalink] [raw]
Subject: [PATCH 1/2] x86: introduce post-init read-only memory

One of the easiest ways to protect the kernel from attack is to reduce
the internal attack surface exposed when a "write" flaw is available. By
making as much of the kernel read-only as possible, we reduce the
attack surface.

Many things are written to only during __init, and never changed
again. These cannot be made "const" since the compiler will do the wrong
thing (we do actually need to write to them). Instead, move these items
into a memory region that will be made read-only during mark_rodata_ro()
which happens after all kernel __init code has finished.

This introduces __read_only as a way to mark such memory, and adds some
documentation about the existing __read_mostly marking.

Based on work by PaX Team and Brad Spengler.

Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/include/asm/cache.h | 1 +
include/asm-generic/vmlinux.lds.h | 1 +
include/linux/cache.h | 15 +++++++++++++++
3 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/cache.h b/arch/x86/include/asm/cache.h
index 48f99f15452e..982b21c5eb1f 100644
--- a/arch/x86/include/asm/cache.h
+++ b/arch/x86/include/asm/cache.h
@@ -8,6 +8,7 @@
#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)

#define __read_mostly __attribute__((__section__(".data..read_mostly")))
+#define __read_only __attribute__((__section__(".data..read_only")))

#define INTERNODE_CACHE_SHIFT CONFIG_X86_INTERNODE_CACHE_SHIFT
#define INTERNODE_CACHE_BYTES (1 << INTERNODE_CACHE_SHIFT)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c4bd0e2c173c..998a09d7731c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -256,6 +256,7 @@
.rodata : AT(ADDR(.rodata) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start_rodata) = .; \
*(.rodata) *(.rodata.*) \
+ *(.data..read_only) /* Read only after init */ \
*(__vermagic) /* Kernel version magic */ \
. = ALIGN(8); \
VMLINUX_SYMBOL(__start___tracepoints_ptrs) = .; \
diff --git a/include/linux/cache.h b/include/linux/cache.h
index 17e7e82d2aa7..b2967e711a75 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -12,10 +12,25 @@
#define SMP_CACHE_BYTES L1_CACHE_BYTES
#endif

+/*
+ * __read_mostly is used to keep rarely changing variables out of frequently
+ * updated cachelines. If an architecture doesn't support it, ignore the
+ * hint.
+ */
#ifndef __read_mostly
#define __read_mostly
#endif

+/*
+ * __read_only is used to mark things that are read-only after init (i.e.
+ * after mark_rodata_ro() has been called). These are effectively read-only,
+ * but may get written to during init, so can't live in .rodata (via "const").
+ * Hint to __read_mostly if the architecture hasn't wired this up.
+ */
+#ifndef __read_only
+#define __read_only __read_mostly
+#endif
+
#ifndef ____cacheline_aligned
#define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
#endif
--
1.9.1

2015-11-24 21:41:27

by Kees Cook

[permalink] [raw]
Subject: [PATCH 2/2] x86, vdso: mark vDSO read-only after init

The vDSO does not need to be writable after __init, so mark it as
__read_only. The result kills the exploit method of writing to the vDSO
from kernel space resulting in userspace executing the modified code,
as shown here to bypass SMEP restrictions: http://itszn.com/blog/?p=21

The memory map (with added vDSO address reporting) shows the vDSO moving
into read-only memory:

Before:
[ 0.143067] vDSO @ ffffffff82004000
[ 0.143551] vDSO @ ffffffff82006000
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81800000 8M ro PSE GLB x pmd
0xffffffff81800000-0xffffffff819f3000 1996K ro GLB x pte
0xffffffff819f3000-0xffffffff81a00000 52K ro NX pte
0xffffffff81a00000-0xffffffff81e00000 4M ro PSE GLB NX pmd
0xffffffff81e00000-0xffffffff81e05000 20K ro GLB NX pte
0xffffffff81e05000-0xffffffff82000000 2028K ro NX pte
0xffffffff82000000-0xffffffff8214f000 1340K RW GLB NX pte
0xffffffff8214f000-0xffffffff82281000 1224K RW NX pte
0xffffffff82281000-0xffffffff82400000 1532K RW GLB NX pte
0xffffffff82400000-0xffffffff83200000 14M RW PSE GLB NX pmd
0xffffffff83200000-0xffffffffc0000000 974M pmd

After:
[ 0.145062] vDSO @ ffffffff81da1000
[ 0.146057] vDSO @ ffffffff81da4000
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81800000 8M ro PSE GLB x pmd
0xffffffff81800000-0xffffffff819f3000 1996K ro GLB x pte
0xffffffff819f3000-0xffffffff81a00000 52K ro NX pte
0xffffffff81a00000-0xffffffff81e00000 4M ro PSE GLB NX pmd
0xffffffff81e00000-0xffffffff81e0b000 44K ro GLB NX pte
0xffffffff81e0b000-0xffffffff82000000 2004K ro NX pte
0xffffffff82000000-0xffffffff8214c000 1328K RW GLB NX pte
0xffffffff8214c000-0xffffffff8227e000 1224K RW NX pte
0xffffffff8227e000-0xffffffff82400000 1544K RW GLB NX pte
0xffffffff82400000-0xffffffff83200000 14M RW PSE GLB NX pmd
0xffffffff83200000-0xffffffffc0000000 974M pmd

Based on work by Brad Spengler.

Signed-off-by: Kees Cook <[email protected]>
---
arch/x86/entry/vdso/vdso2c.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 0224987556ce..677ce3ac4d34 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -140,7 +140,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
fprintf(outfile, "#include <asm/vdso.h>\n");
fprintf(outfile, "\n");
fprintf(outfile,
- "static unsigned char raw_data[%lu] __page_aligned_data = {",
+ "static unsigned char raw_data[%lu] __read_only __aligned(PAGE_SIZE) = {",
mapping_size);
for (j = 0; j < stripped_len; j++) {
if (j % 10 == 0)
@@ -150,7 +150,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
}
fprintf(outfile, "\n};\n\n");

- fprintf(outfile, "static struct page *pages[%lu];\n\n",
+ fprintf(outfile, "static struct page *pages[%lu] __read_only;\n\n",
mapping_size / 4096);

fprintf(outfile, "const struct vdso_image %s = {\n", name);
--
1.9.1

2015-11-25 00:34:37

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86: introduce post-init read-only memory

On Nov 24, 2015 1:38 PM, "Kees Cook" <[email protected]> wrote:
>
> One of the easiest ways to protect the kernel from attack is to reduce
> the internal attack surface exposed when a "write" flaw is available. By
> making as much of the kernel read-only as possible, we reduce the
> attack surface.
>
> Many things are written to only during __init, and never changed
> again. These cannot be made "const" since the compiler will do the wrong
> thing (we do actually need to write to them). Instead, move these items
> into a memory region that will be made read-only during mark_rodata_ro()
> which happens after all kernel __init code has finished.
>
> This introduces __read_only as a way to mark such memory, and adds some
> documentation about the existing __read_mostly marking.

Obligatory bikeshed: __ro_after_init, please. It's barely longer,
and it directly explains what's going on. __read_only makes me think
that it's really read-only and could, for example, actually be in ROM.

--Andy

2015-11-25 00:44:12

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86: introduce post-init read-only memory

On Tue, Nov 24, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
> On Nov 24, 2015 1:38 PM, "Kees Cook" <[email protected]> wrote:
>>
>> One of the easiest ways to protect the kernel from attack is to reduce
>> the internal attack surface exposed when a "write" flaw is available. By
>> making as much of the kernel read-only as possible, we reduce the
>> attack surface.
>>
>> Many things are written to only during __init, and never changed
>> again. These cannot be made "const" since the compiler will do the wrong
>> thing (we do actually need to write to them). Instead, move these items
>> into a memory region that will be made read-only during mark_rodata_ro()
>> which happens after all kernel __init code has finished.
>>
>> This introduces __read_only as a way to mark such memory, and adds some
>> documentation about the existing __read_mostly marking.
>
> Obligatory bikeshed: __ro_after_init, please. It's barely longer,
> and it directly explains what's going on. __read_only makes me think
> that it's really read-only and could, for example, actually be in ROM.

I'm fine with that. Anyone else want to chime in before I send a v2?

-Kees

--
Kees Cook
Chrome OS & Brillo Security

2015-11-25 00:54:47

by Michael Ellerman

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH 1/2] x86: introduce post-init read-only memory

On Tue, 2015-11-24 at 16:44 -0800, Kees Cook wrote:
> On Tue, Nov 24, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
> > On Nov 24, 2015 1:38 PM, "Kees Cook" <[email protected]> wrote:
> > >
> > > One of the easiest ways to protect the kernel from attack is to reduce
> > > the internal attack surface exposed when a "write" flaw is available. By
> > > making as much of the kernel read-only as possible, we reduce the
> > > attack surface.
> > >
> > > Many things are written to only during __init, and never changed
> > > again. These cannot be made "const" since the compiler will do the wrong
> > > thing (we do actually need to write to them). Instead, move these items
> > > into a memory region that will be made read-only during mark_rodata_ro()
> > > which happens after all kernel __init code has finished.
> > >
> > > This introduces __read_only as a way to mark such memory, and adds some
> > > documentation about the existing __read_mostly marking.
> >
> > Obligatory bikeshed: __ro_after_init, please. It's barely longer,
> > and it directly explains what's going on. __read_only makes me think
> > that it's really read-only and could, for example, actually be in ROM.
>
> I'm fine with that. Anyone else want to chime in before I send a v2?

I'm not clear on why this is x86 only?

It looks like it would work on any arch, or is there some toolchain
requirement?

cheers

2015-11-25 09:14:02

by Mathias Krause

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 24 November 2015 at 22:38, Kees Cook <[email protected]> wrote:
> Many things are written to only during __init, and never changed
> again. These cannot be made "const" since the compiler will do the wrong
> thing (we do actually need to write to them). Instead, move these items
> into a memory region that will be made read-only during mark_rodata_ro()
> which happens after all kernel __init code has finished.
>
> This introduces __read_only as a way to mark such memory, and uses it on
> the x86 vDSO to kill an extant kernel exploitation method.

...just some random notes on the experience with kernels implementing
such a feature for quite a lot of locations, not just the vDSO.

While having that annotation makes perfect sense, not only from a
security perspective but also from a micro-optimization point of view
(much like the already existing __read_mostly annotation), it has its
drawbacks. Violating the "r/o after init" rule by writing to such
annotated variables from non-init code goes unnoticed as far as it
concerns the toolchain. Neither the compiler nor the linker will flag
that incorrect use. It'll just trap at runtime and that's bad.

I myself had some educating experience seeing my machine triple fault
when resuming from a S3 sleep. The root cause was a variable that was
annotated __read_only but that was (unnecessarily) modified during CPU
bring-up phase. Debugging that kind of problems is sort of a PITA, you
could imagine.

So, prior extending the usage of the __read_only annotation some
toolchain support is needed. Maybe a gcc plugin that'll warn/error on
code that writes to such a variable but is not __init itself. The
initify and checker plugins from the PaX patch might be worth to look
at for that purpose, as they're doing similar things already. Adding
such a check to sparse might be worth it, too.
A modpost check probably won't work as it's unable to tell if it's a
legitimate access (r/o) or a violation (/w access). So the gcc plugin
is the way to go, IMHO.


Regards,
Mathias

2015-11-25 10:06:45

by Clemens Ladisch

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

Mathias Krause wrote:
> [...]
> So, prior extending the usage of the __read_only annotation some
> toolchain support is needed. Maybe a gcc plugin that'll warn/error on
> code that writes to such a variable but is not __init itself.

Or mark them as "const". This would require the initialization code to
cast it away, probably with a helper macro.


Regards,
Clemens

2015-11-25 11:06:17

by PaX Team

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 25 Nov 2015 at 10:13, Mathias Krause wrote:

> I myself had some educating experience seeing my machine triple fault
> when resuming from a S3 sleep. The root cause was a variable that was
> annotated __read_only but that was (unnecessarily) modified during CPU
> bring-up phase. Debugging that kind of problems is sort of a PITA, you
> could imagine.

actually the kernel could silently recover from this given how the
page fault handler could easily determine that the fault address fell
into the data..read_only section and just silently undo the read-only
property, log the event to dmesg and retry the faulting access.

> So, prior extending the usage of the __read_only annotation some
> toolchain support is needed. Maybe a gcc plugin that'll warn/error on
> code that writes to such a variable but is not __init itself.

this is exactly what i suggested earlier in the constify thread ;).
note that this will produce false positives because __init* annotations
are not propagated everywhere they could be.

> The initify and checker plugins from the PaX patch might be worth to
> look at for that purpose, as they're doing similar things already.

one of our plans for initify is to add the discovery and propagation
of _init* annotations as well, it'd not only fix the false positives
mentioned above but also help reduce the kernel size (code/data/rodata).

2015-11-25 11:14:58

by PaX Team

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 25 Nov 2015 at 11:06, Clemens Ladisch wrote:

> Mathias Krause wrote:
> > [...]
> > So, prior extending the usage of the __read_only annotation some
> > toolchain support is needed. Maybe a gcc plugin that'll warn/error on
> > code that writes to such a variable but is not __init itself.
>
> Or mark them as "const". This would require the initialization code to
> cast it away, probably with a helper macro.

no, that'd be undefined behaviour and in practice gcc would take advantage
of it and produce unintended (and quite broken) code. if the constified object
is modified from a different translation unit then the compiler is free
to assume that it can constant propagate its initialization value into uses,
completely breaking the code that (forcibly) writes to it.

however as a poor man's detector for such __read_only violations it's
possible to just make the object const temporarily (without casting away
the write attempts!), recompile the tree and see if any writes outside
__init functions pop up.

2015-11-25 15:03:42

by Kees Cook

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH 1/2] x86: introduce post-init read-only memory

On Tue, Nov 24, 2015 at 4:54 PM, Michael Ellerman <[email protected]> wrote:
> On Tue, 2015-11-24 at 16:44 -0800, Kees Cook wrote:
>> On Tue, Nov 24, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
>> > On Nov 24, 2015 1:38 PM, "Kees Cook" <[email protected]> wrote:
>> > >
>> > > One of the easiest ways to protect the kernel from attack is to reduce
>> > > the internal attack surface exposed when a "write" flaw is available. By
>> > > making as much of the kernel read-only as possible, we reduce the
>> > > attack surface.
>> > >
>> > > Many things are written to only during __init, and never changed
>> > > again. These cannot be made "const" since the compiler will do the wrong
>> > > thing (we do actually need to write to them). Instead, move these items
>> > > into a memory region that will be made read-only during mark_rodata_ro()
>> > > which happens after all kernel __init code has finished.
>> > >
>> > > This introduces __read_only as a way to mark such memory, and adds some
>> > > documentation about the existing __read_mostly marking.
>> >
>> > Obligatory bikeshed: __ro_after_init, please. It's barely longer,
>> > and it directly explains what's going on. __read_only makes me think
>> > that it's really read-only and could, for example, actually be in ROM.
>>
>> I'm fine with that. Anyone else want to chime in before I send a v2?
>
> I'm not clear on why this is x86 only?

I was initially looking at how __read_mostly got implemented, and it
seemed like section names were done on a per-arch basis. But it
doesn't seem like that needs to be true.

> It looks like it would work on any arch, or is there some toolchain
> requirement?

Given that the other sections are in the common linux.lds.h file, it
seems unlikely to me. I'll try it in an arch-agnostic way and see what
happens. :)

-Kees

--
Kees Cook
Chrome OS & Brillo Security

2015-11-25 17:26:50

by Kees Cook

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Wed, Nov 25, 2015 at 1:13 AM, Mathias Krause <[email protected]> wrote:
> On 24 November 2015 at 22:38, Kees Cook <[email protected]> wrote:
>> Many things are written to only during __init, and never changed
>> again. These cannot be made "const" since the compiler will do the wrong
>> thing (we do actually need to write to them). Instead, move these items
>> into a memory region that will be made read-only during mark_rodata_ro()
>> which happens after all kernel __init code has finished.
>>
>> This introduces __read_only as a way to mark such memory, and uses it on
>> the x86 vDSO to kill an extant kernel exploitation method.
>
> ...just some random notes on the experience with kernels implementing
> such a feature for quite a lot of locations, not just the vDSO.
>
> While having that annotation makes perfect sense, not only from a
> security perspective but also from a micro-optimization point of view
> (much like the already existing __read_mostly annotation), it has its
> drawbacks. Violating the "r/o after init" rule by writing to such
> annotated variables from non-init code goes unnoticed as far as it
> concerns the toolchain. Neither the compiler nor the linker will flag
> that incorrect use. It'll just trap at runtime and that's bad.

Well, or, it's good: that's the point of it. Either from the
perspective of robustness or from that of security.

> I myself had some educating experience seeing my machine triple fault
> when resuming from a S3 sleep. The root cause was a variable that was
> annotated __read_only but that was (unnecessarily) modified during CPU
> bring-up phase. Debugging that kind of problems is sort of a PITA, you
> could imagine.

As PaX Team mentions, it should be easy to catch these traps and
report them. That could certainly be a nice addition.

> So, prior extending the usage of the __read_only annotation some
> toolchain support is needed. Maybe a gcc plugin that'll warn/error on
> code that writes to such a variable but is not __init itself. The
> initify and checker plugins from the PaX patch might be worth to look
> at for that purpose, as they're doing similar things already. Adding
> such a check to sparse might be worth it, too.
> A modpost check probably won't work as it's unable to tell if it's a
> legitimate access (r/o) or a violation (/w access). So the gcc plugin
> is the way to go, IMHO.

There are many more pieces to add to the annotation use, but I don't
want to risk getting us into a Catch-22. Emese's work on the plugin
side will see this usage and utility grow, and I think getting these
basic building blocks in place is the right place to start.

-Kees


--
Kees Cook
Chrome OS & Brillo Security

2015-11-25 17:31:58

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 11/25/15 01:13, Mathias Krause wrote:
>
> While having that annotation makes perfect sense, not only from a
> security perspective but also from a micro-optimization point of view
> (much like the already existing __read_mostly annotation), it has its
> drawbacks. Violating the "r/o after init" rule by writing to such
> annotated variables from non-init code goes unnoticed as far as it
> concerns the toolchain. Neither the compiler nor the linker will flag
> that incorrect use. It'll just trap at runtime and that's bad.
>
> I myself had some educating experience seeing my machine triple fault
> when resuming from a S3 sleep. The root cause was a variable that was
> annotated __read_only but that was (unnecessarily) modified during CPU
> bring-up phase. Debugging that kind of problems is sort of a PITA, you
> could imagine.
>
> So, prior extending the usage of the __read_only annotation some
> toolchain support is needed. Maybe a gcc plugin that'll warn/error on
> code that writes to such a variable but is not __init itself. The
> initify and checker plugins from the PaX patch might be worth to look
> at for that purpose, as they're doing similar things already. Adding
> such a check to sparse might be worth it, too.
> A modpost check probably won't work as it's unable to tell if it's a
> legitimate access (r/o) or a violation (/w access). So the gcc plugin
> is the way to go, IMHO.
>

We should not wait for compile-time support, that doesn't make any
sense. What would be useful would be a way to override this on the
command line -- that way, if disabling RO or RO-after-init memory makes
something work, we have an instant diagnosis.

-hpa

2015-11-25 18:54:19

by Kees Cook

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Wed, Nov 25, 2015 at 9:31 AM, H. Peter Anvin <[email protected]> wrote:
> On 11/25/15 01:13, Mathias Krause wrote:
>>
>> While having that annotation makes perfect sense, not only from a
>> security perspective but also from a micro-optimization point of view
>> (much like the already existing __read_mostly annotation), it has its
>> drawbacks. Violating the "r/o after init" rule by writing to such
>> annotated variables from non-init code goes unnoticed as far as it
>> concerns the toolchain. Neither the compiler nor the linker will flag
>> that incorrect use. It'll just trap at runtime and that's bad.
>>
>> I myself had some educating experience seeing my machine triple fault
>> when resuming from a S3 sleep. The root cause was a variable that was
>> annotated __read_only but that was (unnecessarily) modified during CPU
>> bring-up phase. Debugging that kind of problems is sort of a PITA, you
>> could imagine.
>>
>> So, prior extending the usage of the __read_only annotation some
>> toolchain support is needed. Maybe a gcc plugin that'll warn/error on
>> code that writes to such a variable but is not __init itself. The
>> initify and checker plugins from the PaX patch might be worth to look
>> at for that purpose, as they're doing similar things already. Adding
>> such a check to sparse might be worth it, too.
>> A modpost check probably won't work as it's unable to tell if it's a
>> legitimate access (r/o) or a violation (/w access). So the gcc plugin
>> is the way to go, IMHO.
>>
>
> We should not wait for compile-time support, that doesn't make any
> sense. What would be useful would be a way to override this on the
> command line -- that way, if disabling RO or RO-after-init memory makes
> something work, we have an instant diagnosis.

Seems easiest to have an arg just skip calling mark_rodata_ro(). I can add that.

-Kees

--
Kees Cook
Chrome OS & Brillo Security

2015-11-25 19:07:37

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 11/25/2015 10:54 AM, Kees Cook wrote:
>>
>> We should not wait for compile-time support, that doesn't make any
>> sense. What would be useful would be a way to override this on the
>> command line -- that way, if disabling RO or RO-after-init memory makes
>> something work, we have an instant diagnosis.
>
> Seems easiest to have an arg just skip calling mark_rodata_ro(). I can add that.
>

Exactly.

-hpa

2015-11-25 23:05:55

by Michael Ellerman

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH 1/2] x86: introduce post-init read-only memory

On Wed, 2015-11-25 at 07:03 -0800, Kees Cook wrote:
> On Tue, Nov 24, 2015 at 4:54 PM, Michael Ellerman <[email protected]> wrote:
> > On Tue, 2015-11-24 at 16:44 -0800, Kees Cook wrote:
> > > On Tue, Nov 24, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
> > > > On Nov 24, 2015 1:38 PM, "Kees Cook" <[email protected]> wrote:
> > > > >
> > > > > One of the easiest ways to protect the kernel from attack is to reduce
> > > > > the internal attack surface exposed when a "write" flaw is available. By
> > > > > making as much of the kernel read-only as possible, we reduce the
> > > > > attack surface.
> > > > >
> > > > > Many things are written to only during __init, and never changed
> > > > > again. These cannot be made "const" since the compiler will do the wrong
> > > > > thing (we do actually need to write to them). Instead, move these items
> > > > > into a memory region that will be made read-only during mark_rodata_ro()
> > > > > which happens after all kernel __init code has finished.
> > > > >
> > > > > This introduces __read_only as a way to mark such memory, and adds some
> > > > > documentation about the existing __read_mostly marking.
> > > >
> > > > Obligatory bikeshed: __ro_after_init, please. It's barely longer,
> > > > and it directly explains what's going on. __read_only makes me think
> > > > that it's really read-only and could, for example, actually be in ROM.
> > >
> > > I'm fine with that. Anyone else want to chime in before I send a v2?
> >
> > I'm not clear on why this is x86 only?
>
> I was initially looking at how __read_mostly got implemented, and it
> seemed like section names were done on a per-arch basis. But it
> doesn't seem like that needs to be true.

Yeah I saw that too, but I couldn't see anything in the commit history that
explained why it was per-arch.

> > It looks like it would work on any arch, or is there some toolchain
> > requirement?
>
> Given that the other sections are in the common linux.lds.h file, it
> seems unlikely to me. I'll try it in an arch-agnostic way and see what
> happens. :)

That'd be great, I can test on powerpc, and build test other arches too.

cheers

2015-11-25 23:33:00

by Kees Cook

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH 1/2] x86: introduce post-init read-only memory

On Wed, Nov 25, 2015 at 3:05 PM, Michael Ellerman <[email protected]> wrote:
> On Wed, 2015-11-25 at 07:03 -0800, Kees Cook wrote:
>> On Tue, Nov 24, 2015 at 4:54 PM, Michael Ellerman <[email protected]> wrote:
>> > On Tue, 2015-11-24 at 16:44 -0800, Kees Cook wrote:
>> > > On Tue, Nov 24, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
>> > > > On Nov 24, 2015 1:38 PM, "Kees Cook" <[email protected]> wrote:
>> > > > >
>> > > > > One of the easiest ways to protect the kernel from attack is to reduce
>> > > > > the internal attack surface exposed when a "write" flaw is available. By
>> > > > > making as much of the kernel read-only as possible, we reduce the
>> > > > > attack surface.
>> > > > >
>> > > > > Many things are written to only during __init, and never changed
>> > > > > again. These cannot be made "const" since the compiler will do the wrong
>> > > > > thing (we do actually need to write to them). Instead, move these items
>> > > > > into a memory region that will be made read-only during mark_rodata_ro()
>> > > > > which happens after all kernel __init code has finished.
>> > > > >
>> > > > > This introduces __read_only as a way to mark such memory, and adds some
>> > > > > documentation about the existing __read_mostly marking.
>> > > >
>> > > > Obligatory bikeshed: __ro_after_init, please. It's barely longer,
>> > > > and it directly explains what's going on. __read_only makes me think
>> > > > that it's really read-only and could, for example, actually be in ROM.
>> > >
>> > > I'm fine with that. Anyone else want to chime in before I send a v2?
>> >
>> > I'm not clear on why this is x86 only?
>>
>> I was initially looking at how __read_mostly got implemented, and it
>> seemed like section names were done on a per-arch basis. But it
>> doesn't seem like that needs to be true.
>
> Yeah I saw that too, but I couldn't see anything in the commit history that
> explained why it was per-arch.

Best I was able to see was that architectures weren't (aren't?) using
the common RODATA section macros in their linker scripts. From a quick
inspection, I think these are all okay now.

-Kees

>
>> > It looks like it would work on any arch, or is there some toolchain
>> > requirement?
>>
>> Given that the other sections are in the common linux.lds.h file, it
>> seems unlikely to me. I'll try it in an arch-agnostic way and see what
>> happens. :)
>
> That'd be great, I can test on powerpc, and build test other arches too.
>
> cheers
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/



--
Kees Cook
Chrome OS & Brillo Security

2015-11-26 08:54:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory


* PaX Team <[email protected]> wrote:

> On 25 Nov 2015 at 10:13, Mathias Krause wrote:
>
> > I myself had some educating experience seeing my machine triple fault
> > when resuming from a S3 sleep. The root cause was a variable that was
> > annotated __read_only but that was (unnecessarily) modified during CPU
> > bring-up phase. Debugging that kind of problems is sort of a PITA, you
> > could imagine.

( Sidenote: I don't think a ro-faults typically result in triple faults, but yeah,
even having a regular oops (followed by a hang or reboot) during such an
undebuggable state of the system is a major PITA. )

> actually the kernel could silently recover from this given how the page fault
> handler could easily determine that the fault address fell into the
> data..read_only section and just silently undo the read-only property, log the
> event to dmesg and retry the faulting access.

So a safer method would be to decode the faulting instruction, to skip it by
fixing up the return RIP and to log the event. It would be mostly equivalent to
trying to write to ROM (which get ignored as well), so it's a recoverable (and
debuggable) event.

We have all the necessary code in place in the kprobes code, see
arch/x86/lib/insn.c, it's a simplified x86 decoder that knows about instruction
length (but not about semantics).

Simple skipping plus setting arithmetic flags to init value should be enough I
think: I don't think we use fancy instructions to write to ro variables, such as
PUSH/POP with other side effects. If such instructions exist we could minimally
extend the decoder to do those fixups as well - in addition to double checking
that we skip simple instructions only with no side effects.

Can you see any fragility in such a technique?

Thanks,

Ingo

2015-11-26 09:57:55

by PaX Team

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 26 Nov 2015 at 9:54, Ingo Molnar wrote:

> * PaX Team <[email protected]> wrote:
>
> > actually the kernel could silently recover from this given how the page fault
> > handler could easily determine that the fault address fell into the
> > data..read_only section and just silently undo the read-only property, log the
> > event to dmesg and retry the faulting access.
>
> So a safer method would be to decode the faulting instruction, to skip it by
> fixing up the return RIP and to log the event. It would be mostly equivalent to
> trying to write to ROM (which get ignored as well), so it's a recoverable (and
> debuggable) event.

if by skipping you mean ignoring the write attempt then it's not a good idea
as it has a good chance to cause unexpected behaviour down the line.

e.g., imagine that the write was to a function pointer (even an entire ops
structure) or a boolean that controls some important feature for after-init
code. ignoring/dropping such writes could cause all kinds of logic bugs (if
not worse).

my somewhat related war story is that i once tried to constify machine_ops
(both the struct and the variable of the same name) directly and just forced
the writes in kvm/xen/etc via type casts. now i knew it was all undefined
behaviour but i didn't expect gcc to take advantage of it but it did (const
propagated the *initial* fptr values into the indirect calls by turning them
into direct calls) and which in turn prevented proper reboots for guests
(an event which obviously happens much later after init/boot to the great
puzzlement of end users and myself).

misusing __read_only and ignoring write attempts would effectively produce
the same misbehaviour as above so i strongly advise against it.

now i understand that the motivation is probably to preserve the read-only
property despite wrong use of __read_only but for this i'd suggest to simply
not make the silent recovery the default behaviour and enable it on the
kernel command line only. after all, this is expected to be a reproducible
problem and affected users can just reboot (in fact, they'd be advised to)
when it happens and get a proper report that they could then send back to lkml.

i also don't expect this to be a frequent problem as __read_only will have
to be handled carefully in every case and not just at the time of adding it
to a variable but also later during code evolution. as i suggested in the
constify plugin thread earlier, use of __read_only should be tied to compile
time checking of all uses of the affected variable to eliminate the entire
problem of problematic writes.

2015-11-26 10:42:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory


* PaX Team <[email protected]> wrote:

> On 26 Nov 2015 at 9:54, Ingo Molnar wrote:
>
> > * PaX Team <[email protected]> wrote:
> >
> > > actually the kernel could silently recover from this given how the page fault
> > > handler could easily determine that the fault address fell into the
> > > data..read_only section and just silently undo the read-only property, log the
> > > event to dmesg and retry the faulting access.
> >
> > So a safer method would be to decode the faulting instruction, to skip it by
> > fixing up the return RIP and to log the event. It would be mostly equivalent
> > to trying to write to ROM (which get ignored as well), so it's a recoverable
> > (and debuggable) event.
>
> if by skipping you mean ignoring the write attempt then it's not a good idea as
> it has a good chance to cause unexpected behaviour down the line.
>
> e.g., imagine that the write was to a function pointer (even an entire ops
> structure) or a boolean that controls some important feature for after-init
> code. ignoring/dropping such writes could cause all kinds of logic bugs (if not
> worse).

Well, the typical case is that it's a logic bug to _do_ the write: the structure
was marked readonly for a reason but some init code re-runs during suspend or so.

But yes, logic bugs might trigger - but that is true in the opposite case as well,
if we do the write despite it being marked readonly:

> my somewhat related war story is that i once tried to constify machine_ops (both
> the struct and the variable of the same name) directly and just forced the
> writes in kvm/xen/etc via type casts. now i knew it was all undefined behaviour
> but i didn't expect gcc to take advantage of it but it did (const propagated the
> *initial* fptr values into the indirect calls by turning them into direct calls)
> and which in turn prevented proper reboots for guests (an event which obviously
> happens much later after init/boot to the great puzzlement of end users and
> myself).
>
> misusing __read_only and ignoring write attempts would effectively produce the
> same misbehaviour as above so i strongly advise against it.

No, the difference to the GCC related aliasing bug is that with my technique the
kernel would immediately produce a very visible kernel warning, which is a very
clear sign that is wrong - and with a very clear backtrace in the warning that
points right to the problematic code - which signature shows us (and users) what
is wrong.

So your example is not comparable at all.

Plus the truly paranoid might panic/halt the system on such warnings, so for
highly secure systems there's a way to not even allow the possibility of logic
bugs. (at the cost of stopping the system when a bug triggers.)

Thanks,

Ingo

2015-11-26 12:15:09

by PaX Team

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 26 Nov 2015 at 11:42, Ingo Molnar wrote:

> * PaX Team <[email protected]> wrote:
>
> > On 26 Nov 2015 at 9:54, Ingo Molnar wrote:
>
> > e.g., imagine that the write was to a function pointer (even an entire ops
> > structure) or a boolean that controls some important feature for after-init
> > code. ignoring/dropping such writes could cause all kinds of logic bugs (if not
> > worse).
>
> Well, the typical case is that it's a logic bug to _do_ the write: the structure
> was marked readonly for a reason but some init code re-runs during suspend or so.

that's actually not the typical case in my experience, but rather these two:

1. initial mistake: someone didn't actually check whether the given object can
be __read_only

2. code evolution: an object that was once written by __init code only (and
thus proactively subjected to __read_only) gets modified by non-init code
due to later changes

what you described above is a third case where there's a latent bug to begin
(unintended write) with that __read_only merely exposes but doesn't create
itself, unlike the two cases above (intended writes getting caught by wrong
use of __read_only).

> But yes, logic bugs might trigger - but that is true in the opposite case as well,
> if we do the write despite it being marked readonly:

not really, the two cases above are not a priori bugs, they become bugs due
to using __read_only without due care (which is why i suggested to detect
them at compile time).

> > misusing __read_only and ignoring write attempts would effectively produce the
> > same misbehaviour as above so i strongly advise against it.
>
> No, the difference to the GCC related aliasing bug is that with my technique the
> kernel would immediately produce a very visible kernel warning, which is a very
> clear sign that is wrong - and with a very clear backtrace in the warning that
> points right to the problematic code - which signature shows us (and users) what
> is wrong.

my proposal would produce the exact same reports, the difference is in letting
the write attempt succeed vs. skipping it. this latter step is what is wrong
since it introduces at least a logic bug the same way the constprop optimization
created a logic bug.

> Plus the truly paranoid might panic/halt the system on such warnings, so for
> highly secure systems there's a way to not even allow the possibility of logic
> bugs. (at the cost of stopping the system when a bug triggers.)

this would/should be the default behaviour, i.e., no attempt at being smart by
either allowing or skipping the faulting insn, just report the event and trigger
whatever fancy future exploit reaction mechanism will get into the kernel.

2015-11-26 16:12:06

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Thu, Nov 26, 2015 at 12:54 AM, Ingo Molnar <[email protected]> wrote:
>
> * PaX Team <[email protected]> wrote:
>
>> On 25 Nov 2015 at 10:13, Mathias Krause wrote:
>>
>> > I myself had some educating experience seeing my machine triple fault
>> > when resuming from a S3 sleep. The root cause was a variable that was
>> > annotated __read_only but that was (unnecessarily) modified during CPU
>> > bring-up phase. Debugging that kind of problems is sort of a PITA, you
>> > could imagine.
>
> ( Sidenote: I don't think a ro-faults typically result in triple faults, but yeah,
> even having a regular oops (followed by a hang or reboot) during such an
> undebuggable state of the system is a major PITA. )
>
>> actually the kernel could silently recover from this given how the page fault
>> handler could easily determine that the fault address fell into the
>> data..read_only section and just silently undo the read-only property, log the
>> event to dmesg and retry the faulting access.
>
> So a safer method would be to decode the faulting instruction, to skip it by
> fixing up the return RIP and to log the event. It would be mostly equivalent to
> trying to write to ROM (which get ignored as well), so it's a recoverable (and
> debuggable) event.
>
> We have all the necessary code in place in the kprobes code, see
> arch/x86/lib/insn.c, it's a simplified x86 decoder that knows about instruction
> length (but not about semantics).
>
> Simple skipping plus setting arithmetic flags to init value should be enough I
> think: I don't think we use fancy instructions to write to ro variables, such as
> PUSH/POP with other side effects. If such instructions exist we could minimally
> extend the decoder to do those fixups as well - in addition to double checking
> that we skip simple instructions only with no side effects.
>
> Can you see any fragility in such a technique?
>

After Linus shot down my rdmsr/rwmsr decoding patch, good luck...

More seriously, though, I think this is mostly just like any other
in-kernel fault. We failed, me might be under attack, let's oops. In
the particular case of suspend/resume, we could consider a debug flag
to allow writes to these variables during suspend/resume. In fact,
that might even be a reasonable default. We might want to allow
writes during module unload as well.

For everything else, we should probably focus more on getting OOPSes
to display reliably, which is supposed to work but, on my shiny new
i915-based laptop, is clearly not ready yet (I oopsed it yesterday due
to my own bug and all I had to show for it was a blinking capslock
key, and yes, modesetting works).

--Andy

2015-11-27 08:00:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory


* Andy Lutomirski <[email protected]> wrote:

> > Can you see any fragility in such a technique?
>
> After Linus shot down my rdmsr/rwmsr decoding patch, good luck...

I think that case was entirely different, but I've Cc:-ed Linus to shoot my idea
down if it's crap.

> More seriously, though, I think this is mostly just like any other in-kernel
> fault. We failed, me might be under attack, let's oops. In the particular case
> of suspend/resume, we could consider a debug flag to allow writes to these
> variables during suspend/resume. In fact, that might even be a reasonable
> default. We might want to allow writes during module unload as well.

We are getting the _same_ information: we generate a reliable stack trace right
there. We don't ignore anything.

What my suggestion would do is to turn a 'sure system crasher' into a
'informational debug message'.

On today's typical desktop systems I can tell you with 110% confidence that the
vast majority of 'system crasher' oopses never reaches a kernel developer's
attention, because the oops message is not propagated to the user, while the 'dump
stack trace and try to continue' approach will result in proper bugzillas.

And that's really an important distinction IMHO. Getting debug info out of the
system is very important - and those who are paranoid can set a Kconfig value to
crash their systems on any hint of a problem.

> For everything else, we should probably focus more on getting OOPSes to display
> reliably, which is supposed to work but, on my shiny new i915-based laptop, is
> clearly not ready yet (I oopsed it yesterday due to my own bug and all I had to
> show for it was a blinking capslock key, and yes, modesetting works).

That's absolutely true as well but an independent issue: it does not invalidate my
argument that is based on the status quo, which is that the vast majority
panics/oopses, _especially_ during suspend/resume that was mentioned in this case,
does not reach any kernel developer.

Thanks,

Ingo

2015-11-27 08:06:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory


* PaX Team <[email protected]> wrote:

> On 26 Nov 2015 at 11:42, Ingo Molnar wrote:
>
> > * PaX Team <[email protected]> wrote:
> >
> > > On 26 Nov 2015 at 9:54, Ingo Molnar wrote:
> >
> > > e.g., imagine that the write was to a function pointer (even an entire ops
> > > structure) or a boolean that controls some important feature for after-init
> > > code. ignoring/dropping such writes could cause all kinds of logic bugs (if not
> > > worse).
> >
> > Well, the typical case is that it's a logic bug to _do_ the write: the structure
> > was marked readonly for a reason but some init code re-runs during suspend or so.
>
> that's actually not the typical case in my experience, but rather these two:
>
> 1. initial mistake: someone didn't actually check whether the given object can
> be __read_only
>
> 2. code evolution: an object that was once written by __init code only (and
> thus proactively subjected to __read_only) gets modified by non-init code
> due to later changes
>
> what you described above is a third case where there's a latent bug to begin
> (unintended write) with that __read_only merely exposes but doesn't create
> itself, unlike the two cases above (intended writes getting caught by wrong use
> of __read_only).

You are right, I concede this part of the argument - what you describe is probably
the most typical way to get ro-faults.

I do maintain the (sub-)argument that oopsing or relying on tooling help years
down the line is vastly inferior to fixing up the problem and generating a
one-time stack dump so that kernel developers have a chance to fix the bug. The
sooner we detect and dump such information the more likely it is that such bugs
don't get into end user kernel versions.

> my proposal would produce the exact same reports, the difference is in letting
> the write attempt succeed vs. skipping it. this latter step is what is wrong
> since it introduces at least a logic bug the same way the constprop optimization
> created a logic bug.

Yes, you are right and I agree.

Does anyone want to submit such a patch for upstream? Looks like a good change.

Thanks,

Ingo

2015-11-27 15:30:50

by PaX Team

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 27 Nov 2015 at 9:05, Ingo Molnar wrote:

> * PaX Team <[email protected]> wrote:
>
> > On 26 Nov 2015 at 11:42, Ingo Molnar wrote:
> >
> > > * PaX Team <[email protected]> wrote:
> > that's actually not the typical case in my experience, but rather these two:
> >
> > 1. initial mistake: someone didn't actually check whether the given object can
> > be __read_only
> >
> > 2. code evolution: an object that was once written by __init code only (and
> > thus proactively subjected to __read_only) gets modified by non-init code
> > due to later changes
> >
> > what you described above is a third case where there's a latent bug to begin
> > (unintended write) with that __read_only merely exposes but doesn't create
> > itself, unlike the two cases above (intended writes getting caught by wrong use
> > of __read_only).
>
> You are right, I concede this part of the argument - what you describe is probably
> the most typical way to get ro-faults.
>
> I do maintain the (sub-)argument that oopsing or relying on tooling help years
> down the line is vastly inferior to fixing up the problem and generating a
> one-time stack dump so that kernel developers have a chance to fix the bug. The
> sooner we detect and dump such information the more likely it is that such bugs
> don't get into end user kernel versions.

i don't see the compile time vs. runtime detection as 'competing' approaches,
both have their own role. in general, i think it's safe to say that compile
time problem detection is preferred to the runtime one since it subjects less
users to the side effects of the bug. runtime detection is needed to augment
(even complete) the coverage that compile time detection may not be able to
provide.

that said, for __read_only related problems the compiler can actually do a
pretty good job, basically it could detect most of them except special cases
where the 'bad' write is somehow hidden from it. the only examples i recall
are like the one that Mathias already mentioned where the 'bad' write was
done from asm code or out-of-kernel code (think UEFI runtime services) that
is obviously not visible to the compiler (the resume/mmu_cr4_features problem
also happens to be an example where runtime detection did not help due to the
circumstances).

so let me summarize how i expect the runtime detection part to work:

1. in normal use any write attempt to read-only kernel data should only
be reported as usual (the oops info already has rip/cr2/backtrace),
but no smart recovery attempts should be made since they may end up
actually helping a real exploit attempt.

2. if necessary for debugging purposes (i.e., when the above reporting
mechanism didn't produce the necessary logs and the problem is
reproducible and wasn't an attack), a kernel command line option can
be used to make an attempt at smart recovery instead of oopsing (but
the same information would still be reported of course).

for this smart recovery we differ(ed?) in opinion, i say that allowing
the write in this case (vs. ignoring it) is the least likely to introduce
a logic bug (and its cascading effects) since the expected problem is
to be case #1 or #2 above (i.e., the write is intended but prevented
by __read_only).

2015-11-27 16:31:10

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Fri, Nov 27, 2015 at 7:29 AM, PaX Team <[email protected]> wrote:
> On 27 Nov 2015 at 9:05, Ingo Molnar wrote:
>
>> * PaX Team <[email protected]> wrote:
>>
>> > On 26 Nov 2015 at 11:42, Ingo Molnar wrote:
>> >
>> > > * PaX Team <[email protected]> wrote:
>> > that's actually not the typical case in my experience, but rather these two:
>> >
>> > 1. initial mistake: someone didn't actually check whether the given object can
>> > be __read_only
>> >
>> > 2. code evolution: an object that was once written by __init code only (and
>> > thus proactively subjected to __read_only) gets modified by non-init code
>> > due to later changes
>> >
>> > what you described above is a third case where there's a latent bug to begin
>> > (unintended write) with that __read_only merely exposes but doesn't create
>> > itself, unlike the two cases above (intended writes getting caught by wrong use
>> > of __read_only).
>>
>> You are right, I concede this part of the argument - what you describe is probably
>> the most typical way to get ro-faults.
>>
>> I do maintain the (sub-)argument that oopsing or relying on tooling help years
>> down the line is vastly inferior to fixing up the problem and generating a
>> one-time stack dump so that kernel developers have a chance to fix the bug. The
>> sooner we detect and dump such information the more likely it is that such bugs
>> don't get into end user kernel versions.
>
> i don't see the compile time vs. runtime detection as 'competing' approaches,
> both have their own role. in general, i think it's safe to say that compile
> time problem detection is preferred to the runtime one since it subjects less
> users to the side effects of the bug. runtime detection is needed to augment
> (even complete) the coverage that compile time detection may not be able to
> provide.
>
> that said, for __read_only related problems the compiler can actually do a
> pretty good job, basically it could detect most of them except special cases
> where the 'bad' write is somehow hidden from it. the only examples i recall
> are like the one that Mathias already mentioned where the 'bad' write was
> done from asm code or out-of-kernel code (think UEFI runtime services) that
> is obviously not visible to the compiler (the resume/mmu_cr4_features problem
> also happens to be an example where runtime detection did not help due to the
> circumstances).
>
> so let me summarize how i expect the runtime detection part to work:
>
> 1. in normal use any write attempt to read-only kernel data should only
> be reported as usual (the oops info already has rip/cr2/backtrace),
> but no smart recovery attempts should be made since they may end up
> actually helping a real exploit attempt.
>
> 2. if necessary for debugging purposes (i.e., when the above reporting
> mechanism didn't produce the necessary logs and the problem is
> reproducible and wasn't an attack), a kernel command line option can
> be used to make an attempt at smart recovery instead of oopsing (but
> the same information would still be reported of course).
>
> for this smart recovery we differ(ed?) in opinion, i say that allowing
> the write in this case (vs. ignoring it) is the least likely to introduce
> a logic bug (and its cascading effects) since the expected problem is
> to be case #1 or #2 above (i.e., the write is intended but prevented
> by __read_only).
>

So maybe we should think about doing this recovery as part of oops
processing. That is, we oops as usual, but rather than killing the
task or spinning, we allow the post-oops code to try to recover (if
enabled). That recovery step decodes the instruction and takes some
action. In this example, if it's a write to ro-after-init memory,
then maybe we un-write-protect it and resume.

I agree with Linus' old sentiment as least insofar as trying to decode
things pre-OOPS is a bad idea: we don't want that decoding to
interfere with our primary goal, which is printing the OOPS.

I would argue that, if this is okay, we do exactly the same thing for
failed msr access. We still oops, but we try to march on.

We could consider a default setting in which we "recover" from oops
until init starts and then we revert to old behavior (oopses kill the
task).

--Andy

2015-11-27 18:00:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Thu, Nov 26, 2015 at 11:59 PM, Ingo Molnar <[email protected]> wrote:
>
> * Andy Lutomirski <[email protected]> wrote:
>
>> > Can you see any fragility in such a technique?
>>
>> After Linus shot down my rdmsr/rwmsr decoding patch, good luck...
>
> I think that case was entirely different, but I've Cc:-ed Linus to shoot my idea
> down if it's crap.

Yeah, no, I hate it. I'm with the PaX team on this one - I think there
are three valid responses, and I think we might want to have a dynamic
config option (kernel command line or proc or whatever) to pick
between the two:

- just oops and kill the machine, like for any other unhandled kernel
page fault. This is probably what you should have on a server

- print a warning and a backtrace, and just mark the page read-write
so that the machine survives, but we get notified and can fix whatever
broken code

- have an option to disable the RO data logic.

I think that second option is good for debugging. In some places,
oopses that kill things are just too hard to debug (ie it might be the
modesetting or early boot or whatever).

In fact, I think we should _start_ with the second option - perhaps
just during the rc's - and then when we're pretty sure all the silly
bugs it finds (maybe none, who knows) are handled, we should go to the
first one.

The third option would be purely for "user that cannot fix things
directly and has reported the problem can now turn off the distracting
warning". We should never default to it.

Trying to actually *recover* any other way thanm by turning the area
read-write is just too damn fragile. You can't just skip over the
instruction that does the write - there are flags values etc that get
updated by read-modify-write instructions, but as PaX says, there nmay
also be subsequent logic that gets confused and actually introduces
even *more* problems downstream if the write is just discarded.

So maybe we could have some kind of "mark it read-only again later"
thing that tries to make sure it doesn't stay writable for a long
time, but quite frankly, I don't think it's worth it. Once the write
has been done, and the warning has been emitted, there's likely very
little upside to then trying to close the barn doors after that horse
has bolted.

Linus

2015-11-27 18:03:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Fri, Nov 27, 2015 at 10:00 AM, Linus Torvalds
<[email protected]> wrote:
>
> - just oops and kill the machine, like for any other unhandled kernel
> page fault. This is probably what you should have on a server

Just to clarify: the "just oops" obviously doesn't have to kill the
machine, it depends on what your oops policy is, with the default
obviously being the normal "kill that particular thread" if at all
possible.

Machine-killing is appropriate in some secure situations, but most of
the time it just makes it too damn hard to debug since the error often
doesn't get logged. In some situations we obviously can't avoid it,
but..

Linus

2015-11-27 20:03:44

by Kees Cook

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Fri, Nov 27, 2015 at 10:00 AM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Nov 26, 2015 at 11:59 PM, Ingo Molnar <[email protected]> wrote:
>>
>> * Andy Lutomirski <[email protected]> wrote:
>>
>>> > Can you see any fragility in such a technique?
>>>
>>> After Linus shot down my rdmsr/rwmsr decoding patch, good luck...
>>
>> I think that case was entirely different, but I've Cc:-ed Linus to shoot my idea
>> down if it's crap.
>
> Yeah, no, I hate it. I'm with the PaX team on this one - I think there
> are three valid responses, and I think we might want to have a dynamic
> config option (kernel command line or proc or whatever) to pick
> between the two:
>
> - just oops and kill the machine, like for any other unhandled kernel
> page fault. This is probably what you should have on a server

This is how the v2 series works now.

> - print a warning and a backtrace, and just mark the page read-write
> so that the machine survives, but we get notified and can fix whatever
> broken code

This seems very easy to add. Should I basically reverse the effects of
mark_rodata_ro(), or should I only make the new ro-after-init section
as RW? (I think the former would be easier.)

> - have an option to disable the RO data logic.

I added this as "rodata=off" in the v2 series.

> I think that second option is good for debugging. In some places,
> oopses that kill things are just too hard to debug (ie it might be the
> modesetting or early boot or whatever).
>
> In fact, I think we should _start_ with the second option - perhaps
> just during the rc's - and then when we're pretty sure all the silly
> bugs it finds (maybe none, who knows) are handled, we should go to the
> first one.
>
> The third option would be purely for "user that cannot fix things
> directly and has reported the problem can now turn off the distracting
> warning". We should never default to it.
>
> Trying to actually *recover* any other way thanm by turning the area
> read-write is just too damn fragile. You can't just skip over the
> instruction that does the write - there are flags values etc that get
> updated by read-modify-write instructions, but as PaX says, there nmay
> also be subsequent logic that gets confused and actually introduces
> even *more* problems downstream if the write is just discarded.
>
> So maybe we could have some kind of "mark it read-only again later"
> thing that tries to make sure it doesn't stay writable for a long
> time, but quite frankly, I don't think it's worth it. Once the write
> has been done, and the warning has been emitted, there's likely very
> little upside to then trying to close the barn doors after that horse
> has bolted.
>
> Linus

-Kees

--
Kees Cook
Chrome OS & Brillo Security

2015-11-27 20:09:46

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Fri, Nov 27, 2015 at 12:03 PM, Kees Cook <[email protected]> wrote:
> On Fri, Nov 27, 2015 at 10:00 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Thu, Nov 26, 2015 at 11:59 PM, Ingo Molnar <[email protected]> wrote:
>>>
>>> * Andy Lutomirski <[email protected]> wrote:
>>>
>>>> > Can you see any fragility in such a technique?
>>>>
>>>> After Linus shot down my rdmsr/rwmsr decoding patch, good luck...
>>>
>>> I think that case was entirely different, but I've Cc:-ed Linus to shoot my idea
>>> down if it's crap.
>>
>> Yeah, no, I hate it. I'm with the PaX team on this one - I think there
>> are three valid responses, and I think we might want to have a dynamic
>> config option (kernel command line or proc or whatever) to pick
>> between the two:
>>
>> - just oops and kill the machine, like for any other unhandled kernel
>> page fault. This is probably what you should have on a server
>
> This is how the v2 series works now.
>
>> - print a warning and a backtrace, and just mark the page read-write
>> so that the machine survives, but we get notified and can fix whatever
>> broken code
>
> This seems very easy to add. Should I basically reverse the effects of
> mark_rodata_ro(), or should I only make the new ro-after-init section
> as RW? (I think the former would be easier.)

I'd suggest verifying that the page in question is
.data..ro_after_init and, if so, marking that one page RW.

--Andy

2015-11-29 08:05:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory


* Andy Lutomirski <[email protected]> wrote:

> >> - print a warning and a backtrace, and just mark the page read-write
> >> so that the machine survives, but we get notified and can fix whatever
> >> broken code
> >
> > This seems very easy to add. Should I basically reverse the effects of
> > mark_rodata_ro(), or should I only make the new ro-after-init section as RW?
> > (I think the former would be easier.)
>
> I'd suggest verifying that the page in question is .data..ro_after_init and, if
> so, marking that one page RW.

Yes, this was PaX's suggestion as well, and I agree: doing that turns a quite
possibly unrecoverable boot/shutdown time or suspend/resume time (suspend is
really a special category of 'bootup') crasher oops into a more informative stack
dump.

These ro related faults tend to trigger when init/deinit is running, and oopsing
in those sequences is typically a lot less survivable than say oopsing in a high
level system call while not holding locks.

Thanks,

Ingo

2015-11-29 08:09:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory


* PaX Team <[email protected]> wrote:

> i don't see the compile time vs. runtime detection as 'competing' approaches,
> both have their own role. [...]

That's true - but only as long as 'this can be solved in tooling!' is not used as
an excuse to oppose the runtime solution and we end up doing neither.

Thanks,

Ingo

2015-11-29 11:15:51

by PaX Team

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 29 Nov 2015 at 9:08, Ingo Molnar wrote:

>
> * PaX Team <[email protected]> wrote:
>
> > i don't see the compile time vs. runtime detection as 'competing' approaches,
> > both have their own role. [...]
>
> That's true - but only as long as 'this can be solved in tooling!' is not used as
> an excuse to oppose the runtime solution and we end up doing neither.

actually, i already voiced my opinion elsewhere in the constify thread on
the kernel hardening list that adding/using __read_only is somewhat premature
without also adding the compile time verification part (as part of the
constify plugin for example). right now its use on the embedded vdso image
is simple and easy to verify but once people begin to add it to variables
that the compiler knows and cares about (say, the ops structures) then
things can become fragile without compile checking. so yes, i'd also advise
to get such tooling in *before* more __read_only usage is added.

2015-11-29 15:39:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory


* PaX Team <[email protected]> wrote:

> On 29 Nov 2015 at 9:08, Ingo Molnar wrote:
>
> >
> > * PaX Team <[email protected]> wrote:
> >
> > > i don't see the compile time vs. runtime detection as 'competing' approaches,
> > > both have their own role. [...]
> >
> > That's true - but only as long as 'this can be solved in tooling!' is not used as
> > an excuse to oppose the runtime solution and we end up doing neither.
>
> actually, i already voiced my opinion elsewhere in the constify thread on the
> kernel hardening list that adding/using __read_only is somewhat premature
> without also adding the compile time verification part (as part of the constify
> plugin for example). right now its use on the embedded vdso image is simple and
> easy to verify but once people begin to add it to variables that the compiler
> knows and cares about (say, the ops structures) then things can become fragile
> without compile checking. so yes, i'd also advise to get such tooling in
> *before* more __read_only usage is added.

I think you are mistaken there: if we add the page fault fixup to make sure we
don't crash if a read-only variable is accessed, then we'll have most of the
benefits of read-only mappings and no fragility - without having to wait for
tooling.

Thanks,

Ingo

2015-11-29 18:05:57

by Mathias Krause

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 29 November 2015 at 16:39, Ingo Molnar <[email protected]> wrote:
>
> * PaX Team <[email protected]> wrote:
>
>> On 29 Nov 2015 at 9:08, Ingo Molnar wrote:
>>
>> >
>> > * PaX Team <[email protected]> wrote:
>> >
>> > > i don't see the compile time vs. runtime detection as 'competing' approaches,
>> > > both have their own role. [...]
>> >
>> > That's true - but only as long as 'this can be solved in tooling!' is not used as
>> > an excuse to oppose the runtime solution and we end up doing neither.
>>
>> actually, i already voiced my opinion elsewhere in the constify thread on the
>> kernel hardening list that adding/using __read_only is somewhat premature
>> without also adding the compile time verification part (as part of the constify
>> plugin for example). right now its use on the embedded vdso image is simple and
>> easy to verify but once people begin to add it to variables that the compiler
>> knows and cares about (say, the ops structures) then things can become fragile
>> without compile checking. so yes, i'd also advise to get such tooling in
>> *before* more __read_only usage is added.
>
> I think you are mistaken there: if we add the page fault fixup to make sure we
> don't crash if a read-only variable is accessed, then we'll have most of the
> benefits of read-only mappings and no fragility - without having to wait for
> tooling.

I guess the point PaX Team (and me earlier in this thread) wanted to
make is that having misuse detection *only* at run-time will make
those kind of bugs visible too late -- as late as on the wrong write
attempt actually happening. It would be much better to have the
compiler complain about invalid write statements already during
compilation, much like it does when one wants to assign some value to
a const object.

Having the page fault handler being able to recover from
__ro_after_init faults is a nice feature to support users, actually
being able to report bugs. But it shouldn't be the only way to detect
those kinds of bugs. In fact, we've tools in our toolchain that try to
detect and flag wrong usage of __init / __exit, so why not cover
__ro_after_init as well? Admitted, that won't be possible with modpost
in its current state, but would require a compiler extension instead.
Its current non-existence shouldn't be a show-stopper for
__ro_after_init but the very next step to take care of before
extending the usage of that annotation.

Just my 2ct,
Mathias

2015-11-30 08:01:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory


* Mathias Krause <[email protected]> wrote:

> On 29 November 2015 at 16:39, Ingo Molnar <[email protected]> wrote:
> >
> > * PaX Team <[email protected]> wrote:
> >
> >> On 29 Nov 2015 at 9:08, Ingo Molnar wrote:
> >>
> >> >
> >> > * PaX Team <[email protected]> wrote:
> >> >
> >> > > i don't see the compile time vs. runtime detection as 'competing' approaches,
> >> > > both have their own role. [...]
> >> >
> >> > That's true - but only as long as 'this can be solved in tooling!' is not used as
> >> > an excuse to oppose the runtime solution and we end up doing neither.
> >>
> >> actually, i already voiced my opinion elsewhere in the constify thread on the
> >> kernel hardening list that adding/using __read_only is somewhat premature
> >> without also adding the compile time verification part (as part of the constify
> >> plugin for example). right now its use on the embedded vdso image is simple and
> >> easy to verify but once people begin to add it to variables that the compiler
> >> knows and cares about (say, the ops structures) then things can become fragile
> >> without compile checking. so yes, i'd also advise to get such tooling in
> >> *before* more __read_only usage is added.
> >
> > I think you are mistaken there: if we add the page fault fixup to make sure we
> > don't crash if a read-only variable is accessed, then we'll have most of the
> > benefits of read-only mappings and no fragility - without having to wait for
> > tooling.
>
> I guess the point PaX Team (and me earlier in this thread) wanted to
> make is that having misuse detection *only* at run-time will make
> those kind of bugs visible too late -- as late as on the wrong write
> attempt actually happening. It would be much better to have the
> compiler complain about invalid write statements already during
> compilation, much like it does when one wants to assign some value to
> a const object.

Well, the runtime warning comes "later", that does not automatically transform
into "too late".

These things are relatively rare. Whether the warning is in tooling or runtime (or
both) is relatively immaterial in that regard: having it in tooling would be nice,
but it's not fool-proof either: say if tooling leaves an easy backdoor like
allowing a type cast to supress warnings then it turns into a hard to debug
problem again.

So I think we need both, starting with the runtime warning which is easy to
implement and can be merged right away.

But reality is that upstream tooling changes, especially ones involving compiler
changes move at a glacier's pace in comparison. Based on past experience I am also
pretty pessimistic: I'm 90% certain that this 'tooling feature' won't be
implemented in the next 10 years, so I'd like to concentrate on what we can do
here and now: the runtime warning and recovery without crashing.

If anyone sends patches for that I'll apply them.

Thanks,

Ingo

2015-11-30 21:15:21

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 11/29/15 00:05, Ingo Molnar wrote:
>
> * Andy Lutomirski <[email protected]> wrote:
>
>>>> - print a warning and a backtrace, and just mark the page read-write
>>>> so that the machine survives, but we get notified and can fix whatever
>>>> broken code
>>>
>>> This seems very easy to add. Should I basically reverse the effects of
>>> mark_rodata_ro(), or should I only make the new ro-after-init section as RW?
>>> (I think the former would be easier.)
>>
>> I'd suggest verifying that the page in question is .data..ro_after_init and, if
>> so, marking that one page RW.
>
> Yes, this was PaX's suggestion as well, and I agree: doing that turns a quite
> possibly unrecoverable boot/shutdown time or suspend/resume time (suspend is
> really a special category of 'bootup') crasher oops into a more informative stack
> dump.
>
> These ro related faults tend to trigger when init/deinit is running, and oopsing
> in those sequences is typically a lot less survivable than say oopsing in a high
> level system call while not holding locks.
>

I think what should do is have a debug option which can be set to "rw",
"log" or "oops"; the latter should probably be the default.

-hpa

2015-11-30 21:33:07

by Kees Cook

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Mon, Nov 30, 2015 at 1:14 PM, H. Peter Anvin <[email protected]> wrote:
> On 11/29/15 00:05, Ingo Molnar wrote:
>>
>> * Andy Lutomirski <[email protected]> wrote:
>>
>>>>> - print a warning and a backtrace, and just mark the page read-write
>>>>> so that the machine survives, but we get notified and can fix whatever
>>>>> broken code
>>>>
>>>> This seems very easy to add. Should I basically reverse the effects of
>>>> mark_rodata_ro(), or should I only make the new ro-after-init section as RW?
>>>> (I think the former would be easier.)
>>>
>>> I'd suggest verifying that the page in question is .data..ro_after_init and, if
>>> so, marking that one page RW.
>>
>> Yes, this was PaX's suggestion as well, and I agree: doing that turns a quite
>> possibly unrecoverable boot/shutdown time or suspend/resume time (suspend is
>> really a special category of 'bootup') crasher oops into a more informative stack
>> dump.
>>
>> These ro related faults tend to trigger when init/deinit is running, and oopsing
>> in those sequences is typically a lot less survivable than say oopsing in a high
>> level system call while not holding locks.
>>
>
> I think what should do is have a debug option which can be set to "rw",
> "log" or "oops"; the latter should probably be the default.

Can someone write that patch, and then I will include it in the
series? I haven't touched fault handler code, and it would be faster
if someone more familiar with that area did it. :)

-Kees

--
Kees Cook
Chrome OS & Brillo Security

2015-11-30 21:38:28

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On Mon, Nov 30, 2015 at 1:33 PM, Kees Cook <[email protected]> wrote:
> On Mon, Nov 30, 2015 at 1:14 PM, H. Peter Anvin <[email protected]> wrote:
>> On 11/29/15 00:05, Ingo Molnar wrote:
>>>
>>> * Andy Lutomirski <[email protected]> wrote:
>>>
>>>>>> - print a warning and a backtrace, and just mark the page read-write
>>>>>> so that the machine survives, but we get notified and can fix whatever
>>>>>> broken code
>>>>>
>>>>> This seems very easy to add. Should I basically reverse the effects of
>>>>> mark_rodata_ro(), or should I only make the new ro-after-init section as RW?
>>>>> (I think the former would be easier.)
>>>>
>>>> I'd suggest verifying that the page in question is .data..ro_after_init and, if
>>>> so, marking that one page RW.
>>>
>>> Yes, this was PaX's suggestion as well, and I agree: doing that turns a quite
>>> possibly unrecoverable boot/shutdown time or suspend/resume time (suspend is
>>> really a special category of 'bootup') crasher oops into a more informative stack
>>> dump.
>>>
>>> These ro related faults tend to trigger when init/deinit is running, and oopsing
>>> in those sequences is typically a lot less survivable than say oopsing in a high
>>> level system call while not holding locks.
>>>
>>
>> I think what should do is have a debug option which can be set to "rw",
>> "log" or "oops"; the latter should probably be the default.
>
> Can someone write that patch, and then I will include it in the
> series? I haven't touched fault handler code, and it would be faster
> if someone more familiar with that area did it. :)

I think I can do it in a week or two if no one beats me to it.

--Andy

2015-11-30 21:43:53

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory

On 11/30/15 13:33, Kees Cook wrote:
>>
>> I think what should do is have a debug option which can be set to "rw",
>> "log" or "oops"; the latter should probably be the default.
>
> Can someone write that patch, and then I will include it in the
> series? I haven't touched fault handler code, and it would be faster
> if someone more familiar with that area did it. :)
>

The "log" option (the only hard one) can be added as a later
enhancement, and probably should be. It shouldn't block the series.

-hpa