2014-02-14 01:07:05

by Kees Cook

[permalink] [raw]
Subject: [PATCH 0/2] ARM: mm: allow for stricter kernel memory perms

This series of patches allows the ARM kernel page tables to gain better
permission separation. With a fixed[1] CONFIG_ARM_PTDUMP enabled, you
can see the before and after in /sys/kernel/debug/kernel_page_tables.

Before:
---[ Kernel Mapping ]---
0xc0000000-0xc0800000 8M RW x SHD
0xc0800000-0xc1e00000 22M RW NX SHD
0xc2000000-0xc3000000 16M RW x SHD
0xc3800000-0xd1000000 216M RW x SHD
0xd1800000-0xef800000 480M RW x SHD

After:
---[ Kernel Mapping ]---
0xc0000000-0xc0100000 1M RW NX SHD
0xc0100000-0xc0700000 6M ro x SHD
0xc0700000-0xc0a00000 3M ro NX SHD
0xc0a00000-0xc1e00000 20M RW NX SHD
0xc2000000-0xc3000000 16M RW NX SHD
0xc3800000-0xd1000000 216M RW NX SHD
0xd1800000-0xef800000 480M RW NX SHD

This is available via CONFIG_ARM_KERNMEM_PERMS and CONFIG_DEBUG_RODATA.
The latter exists to match the x86 option of the same name, and is
left as a configurable since each additional region adds more potential
memory padding.

The series is based on earlier work from Brad Spengler, Larry Bassel,
and Laura Abbott.

Thanks,

-Kees

[1] these patches are needed to get the correct output:
https://lkml.org/lkml/2014/2/12/662
https://lkml.org/lkml/2014/2/12/663


2014-02-14 01:07:34

by Kees Cook

[permalink] [raw]
Subject: [PATCH 1/2] ARM: mm: allow for stricter kernel memory perms

Adds CONFIG_ARM_KERNMEM_PERMS to separate the kernel memory regions
into section-sized areas that can have different permisions. Performs
the permission changes during free_initmem.

This uses section size instead of PMD size to reduce memory caps on
non-LPAE systems.

Based on work by Brad Spengler, Larry Bassel, and Laura Abbott.

Signed-off-by: Kees Cook <[email protected]>
---
arch/arm/kernel/vmlinux.lds.S | 17 +++++++++
arch/arm/mm/Kconfig | 10 +++++
arch/arm/mm/init.c | 84 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 111 insertions(+)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 7bcee5c9b604..08fa667ef2f1 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -8,6 +8,9 @@
#include <asm/thread_info.h>
#include <asm/memory.h>
#include <asm/page.h>
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+#include <asm/pgtable.h>
+#endif

#define PROC_INFO \
. = ALIGN(4); \
@@ -90,6 +93,11 @@ SECTIONS
_text = .;
HEAD_TEXT
}
+
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+ . = ALIGN(1<<SECTION_SHIFT);
+#endif
+
.text : { /* Real text segment */
_stext = .; /* Text and read-only data */
__exception_text_start = .;
@@ -145,7 +153,11 @@ SECTIONS
_etext = .; /* End of text and rodata section */

#ifndef CONFIG_XIP_KERNEL
+# ifdef CONFIG_ARM_KERNMEM_PERMS
+ . = ALIGN(1<<SECTION_SHIFT);
+# else
. = ALIGN(PAGE_SIZE);
+# endif
__init_begin = .;
#endif
/*
@@ -220,7 +232,12 @@ SECTIONS
. = PAGE_OFFSET + TEXT_OFFSET;
#else
__init_end = .;
+
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+ . = ALIGN(1<<SECTION_SHIFT);
+#else
. = ALIGN(THREAD_SIZE);
+#endif
__data_loc = .;
#endif

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 1f8fed94c2a4..999eb505faee 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -958,3 +958,13 @@ config ARCH_SUPPORTS_BIG_ENDIAN
help
This option specifies the architecture can support big endian
operation.
+
+config ARM_KERNMEM_PERMS
+ bool "Restrict kernel memory permissions"
+ help
+ If this is set, kernel text will be made RX, kernel data and stack
+ RW (otherwise all of the regions of the kernel 1-to-1 mapping
+ outside section boundaries remains RWX). The tradeoff is that each
+ region is padded to section-size (1MiB) boundaries (because their
+ permissions are different and splitting the 1M pages into 4K ones
+ causes TLB performance problems), wasting memory.
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 804d61566a53..f0b1df53f436 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -31,6 +31,11 @@
#include <asm/tlb.h>
#include <asm/fixmap.h>

+#ifdef CONFIG_ARM_KERNMEM_PERMS
+#include <asm/system_info.h>
+#include <asm/cp15.h>
+#endif
+
#include <asm/mach/arch.h>
#include <asm/mach/map.h>

@@ -621,11 +626,90 @@ void __init mem_init(void)
}
}

+#ifdef CONFIG_ARM_KERNMEM_PERMS
+struct section_perm {
+ unsigned long start;
+ unsigned long end;
+ pmdval_t prot;
+};
+
+struct section_perm __initdata section_perms[] = {
+ /* Make pages tables, etc before _stext RW (set NX). */
+ {
+ .start = PAGE_OFFSET,
+ .end = (unsigned long)_stext,
+ .prot = PMD_SECT_XN,
+ },
+ /* Make init RW (set NX). */
+ {
+ .start = (unsigned long)__init_begin,
+ .end = (unsigned long)_sdata,
+ .prot = PMD_SECT_XN,
+ },
+ /* Make kernel code and rodata RX (set RO). */
+ {
+ .start = (unsigned long)_stext,
+ .end = (unsigned long)__init_begin,
+#ifdef CONFIG_ARM_LPAE
+ .prot = PMD_SECT_RDONLY,
+#else
+ .prot = PMD_SECT_APX | PMD_SECT_AP_WRITE,
+#endif
+ },
+};
+
+static inline void section_update(unsigned long addr, pmdval_t prot)
+{
+ pmd_t *pmd = pmd_off_k(addr);
+
+#ifdef CONFIG_ARM_LPAE
+ pmd[0] = __pmd(pmd_val(pmd[0]) | prot);
+#else
+ if (addr & SECTION_SIZE)
+ pmd[1] = __pmd(pmd_val(pmd[1]) | prot);
+ else
+ pmd[0] = __pmd(pmd_val(pmd[0]) | prot);
+#endif
+ flush_pmd_entry(pmd);
+}
+
+static inline void fix_kernmem_perms(void)
+{
+ unsigned long addr;
+ int cpu_arch = cpu_architecture();
+ unsigned int i, cr = get_cr();
+
+ if (cpu_arch < CPU_ARCH_ARMv6 || !(cr & CR_XP))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(section_perms); i++) {
+ if (!IS_ALIGNED(section_perms[i].start, SECTION_SIZE) ||
+ !IS_ALIGNED(section_perms[i].end, SECTION_SIZE)) {
+ pr_err("BUG: section %lx-%lx not aligned to %lx\n",
+ section_perms[i].start, section_perms[i].end,
+ SECTION_SIZE);
+ continue;
+ }
+
+ for (addr = section_perms[i].start;
+ addr < section_perms[i].end;
+ addr += SECTION_SIZE)
+ section_update(addr, section_perms[i].prot);
+ }
+}
+#else
+static inline void fix_kernmem_perms(void) { }
+#endif /* CONFIG_ARM_KERNMEM_PERMS */
+
void free_initmem(void)
{
#ifdef CONFIG_HAVE_TCM
extern char __tcm_start, __tcm_end;
+#endif
+
+ fix_kernmem_perms();

+#ifdef CONFIG_HAVE_TCM
poison_init_mem(&__tcm_start, &__tcm_end - &__tcm_start);
free_reserved_area(&__tcm_start, &__tcm_end, -1, "TCM link");
#endif
--
1.7.9.5

2014-02-14 01:07:32

by Kees Cook

[permalink] [raw]
Subject: [PATCH 2/2] ARM: mm: keep rodata non-executable

Introduce "CONFIG_DEBUG_RODATA" to mostly match the x86 config, though
the behavior is different: it depends on STRICT_KERNMEM_PERMS, which
sets rodata read-only (but executable), where as this option additionally
splits rodata from the kernel text (resulting in potentially more memory
lost to padding) and sets it non-executable as well. The end result is
that on builds with CONFIG_DEBUG_RODATA=y (like x86) the rodata with be
marked purely read-only.

Signed-off-by: Kees Cook <[email protected]>
---
arch/arm/include/asm/cacheflush.h | 5 +++++
arch/arm/kernel/vmlinux.lds.S | 3 +++
arch/arm/mm/Kconfig | 12 ++++++++++++
arch/arm/mm/init.c | 8 ++++++++
4 files changed, 28 insertions(+)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index e9a49fe0284e..2b058fc7a188 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -486,4 +486,9 @@ int set_memory_rw(unsigned long addr, int numpages);
int set_memory_x(unsigned long addr, int numpages);
int set_memory_nx(unsigned long addr, int numpages);

+#ifdef CONFIG_DEBUG_RODATA
+/* This has already happened during free_initmem. */
+static inline void mark_rodata_ro(void) { }
+#endif
+
#endif
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 08fa667ef2f1..ec79e7268e09 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -120,6 +120,9 @@ SECTIONS
ARM_CPU_KEEP(PROC_INFO)
}

+#ifdef CONFIG_DEBUG_RODATA
+ . = ALIGN(1<<SECTION_SHIFT);
+#endif
RO_DATA(PAGE_SIZE)

. = ALIGN(4);
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 999eb505faee..7c8bbe7e2769 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -968,3 +968,15 @@ config ARM_KERNMEM_PERMS
region is padded to section-size (1MiB) boundaries (because their
permissions are different and splitting the 1M pages into 4K ones
causes TLB performance problems), wasting memory.
+
+config DEBUG_RODATA
+ bool "Split rodata from text and set it read-only/non-executable"
+ depends on ARM_KERNMEM_PERMS
+ default y
+ help
+ If this is set, rodata will be split from kernel text and made
+ non-executable. (This option already depends on the option
+ CONFIG_STRICT_KERNMEM_PERMS which makes rodata read-only, though
+ still executable.) This creates another section-size padded
+ region, so it can waste more memory space while gaining a pure
+ read-only rodata region.
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index f0b1df53f436..5b1b049501b9 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -656,6 +656,14 @@ struct section_perm __initdata section_perms[] = {
.prot = PMD_SECT_APX | PMD_SECT_AP_WRITE,
#endif
},
+#ifdef CONFIG_DEBUG_RODATA
+ /* Make rodata RO (set NX). */
+ {
+ .start = (unsigned long)__start_rodata,
+ .end = (unsigned long)__init_begin,
+ .prot = PMD_SECT_XN,
+ }
+#endif
};

static inline void section_update(unsigned long addr, pmdval_t prot)
--
1.7.9.5

2014-02-14 16:24:33

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On Thu, Feb 13, 2014 at 05:04:10PM -0800, Kees Cook wrote:
> Introduce "CONFIG_DEBUG_RODATA" to mostly match the x86 config, though
> the behavior is different: it depends on STRICT_KERNMEM_PERMS, which
> sets rodata read-only (but executable), where as this option additionally
> splits rodata from the kernel text (resulting in potentially more memory
> lost to padding) and sets it non-executable as well. The end result is
> that on builds with CONFIG_DEBUG_RODATA=y (like x86) the rodata with be
> marked purely read-only.

This triggers an Oops in kexec, because we have a block of code in .text
which is a template for generating baremetal code to relocate the new
kernel, and some literal words are written into it before copying.

Possibly this should be in .rodata, not .text.

There may be a few other instances of this kind of thing.

Are you aware of similar situations on other arches?

Cheers
---Dave

2014-02-14 19:11:11

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On Fri, Feb 14, 2014 at 8:22 AM, Dave Martin <[email protected]> wrote:
> On Thu, Feb 13, 2014 at 05:04:10PM -0800, Kees Cook wrote:
>> Introduce "CONFIG_DEBUG_RODATA" to mostly match the x86 config, though
>> the behavior is different: it depends on STRICT_KERNMEM_PERMS, which
>> sets rodata read-only (but executable), where as this option additionally
>> splits rodata from the kernel text (resulting in potentially more memory
>> lost to padding) and sets it non-executable as well. The end result is
>> that on builds with CONFIG_DEBUG_RODATA=y (like x86) the rodata with be
>> marked purely read-only.
>
> This triggers an Oops in kexec, because we have a block of code in .text
> which is a template for generating baremetal code to relocate the new
> kernel, and some literal words are written into it before copying.

You're writing into the text area? I would imagine that
CONFIG_ARM_KERNMEM_PERMS would break that. However, that's not the
right place to be building code -- shouldn't the module area be used
for that?

> Possibly this should be in .rodata, not .text.

Well, rodata should be neither writable nor executable.

> There may be a few other instances of this kind of thing.

This config will certainly find them! :) But, that's why it's behind a config.

> Are you aware of similar situations on other arches?

I think there were some problems a long time ago on x86 for rodata too.

-Kees

--
Kees Cook
Chrome OS Security

2014-02-17 12:35:33

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On Fri, Feb 14, 2014 at 11:11:07AM -0800, Kees Cook wrote:
> On Fri, Feb 14, 2014 at 8:22 AM, Dave Martin <[email protected]> wrote:
> > On Thu, Feb 13, 2014 at 05:04:10PM -0800, Kees Cook wrote:
> >> Introduce "CONFIG_DEBUG_RODATA" to mostly match the x86 config, though
> >> the behavior is different: it depends on STRICT_KERNMEM_PERMS, which
> >> sets rodata read-only (but executable), where as this option additionally
> >> splits rodata from the kernel text (resulting in potentially more memory
> >> lost to padding) and sets it non-executable as well. The end result is
> >> that on builds with CONFIG_DEBUG_RODATA=y (like x86) the rodata with be
> >> marked purely read-only.
> >
> > This triggers an Oops in kexec, because we have a block of code in .text
> > which is a template for generating baremetal code to relocate the new
> > kernel, and some literal words are written into it before copying.
>
> You're writing into the text area? I would imagine that
> CONFIG_ARM_KERNMEM_PERMS would break that. However, that's not the
> right place to be building code -- shouldn't the module area be used
> for that?
>
> > Possibly this should be in .rodata, not .text.
>
> Well, rodata should be neither writable nor executable.

We're not writing into code exactly.

This code is never executed in-place in vmlinux. It gets copied, and
only copies are ever executed.

Some pointers and offsets get poked into the code to configure it.

I think it would be better simply to put the code in .rodata, and
poke paramaters into the copy, not the original -- but that's a bit
more awkward to code up, since the values can't be poked simply by
writing global variables.

>
> > There may be a few other instances of this kind of thing.
>
> This config will certainly find them! :) But, that's why it's behind a config.

I haven't tested exhaustively, but it think this is sufficient for a
Tested-by. The patch does seem to be doing what it is intended to
do, and doesn't seem to be triggering false positives all over the
place.

>
> > Are you aware of similar situations on other arches?
>
> I think there were some problems a long time ago on x86 for rodata too.

It would be good to get this kexec case fixed -- I'll try to hack up
a separate patch.

Cheers
---Dave

2014-02-18 18:10:06

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On Mon, Feb 17, 2014 at 4:34 AM, Dave Martin <[email protected]> wrote:
> On Fri, Feb 14, 2014 at 11:11:07AM -0800, Kees Cook wrote:
>> On Fri, Feb 14, 2014 at 8:22 AM, Dave Martin <[email protected]> wrote:
>> > On Thu, Feb 13, 2014 at 05:04:10PM -0800, Kees Cook wrote:
>> >> Introduce "CONFIG_DEBUG_RODATA" to mostly match the x86 config, though
>> >> the behavior is different: it depends on STRICT_KERNMEM_PERMS, which
>> >> sets rodata read-only (but executable), where as this option additionally
>> >> splits rodata from the kernel text (resulting in potentially more memory
>> >> lost to padding) and sets it non-executable as well. The end result is
>> >> that on builds with CONFIG_DEBUG_RODATA=y (like x86) the rodata with be
>> >> marked purely read-only.
>> >
>> > This triggers an Oops in kexec, because we have a block of code in .text
>> > which is a template for generating baremetal code to relocate the new
>> > kernel, and some literal words are written into it before copying.
>>
>> You're writing into the text area? I would imagine that
>> CONFIG_ARM_KERNMEM_PERMS would break that. However, that's not the
>> right place to be building code -- shouldn't the module area be used
>> for that?
>>
>> > Possibly this should be in .rodata, not .text.
>>
>> Well, rodata should be neither writable nor executable.
>
> We're not writing into code exactly.
>
> This code is never executed in-place in vmlinux. It gets copied, and
> only copies are ever executed.
>
> Some pointers and offsets get poked into the code to configure it.
>
> I think it would be better simply to put the code in .rodata, and
> poke paramaters into the copy, not the original -- but that's a bit
> more awkward to code up, since the values can't be poked simply by
> writing global variables.

Okay, interesting. I'll be curious to see what the patch for this looks like.

>> > There may be a few other instances of this kind of thing.
>>
>> This config will certainly find them! :) But, that's why it's behind a config.
>
> I haven't tested exhaustively, but it think this is sufficient for a
> Tested-by. The patch does seem to be doing what it is intended to
> do, and doesn't seem to be triggering false positives all over the
> place.

Great, thanks for taking the time to check on it!

Should I send this to the patch tracker, or wait for more feedback?

>> > Are you aware of similar situations on other arches?
>>
>> I think there were some problems a long time ago on x86 for rodata too.
>
> It would be good to get this kexec case fixed -- I'll try to hack up
> a separate patch.

Thanks!

-Kees

--
Kees Cook
Chrome OS Security

2014-02-21 12:38:18

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On Tue, Feb 18, 2014 at 10:10:03AM -0800, Kees Cook wrote:
> On Mon, Feb 17, 2014 at 4:34 AM, Dave Martin <[email protected]> wrote:
> > On Fri, Feb 14, 2014 at 11:11:07AM -0800, Kees Cook wrote:
> >> On Fri, Feb 14, 2014 at 8:22 AM, Dave Martin <[email protected]> wrote:
> >> > On Thu, Feb 13, 2014 at 05:04:10PM -0800, Kees Cook wrote:
> >> >> Introduce "CONFIG_DEBUG_RODATA" to mostly match the x86 config, though
> >> >> the behavior is different: it depends on STRICT_KERNMEM_PERMS, which
> >> >> sets rodata read-only (but executable), where as this option additionally
> >> >> splits rodata from the kernel text (resulting in potentially more memory
> >> >> lost to padding) and sets it non-executable as well. The end result is
> >> >> that on builds with CONFIG_DEBUG_RODATA=y (like x86) the rodata with be
> >> >> marked purely read-only.
> >> >
> >> > This triggers an Oops in kexec, because we have a block of code in .text
> >> > which is a template for generating baremetal code to relocate the new
> >> > kernel, and some literal words are written into it before copying.
> >>
> >> You're writing into the text area? I would imagine that
> >> CONFIG_ARM_KERNMEM_PERMS would break that. However, that's not the
> >> right place to be building code -- shouldn't the module area be used
> >> for that?
> >>
> >> > Possibly this should be in .rodata, not .text.
> >>
> >> Well, rodata should be neither writable nor executable.
> >
> > We're not writing into code exactly.
> >
> > This code is never executed in-place in vmlinux. It gets copied, and
> > only copies are ever executed.
> >
> > Some pointers and offsets get poked into the code to configure it.
> >
> > I think it would be better simply to put the code in .rodata, and
> > poke paramaters into the copy, not the original -- but that's a bit
> > more awkward to code up, since the values can't be poked simply by
> > writing global variables.
>
> Okay, interesting. I'll be curious to see what the patch for this looks like.
>
> >> > There may be a few other instances of this kind of thing.
> >>
> >> This config will certainly find them! :) But, that's why it's behind a config.
> >
> > I haven't tested exhaustively, but it think this is sufficient for a
> > Tested-by. The patch does seem to be doing what it is intended to
> > do, and doesn't seem to be triggering false positives all over the
> > place.
>
> Great, thanks for taking the time to check on it!
>
> Should I send this to the patch tracker, or wait for more feedback?

It would be good if someone who's more familiar with the parms and
vmlinux.lds stuff could take a look at it, though I don't see any
obvious problem yet.

If you don't receive further comments, you could try reposting once
to alert people to the fact that you're still waiting.

Cheers
---Dave

2014-02-21 13:20:55

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On Fri, Feb 21, 2014 at 12:37:04PM +0000, Dave Martin wrote:
> It would be good if someone who's more familiar with the parms and
> vmlinux.lds stuff could take a look at it, though I don't see any
> obvious problem yet.

The biggest issue with it is that we end up with:

- the .text section rounded up to 1MB
- the .rodata section rounded up to 1MB

That means we can end up wasting up to 1MB of memory for each (in the
worst case where we encroach into the next 1MB aligned region by a few
bytes) and this memory can't be re-used.

The alternative is to adjust the maps such that we end up mapping the
.text / .rodata overlap 1MB using 4K pages, taking the additional TLB
hit by doing so.

The .text is aligned to 1MB, so the majority of the first 0x8000 to
0x100000 is unused. The end of the .text section is aligned to 1MB,
and the start of the .data section is also aligned to 1MB.

So, the minimum kernel size is: 0x100000 + MB_ALIGN(sizeof(.text)) +
MB_ALIGN(sizeof(.rodata)) + MB_ALIGN(sizeof(init sections)) + sizeof(.data)
- 0x8000

So, looking at this kernel I've recently built:

Idx Name Size VMA LMA File off Algn
0 .head.text 00000204 c0008000 c0008000 00008000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE

--- .text this gets set to 0xc0100000, we lose 0xc0008240 to 0xc0100000

1 .text 006c4530 c0008240 c0008240 00008240 2**6
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .text.head 0000004c c06cc770 c06cc770 006cc770 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE

--- sizeof(.text) + sizeof(.text.head) becomes 0x700000
--- .rodata starts at 0xc0800000 instead of 0xc06cd000

3 .rodata 0022f568 c06cd000 c06cd000 006cd000 2**6
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 __bug_table 0000873c c08fc568 c08fc568 008fc568 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .pci_fixup 00000030 c0904ca4 c0904ca4 00904ca4 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 __ksymtab 00008158 c0904cd4 c0904cd4 00904cd4 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 __ksymtab_gpl 00006858 c090ce2c c090ce2c 0090ce2c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 __kcrctab 000040ac c0913684 c0913684 00913684 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 __kcrctab_gpl 0000342c c0917730 c0917730 00917730 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
10 __ksymtab_strings 00022a08 c091ab5c c091ab5c 0091ab5c 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
11 __param 00000c70 c093d564 c093d564 0093d564 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
12 __modver 00000e2c c093e1d4 c093e1d4 0093e1d4 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
13 __ex_table 00000f18 c093f000 c093f000 0093f000 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
14 .notes 00000024 c093ff18 c093ff18 0093ff18 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
15 .vectors 00000020 00000000 c0940000 00940000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
16 .stubs 00000240 00001000 c0940020 00941000 2**5
CONTENTS, ALLOC, LOAD, READONLY, CODE
17 .init.text 00051760 c0940260 c0940260 00948260 2**5
CONTENTS, ALLOC, LOAD, READONLY, CODE
18 .exit.text 00002130 c09919c0 c09919c0 009999c0 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
19 .init.arch.info 00000108 c0993af0 c0993af0 0099baf0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
20 .init.tagtable 00000048 c0993bf8 c0993bf8 0099bbf8 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
21 .init.smpalt 000032f8 c0993c40 c0993c40 0099bc40 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
22 .init.pv_table 00000314 c0996f38 c0996f38 0099ef38 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
23 .init.data 0000c19c c0997250 c0997250 0099f250 2**3
CONTENTS, ALLOC, LOAD, DATA
24 .data..percpu 000035c0 c09a4000 c09a4000 009ac000 2**6
CONTENTS, ALLOC, LOAD, DATA

--- sizeof previous sections is 0x2db000, which becomes 0x300000
--- start of .data becomes 0xc0b00000 instead of 0xc09a8000

25 .data 00062728 c09a8000 c09a8000 009b0000 2**6
CONTENTS, ALLOC, LOAD, DATA
26 .bss 00754870 c0a0a740 c0a0a740 00a12728 2**6
ALLOC

--- which means the kernel image finishes at 0xC12B6FB0 whereas it used
to finish at 0xC115EFB0.

27 .comment 00000011 00000000 00000000 00a12728 2**0
CONTENTS, READONLY
28 .ARM.attributes 00000010 00000000 00000000 00a12739 2**0
CONTENTS, READONLY

That's almost 1.5MB larger on an image size of 18MB. Percentage wise,
that sounds small, but the thing to realise is that growth is independent
of the image size, so a smaller image sees a larger %age wise growth in
its size.

People have already complained bitterly when I've said that stealing
memory and taking out out of memblock should always be 1MB aligned, so
/no one/ has the right to say "it's only 1.5MB, it doesn't matter"
because quite frankly they should've been saying that and supporting me
with the memblock issue. So, I really don't want to hear that argument!

However, if you look at where these boundaries are placed, they're not
quite in the right place. For example, the .init.data section is writable,
and so should be grouped with the .data section. So should .data..percpu.

Now, a few other things stand out from the above:

(a) .text.head - imx, sunxi and tegra need to fix that. There is no
specific meaning to it.

(b) .init.text is executable, and can't be in a NX region when it's
set as non-executable.

(c) we can't free the .init sections (sections 15 through up to and
including 23) anymore with this feature enabled because it's setup
as read-only memory.

--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

2014-02-21 22:09:25

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On Fri, Feb 21, 2014 at 5:20 AM, Russell King - ARM Linux
<[email protected]> wrote:
> On Fri, Feb 21, 2014 at 12:37:04PM +0000, Dave Martin wrote:
>> It would be good if someone who's more familiar with the parms and
>> vmlinux.lds stuff could take a look at it, though I don't see any
>> obvious problem yet.
>
> The biggest issue with it is that we end up with:
>
> - the .text section rounded up to 1MB
> - the .rodata section rounded up to 1MB
>
> That means we can end up wasting up to 1MB of memory for each (in the
> worst case where we encroach into the next 1MB aligned region by a few
> bytes) and this memory can't be re-used.
>
> The alternative is to adjust the maps such that we end up mapping the
> .text / .rodata overlap 1MB using 4K pages, taking the additional TLB
> hit by doing so.
>
> The .text is aligned to 1MB, so the majority of the first 0x8000 to
> 0x100000 is unused. The end of the .text section is aligned to 1MB,
> and the start of the .data section is also aligned to 1MB.
>
> So, the minimum kernel size is: 0x100000 + MB_ALIGN(sizeof(.text)) +
> MB_ALIGN(sizeof(.rodata)) + MB_ALIGN(sizeof(init sections)) + sizeof(.data)
> - 0x8000
>
> So, looking at this kernel I've recently built:
>
> Idx Name Size VMA LMA File off Algn
> 0 .head.text 00000204 c0008000 c0008000 00008000 2**2
> CONTENTS, ALLOC, LOAD, READONLY, CODE
>
> --- .text this gets set to 0xc0100000, we lose 0xc0008240 to 0xc0100000
>
> 1 .text 006c4530 c0008240 c0008240 00008240 2**6
> CONTENTS, ALLOC, LOAD, READONLY, CODE
> 2 .text.head 0000004c c06cc770 c06cc770 006cc770 2**2
> CONTENTS, ALLOC, LOAD, READONLY, CODE
>
> --- sizeof(.text) + sizeof(.text.head) becomes 0x700000
> --- .rodata starts at 0xc0800000 instead of 0xc06cd000
>
> 3 .rodata 0022f568 c06cd000 c06cd000 006cd000 2**6
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 4 __bug_table 0000873c c08fc568 c08fc568 008fc568 2**0
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 5 .pci_fixup 00000030 c0904ca4 c0904ca4 00904ca4 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 6 __ksymtab 00008158 c0904cd4 c0904cd4 00904cd4 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 7 __ksymtab_gpl 00006858 c090ce2c c090ce2c 0090ce2c 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 8 __kcrctab 000040ac c0913684 c0913684 00913684 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 9 __kcrctab_gpl 0000342c c0917730 c0917730 00917730 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 10 __ksymtab_strings 00022a08 c091ab5c c091ab5c 0091ab5c 2**0
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 11 __param 00000c70 c093d564 c093d564 0093d564 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 12 __modver 00000e2c c093e1d4 c093e1d4 0093e1d4 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 13 __ex_table 00000f18 c093f000 c093f000 0093f000 2**3
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 14 .notes 00000024 c093ff18 c093ff18 0093ff18 2**2
> CONTENTS, ALLOC, LOAD, READONLY, CODE
> 15 .vectors 00000020 00000000 c0940000 00940000 2**2
> CONTENTS, ALLOC, LOAD, READONLY, CODE
> 16 .stubs 00000240 00001000 c0940020 00941000 2**5
> CONTENTS, ALLOC, LOAD, READONLY, CODE
> 17 .init.text 00051760 c0940260 c0940260 00948260 2**5
> CONTENTS, ALLOC, LOAD, READONLY, CODE
> 18 .exit.text 00002130 c09919c0 c09919c0 009999c0 2**2
> CONTENTS, ALLOC, LOAD, READONLY, CODE
> 19 .init.arch.info 00000108 c0993af0 c0993af0 0099baf0 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 20 .init.tagtable 00000048 c0993bf8 c0993bf8 0099bbf8 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 21 .init.smpalt 000032f8 c0993c40 c0993c40 0099bc40 2**2
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 22 .init.pv_table 00000314 c0996f38 c0996f38 0099ef38 2**0
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 23 .init.data 0000c19c c0997250 c0997250 0099f250 2**3
> CONTENTS, ALLOC, LOAD, DATA
> 24 .data..percpu 000035c0 c09a4000 c09a4000 009ac000 2**6
> CONTENTS, ALLOC, LOAD, DATA
>
> --- sizeof previous sections is 0x2db000, which becomes 0x300000
> --- start of .data becomes 0xc0b00000 instead of 0xc09a8000
>
> 25 .data 00062728 c09a8000 c09a8000 009b0000 2**6
> CONTENTS, ALLOC, LOAD, DATA
> 26 .bss 00754870 c0a0a740 c0a0a740 00a12728 2**6
> ALLOC
>
> --- which means the kernel image finishes at 0xC12B6FB0 whereas it used
> to finish at 0xC115EFB0.
>
> 27 .comment 00000011 00000000 00000000 00a12728 2**0
> CONTENTS, READONLY
> 28 .ARM.attributes 00000010 00000000 00000000 00a12739 2**0
> CONTENTS, READONLY
>
> That's almost 1.5MB larger on an image size of 18MB. Percentage wise,
> that sounds small, but the thing to realise is that growth is independent
> of the image size, so a smaller image sees a larger %age wise growth in
> its size.
>
> People have already complained bitterly when I've said that stealing
> memory and taking out out of memblock should always be 1MB aligned, so
> /no one/ has the right to say "it's only 1.5MB, it doesn't matter"
> because quite frankly they should've been saying that and supporting me
> with the memblock issue. So, I really don't want to hear that argument!

Absolutely, the memory waste is a problem. This is why I made sure to
leave the rodata as a separate config item. It seems like if people
want this feature, the cost is either going to be this kind of
alignment loss, or TLB hit switching to 4K pages. It seems like the
latter is significantly more "costly". Regardless, that's why it's all
behind config options.

> However, if you look at where these boundaries are placed, they're not
> quite in the right place. For example, the .init.data section is writable,
> and so should be grouped with the .data section. So should .data..percpu.
>
> Now, a few other things stand out from the above:
>
> (a) .text.head - imx, sunxi and tegra need to fix that. There is no
> specific meaning to it.
>
> (b) .init.text is executable, and can't be in a NX region when it's
> set as non-executable.
>
> (c) we can't free the .init sections (sections 15 through up to and
> including 23) anymore with this feature enabled because it's setup
> as read-only memory.

Perhaps I've misunderstood something, but I don't think b) and c) or
the comments about .init.data and .data..percpu are problems for this
series because of when the mapping happens. Until init finishes, these
sections are fully RXW. Only after init is done is the init section
made NX.

-Kees

--
Kees Cook
Chrome OS Security

2014-04-01 22:34:06

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On Mon, Mar 24, 2014 at 3:47 AM, Jon Medhurst (Tixy) <[email protected]> wrote:
> On Sun, 2014-03-23 at 16:21 -0600, Kees Cook wrote:
>> For this stage, how about I make this "depends on KEXEC=n &&
>> KPROBES=n"?
>
> There's also ftrace (CONFIG_DYNAMIC_FTRACE I believe) which modifies
> kernel code with a call to probe_kernel_write(), which GDB uses as well.
>
> And grepping for the patch_text() function also shows
> __arch_jump_label_transform() modifies kernel code. Not sure how and
> when that gets used.

Right, so, I'm trying to fix ftrace now, and I've hit a wall. It is as
if changes to the kernel text PMD aren't being noticed after the
kernel is running. Does anyone know why this might be happening?

Code and details here:
https://lkml.org/lkml/2014/4/1/674

-Kees

--
Kees Cook
Chrome OS Security

2014-04-01 22:54:46

by Laura Abbott

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On 4/1/2014 3:34 PM, Kees Cook wrote:
> On Mon, Mar 24, 2014 at 3:47 AM, Jon Medhurst (Tixy) <[email protected]> wrote:
>> On Sun, 2014-03-23 at 16:21 -0600, Kees Cook wrote:
>>> For this stage, how about I make this "depends on KEXEC=n &&
>>> KPROBES=n"?
>>
>> There's also ftrace (CONFIG_DYNAMIC_FTRACE I believe) which modifies
>> kernel code with a call to probe_kernel_write(), which GDB uses as well.
>>
>> And grepping for the patch_text() function also shows
>> __arch_jump_label_transform() modifies kernel code. Not sure how and
>> when that gets used.
>
> Right, so, I'm trying to fix ftrace now, and I've hit a wall. It is as
> if changes to the kernel text PMD aren't being noticed after the
> kernel is running. Does anyone know why this might be happening?
>
> Code and details here:
> https://lkml.org/lkml/2014/4/1/674
>
> -Kees
>

We had a flush_tlb_kernel_page after the pmd_flush in our out of tree code
which makes the text writeable in __patch_text.

Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-04-01 23:00:00

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] ARM: mm: keep rodata non-executable

On Tue, Apr 1, 2014 at 3:54 PM, Laura Abbott <[email protected]> wrote:
> On 4/1/2014 3:34 PM, Kees Cook wrote:
>> On Mon, Mar 24, 2014 at 3:47 AM, Jon Medhurst (Tixy) <[email protected]> wrote:
>>> On Sun, 2014-03-23 at 16:21 -0600, Kees Cook wrote:
>>>> For this stage, how about I make this "depends on KEXEC=n &&
>>>> KPROBES=n"?
>>>
>>> There's also ftrace (CONFIG_DYNAMIC_FTRACE I believe) which modifies
>>> kernel code with a call to probe_kernel_write(), which GDB uses as well.
>>>
>>> And grepping for the patch_text() function also shows
>>> __arch_jump_label_transform() modifies kernel code. Not sure how and
>>> when that gets used.
>>
>> Right, so, I'm trying to fix ftrace now, and I've hit a wall. It is as
>> if changes to the kernel text PMD aren't being noticed after the
>> kernel is running. Does anyone know why this might be happening?
>>
>> Code and details here:
>> https://lkml.org/lkml/2014/4/1/674
>>
>> -Kees
>>
>
> We had a flush_tlb_kernel_page after the pmd_flush in our out of tree code
> which makes the text writeable in __patch_text.

I tried flush_tlb_kernel_range(), which I'd expect to do the same
thing. I can try with _page() too.

-Kees

--
Kees Cook
Chrome OS Security