2009-07-08 23:10:48

by Siarhei Liakh

[permalink] [raw]
Subject: [PATCH v5] RO/NX protection for loadable kernel modules

This patch is a logical extension of the protection provided by
CONFIG_DEBUG_RODATA to LKMs. The protection is provided by splitting
module_core and module_init into three logical parts each and setting
appropriate page access permissions for each individual section:

1. Code: RO+X
2. RO data: RO+NX
3. RW data: RW+NX

In order to achieve proper protection, layout_sections() have been
modified to align each of the three parts mentioned above onto page
boundary. Next, the corresponding page access permissions are set
right before successful exit from load_module(). Further, free_module()
and sys_init_module have been modified to set module_core and
module_init as RW+NX right before calling module_free().

By default, the original section layout is preserved and RO/NX is
enforced only for whole pages of same content.
However, when compiled with CONFIG_DEBUG_RODATA=y, the patch
will page-align each group of sections to ensure that each page contains
only one type of content.

v1: Initial proof-of-concept patch.

v2: The patch have been re-written to reduce the number of #ifdefs and
to make it architecture-agnostic. Code formatting have been corrected also.

v3: Opportunistic RO/NX protectiuon is now unconditional. Section
page-alignment is enabled when CONFIG_DEBUG_RODATA=y.

v4: Removed most macros and improved coding style.

v5: Changed page-alignment and RO/NX section size calculation

The patch have been developed for Linux 2.6.30 by Siarhei Liakh
<[email protected]> and Xuxian Jiang <[email protected]>.

---

Signed-off-by: Siarhei Liakh <[email protected]>
Signed-off-by: Xuxian Jiang <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>

diff --git a/include/linux/module.h b/include/linux/module.h
index 627ac08..5ba770e 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -293,6 +293,9 @@ struct module
/* The size of the executable code in each section. */
unsigned int init_text_size, core_text_size;

+ /* Size of RO sections of the module (text+rodata) */
+ unsigned int init_ro_size, core_ro_size;
+
/* Arch-specific module values */
struct mod_arch_specific arch;

diff --git a/kernel/module.c b/kernel/module.c
index e797812..a51e5aa 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -52,6 +52,7 @@
#include <linux/ftrace.h>
#include <linux/async.h>
#include <linux/percpu.h>
+#include <linux/pfn.h>

#if 0
#define DEBUGP printk
@@ -63,6 +64,22 @@
#define ARCH_SHF_SMALL 0
#endif

+/* Modules' sections will be aligned on page boundaries
+ * to ensure complete separation of code and data, but
+ * only when CONFIG_DEBUG_RODATA=y */
+#ifdef CONFIG_DEBUG_RODATA
+#define debug_align(X) ALIGN(X, PAGE_SIZE)
+#else
+#define debug_align(X) (X)
+#endif
+
+/* Given BASE and SIZE this macro calculates the number of pages the
+ * memory regions occupies */
+#define NUMBER_OF_PAGES(BASE, SIZE) ((SIZE > 0) ? \
+ (PFN_DOWN((unsigned long)BASE + SIZE - 1) - \
+ PFN_DOWN((unsigned long)BASE) + 1) \
+ : (0UL))
+
/* If this is set, the section belongs in the init part of the module */
#define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))

@@ -1471,6 +1488,93 @@ static int __unlink_module(void *_mod)
return 0;
}

+/* LKM RO/NX protection: protect module's text/ro-data
+ * from modification and any data from execution.
+ * Siarhei Liakh, Xuxian Jiang */
+static void set_section_ro_nx(void *base,
+ unsigned long text_size,
+ unsigned long ro_size,
+ unsigned long total_size)
+{
+ /* begin and end PFNs of the current subsection */
+ unsigned long begin_pfn;
+ unsigned long end_pfn;
+
+ /* Initially, all module sections have RWX permissions*/
+
+ DEBUGP("PROTECTING MODULE SECTION: 0x%lx\n"
+ " text size: %lu\n"
+ " ro size: %lu\n"
+ " total size: %lu\n",
+ (unsigned long)base,
+ text_size, ro_size, total_size);
+
+ /* Set RO for module text and RO-data*/
+ if (ro_size > 0) {
+ /* Always protect first page.
+ * Do not protect last partial page */
+ begin_pfn = PFN_DOWN((unsigned long)base);
+ end_pfn = PFN_DOWN((unsigned long)base + ro_size);
+
+ /*Set text RO if there are still pages between begin and end*/
+ if (end_pfn > begin_pfn) {
+ DEBUGP(" RO: 0x%lx %lu\n",
+ begin_pfn << PAGE_SHIFT,
+ end_pfn - begin_pfn);
+ set_memory_ro(begin_pfn << PAGE_SHIFT,
+ end_pfn - begin_pfn);
+ } else {
+ DEBUGP(" RO: less than a page, not enforcing.\n");
+ }
+ } else {
+ DEBUGP(" RO: section not present.\n");
+ }
+
+ /* Set NX permissions for module data */
+ if (total_size > text_size) {
+ /* Do not protect first partial page
+ * Always protect last page. */
+ begin_pfn = PFN_UP((unsigned long)base + text_size);
+ end_pfn = PFN_UP((unsigned long)base + total_size);
+
+ /*Set data NX if there are still pages between begin and end*/
+ if (end_pfn > begin_pfn) {
+ DEBUGP(" NX: 0x%lx %lu\n",
+ begin_pfn << PAGE_SHIFT,
+ end_pfn - begin_pfn);
+ set_memory_nx(begin_pfn << PAGE_SHIFT,
+ end_pfn - begin_pfn);
+ } else {
+ DEBUGP(" NX: less than a page, not enforcing.\n");
+ }
+ } else {
+ DEBUGP(" NX: section not present.\n");
+ }
+}
+
+/* Setting memory back to RW+NX before releasing it */
+void unset_section_ro_nx(struct module *mod, void *module_region)
+{
+ unsigned long total_pages;
+
+ if (mod->module_core == module_region) {
+ /* Set core as NX+RW */
+ total_pages = NUMBER_OF_PAGES(mod->module_core, mod->core_size);
+ DEBUGP("Restoring RW+NX for module's CORE: 0x%lx %lu\n",
+ (unsigned long)mod->module_core, total_pages);
+ set_memory_nx((unsigned long)mod->module_core, total_pages);
+ set_memory_rw((unsigned long)mod->module_core, total_pages);
+
+ } else if (mod->module_init == module_region) {
+ /* Set init as NX+RW */
+ total_pages = NUMBER_OF_PAGES(mod->module_init, mod->init_size);
+ DEBUGP("Restoring RW+NX for module's INIT: 0x%lx %lu\n",
+ (unsigned long)mod->module_init, total_pages);
+ set_memory_nx((unsigned long)mod->module_init, total_pages);
+ set_memory_rw((unsigned long)mod->module_init, total_pages);
+ }
+}
+
/* Free a module, remove from lists, etc (must hold module_mutex). */
static void free_module(struct module *mod)
{
@@ -1493,6 +1597,7 @@ static void free_module(struct module *mod)
ftrace_release(mod->module_core, mod->core_size);

/* This may be NULL, but that's OK */
+ unset_section_ro_nx(mod, mod->module_init);
module_free(mod, mod->module_init);
kfree(mod->args);
if (mod->percpu)
@@ -1505,6 +1610,7 @@ static void free_module(struct module *mod)
lockdep_free_key_range(mod->module_core, mod->core_size);

/* Finally, free the core (containing the module structure) */
+ unset_section_ro_nx(mod, mod->module_core);
module_free(mod, mod->module_core);
}

@@ -1678,8 +1784,20 @@ static void layout_sections(struct module *mod,
s->sh_entsize = get_offset(mod, &mod->core_size, s, i);
DEBUGP("\t%s\n", secstrings + s->sh_name);
}
- if (m == 0)
+ switch (m) {
+ case 0: /* executable */
+ mod->core_size = debug_align(mod->core_size);
mod->core_text_size = mod->core_size;
+ break;
+ case 1: /* RO: text and ro-data */
+ mod->core_size = debug_align(mod->core_size);
+ mod->core_ro_size = mod->core_size;
+ break;
+ case 3: /* whole module core (executable + RO data +
+ * RW data + small alloc) */
+ mod->core_size = debug_align(mod->core_size);
+ break;
+ }
}

DEBUGP("Init section allocation order:\n");
@@ -1696,8 +1814,20 @@ static void layout_sections(struct module *mod,
| INIT_OFFSET_MASK);
DEBUGP("\t%s\n", secstrings + s->sh_name);
}
- if (m == 0)
+ switch (m) {
+ case 0: /* executable */
+ mod->init_size = debug_align(mod->init_size);
mod->init_text_size = mod->init_size;
+ break;
+ case 1: /* RO: text and ro-data */
+ mod->init_size = debug_align(mod->init_size);
+ mod->init_ro_size = mod->init_size;
+ break;
+ case 3: /* whole module init (executable + RO data +
+ * RW data + small alloc) */
+ mod->init_size = debug_align(mod->init_size);
+ break;
+ }
}
}

@@ -2291,6 +2421,18 @@ static noinline struct module *load_module(void
__user *umod,
/* Get rid of temporary copy */
vfree(hdr);

+ /* Set RO and NX regions for core */
+ set_section_ro_nx(mod->module_core,
+ mod->core_text_size,
+ mod->core_ro_size,
+ mod->core_size);
+
+ /* Set RO and NX regions for init */
+ set_section_ro_nx(mod->module_init,
+ mod->init_text_size,
+ mod->init_ro_size,
+ mod->init_size);
+
/* Done! */
return mod;

@@ -2394,6 +2536,7 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
mutex_lock(&module_mutex);
/* Drop initial reference. */
module_put(mod);
+ unset_section_ro_nx(mod, mod->module_init);
module_free(mod, mod->module_init);
mod->module_init = NULL;
mod->init_size = 0;


2009-07-11 07:31:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules


* Rusty Russell <[email protected]> wrote:

> On Fri, 10 Jul 2009 08:54:03 pm Ingo Molnar wrote:
> > * Siarhei Liakh <[email protected]> wrote:
> > > #define ARCH_SHF_SMALL 0
> > > #endif
> > >
> > > +/* Modules' sections will be aligned on page boundaries
> > > + * to ensure complete separation of code and data, but
> > > + * only when CONFIG_DEBUG_RODATA=y */
> >
> > please use the customary comment style:
> >
> > /*
> > * Comment .....
> > * ...... goes here:
> > */
>
> Please don't. There's one spot in that file which does it, and
> frankly it's a distracting waste of space.

What are you talking about? There's _16_ places in kernel/module.c
that use the above standard comment style.

Furthermore, why do we have _five_ different, inconsisent looking
comment styles mixed into this same file, often mixed within the
_same function_.

It is rather distracting and annoying when code uses non-standard
multi-line comments like:

/* Lay out the SHF_ALLOC sections in a way not dissimilar to how ld
might -- code, read-only data, read-write data, small data. Tally
sizes, and place the offsets into sh_entsize fields: high bit means it
belongs in init. */

Mixed with the standard style:

/*
* Ensure that an exported symbol [global namespace] does not already exist
* in the kernel or in some other module's exported symbol table.
*/

Mixed with a weird mix of the two styles:

/* Generate the signature for all relevant module structures here.
* If these change, we don't want to try to parse the module. */

Mixed with a second weird mix of these styles:

/* Now sew it into the lists so we can get lockdep and oops
* info during argument parsing. Noone should access us, since
* strong_try_module_get() will fail.
* lockdep/oops can run asynchronous, so use the RCU list insertion
* function to insert in a way safe to concurrent readers.
* The mutex protects against concurrent writers.
*/

Mixed with a third weird mix of these styles:

/* Format: modulename size refcount deps address

Where refcount is a number or -, and deps is a comma-separated list
of depends or -.
*/

I fully recognize your right to disagree with fine details in the
standard style rules - they _are_ partly arbitrary - but that's
pretty much the point: predictable looking patterns help us
pinpointing weird looking _structure_ in code.

But that finer argument does not even apply here because you dont
actually use any styles consistently - you use absolutely no
consistent style at all in kernel/module.c! You claim that you have
some different comment style from what others use in the kernel -
yet you dont apply it consistently at all.

It is not helpful at all if you lace your code with extra comment
noise, in five different flavors. _You_ apparently do not even
notice and probably you dont even care, but others like me do.

It is doubly not helpful if you compound this by resisting, opposing
and obstructing the review activities of others who do care.

And it is not helpful square two if you teach new contributors to
not care about clean patches.

> But better is to try to stick with pithy one-line comments!

Sure. My remark was limited to multi-line comments.

> > > + " text size: %lu\n"
> > > + " ro size: %lu\n"
> > > + " total size: %lu\n",
> > > + (unsigned long)base,
> > > + text_size, ro_size, total_size);
> >
> > Please remove all DEBUGP() lines - they uglify the code.
>
> I don't know if anyone ever turns them on any more, but this usage
> is entirely consistent with how they work at the moment: detailing
> the module layout.

Just like you requested multi-line comments to be shortened or
eliminated above (because if used incorrectly they distract from
code readability), do i request unused in-source debugging code to
be removed - for exactly the same reasons.

Half of this patch was debug statements that distracted me (and
others) from reviewing the merits of the patch.

> > > + begin_pfn = PFN_DOWN((unsigned long)base);
> > > + end_pfn = PFN_DOWN((unsigned long)base + ro_size);
> >
> > Why is base a void * then cast to unsigned long? Use the more
> > natural type as a parameter to avoid dangerous type-casts.
>
> Because this is how PFN_DOWN works?

You did not read and apparently did not understand my review
suggestion at all. The type of 'base' should be changed to the
natural type: unsigned long.

This has nothing to do with how 'PFN_DOWN works', whatsoever.

> > Please also fix many other instances of this.
>
> Please don't. Fix the credit comment if you want, but I've
> applied the current version for now.

Sigh, and now you apply this incomplete and unclean patch. The thing
is, unpatched upstream kernel/module.c is quite a readability mess
right now, even on the most trivial level and beyond comments:

- Inconsistent function prototypes.

- Mid-string broken up printks.

- Code flow confusion: inconsistent checks for allocation errors
within the same function.

- Huge functions that should be broken up. load_module() is 470
lines long (!).

- Stuff like casting the same variable four times in 2 lines:

static void *module_alloc_update_bounds(unsigned long size)
{
void *ret = module_alloc(size);

if (ret) {
/* Update module bounds. */
if ((unsigned long)ret < module_addr_min)
module_addr_min = (unsigned long)ret;
if ((unsigned long)ret + size > module_addr_max)
module_addr_max = (unsigned long)ret + size;
}
return ret;
}

Can you read that function at a glance? I sure cannot.

And i'm sure there's more - this is what 2 minutes of looking
showed. If we cannot even get the trivial stuff right how can we get
the more complex stuff right?

_Please_ work on making it more readabe before piling more features
on it ...

Thanks,

Ingo

2009-07-11 08:52:04

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sat, 11 Jul 2009 03:37:01 pm Rusty Russell wrote:
> On Fri, 10 Jul 2009 08:54:03 pm Ingo Molnar wrote:
> > please use the customary comment style:
>
> Please don't. There's one spot in that file which does it, and frankly
> it's a distracting waste of space.

Sorry, Ingo there's more than one. I don't bother fixing up others' comments
in patches, so the file is now a bit mixed. It'd be better to make it
uniform, in which case I'd change it to kernel-style.

I have a question about this patch though: I think it's unsafe in general to
mark the last partial page as NX (we asked for executable pages, this could
remove executable from some unrelated allocation).

And here's the cleanup patch I applied on top of yours; mainly removing
pointless zero checks (ignore the whitespace cleanups).

Thanks,
Rusty.

diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1518,55 +1518,30 @@ static void set_section_ro_nx(void *base
unsigned long end_pfn;

/* Initially, all module sections have RWX permissions*/
-
DEBUGP("PROTECTING MODULE SECTION: 0x%lx\n"
" text size: %lu\n"
" ro size: %lu\n"
" total size: %lu\n",
(unsigned long)base,
- text_size, ro_size, total_size);
+ text_size, ro_size, total_size);

/* Set RO for module text and RO-data*/
- if (ro_size > 0) {
- /* Always protect first page.
- * Do not protect last partial page */
- begin_pfn = PFN_DOWN((unsigned long)base);
- end_pfn = PFN_DOWN((unsigned long)base + ro_size);
+ /* Always protect first page. Do not protect last partial page */
+ begin_pfn = PFN_DOWN((unsigned long)base);
+ end_pfn = PFN_DOWN((unsigned long)base + ro_size);

- /*Set text RO if there are still pages between begin and end*/
- if (end_pfn > begin_pfn) {
- DEBUGP(" RO: 0x%lx %lu\n",
- begin_pfn << PAGE_SHIFT,
- end_pfn - begin_pfn);
- set_memory_ro(begin_pfn << PAGE_SHIFT,
- end_pfn - begin_pfn);
- } else {
- DEBUGP(" RO: less than a page, not enforcing.\n");
- }
- } else {
- DEBUGP(" RO: section not present.\n");
- }
+ DEBUGP(" RO: 0x%lx %lu\n",
+ begin_pfn << PAGE_SHIFT, end_pfn - begin_pfn);
+ set_memory_ro(begin_pfn << PAGE_SHIFT, end_pfn - begin_pfn);

/* Set NX permissions for module data */
- if (total_size > text_size) {
- /* Do not protect first partial page
- * Always protect last page. */
- begin_pfn = PFN_UP((unsigned long)base + text_size);
- end_pfn = PFN_UP((unsigned long)base + total_size);
+ /* Do not protect first partial page. Always protect last page. */
+ begin_pfn = PFN_UP((unsigned long)base + text_size);
+ end_pfn = PFN_UP((unsigned long)base + total_size);

- /*Set data NX if there are still pages between begin and end*/
- if (end_pfn > begin_pfn) {
- DEBUGP(" NX: 0x%lx %lu\n",
- begin_pfn << PAGE_SHIFT,
- end_pfn - begin_pfn);
- set_memory_nx(begin_pfn << PAGE_SHIFT,
- end_pfn - begin_pfn);
- } else {
- DEBUGP(" NX: less than a page, not enforcing.\n");
- }
- } else {
- DEBUGP(" NX: section not present.\n");
- }
+ DEBUGP(" NX: 0x%lx %lu\n",
+ begin_pfn << PAGE_SHIFT, end_pfn - begin_pfn);
+ set_memory_nx(begin_pfn << PAGE_SHIFT, end_pfn - begin_pfn);
}

/* Setting memory back to RW+NX before releasing it */


2009-07-11 11:23:17

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sat, 11 Jul 2009 05:00:37 pm Ingo Molnar wrote:
> Sigh, and now you apply this incomplete and unclean patch. The thing
> is, unpatched upstream kernel/module.c is quite a readability mess
> right now, even on the most trivial level and beyond comments:

Hi Ingo!

First, thanks for reading this code. Always useful to annoy you into
reading my stuff :)

TBH most of those things don't worry me enough to reject patches. If there
were other problems, I might ask someone to fix those up too, but I try not to
post feedback like the one you gave.

> - Code flow confusion: inconsistent checks for allocation errors
> within the same function.

Which one were you thinking?

> - Huge functions that should be broken up. load_module() is 470
> lines long (!).

It is, but unfortunately there's not a clear point to break it. Pulling
unrelated ops into a separate function just to reduce function size is worse
than the disease (see init/main.c's do_basic_setup() for an example, though
that file is not as bad as I remember).

> if (ret) {
> /* Update module bounds. */
> if ((unsigned long)ret < module_addr_min)
> module_addr_min = (unsigned long)ret;
> if ((unsigned long)ret + size > module_addr_max)
> module_addr_max = (unsigned long)ret + size;
> }
> return ret;
> }
>
> Can you read that function at a glance? I sure cannot.

Yep, for me this is ugly but clear.

I prefer accessible addresses to be held in void *; the module code is a bit
borderline, but originally void * was less casts than unsigned long (that may
well have changed in the last few years tho).

> And i'm sure there's more - this is what 2 minutes of looking
> showed. If we cannot even get the trivial stuff right how can we get
> the more complex stuff right?
>
> _Please_ work on making it more readabe before piling more features
> on it ...

I'm not happy with the module.c code, but not for these reasons. It just does
lots of little, bitsy things to load and set up a module, many of which are
arch-specific hooks, and/or config conditional.

eg. I'd like to split load_module() where it says "Module has moved": this
would be clean, because before that point "mod" is pointing to the temporary
version. But trying it quickly reveals it to be worse than the current code.

Small cleanups are possible, but they're not the ones which would make this
code really straightforward.

Cheers,
Rusty.

2009-07-11 15:48:56

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sat, 11 Jul 2009 18:21:46 +0930
Rusty Russell <[email protected]> wrote:

> On Sat, 11 Jul 2009 03:37:01 pm Rusty Russell wrote:
> > On Fri, 10 Jul 2009 08:54:03 pm Ingo Molnar wrote:
> > > please use the customary comment style:
> >
> > Please don't. There's one spot in that file which does it, and
> > frankly it's a distracting waste of space.
>
> Sorry, Ingo there's more than one. I don't bother fixing up others'
> comments in patches, so the file is now a bit mixed. It'd be better
> to make it uniform, in which case I'd change it to kernel-style.
>
> I have a question about this patch though: I think it's unsafe in
> general to mark the last partial page as NX (we asked for executable
> pages, this could remove executable from some unrelated allocation).
>

we vmalloc / g_f_p modules right? so we don't share the last page.


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-07-12 04:40:55

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sun, 12 Jul 2009 01:19:58 am Arjan van de Ven wrote:
> Rusty Russell <[email protected]> wrote:
> > I have a question about this patch though: I think it's unsafe in
> > general to mark the last partial page as NX (we asked for executable
> > pages, this could remove executable from some unrelated allocation).
>
> we vmalloc / g_f_p modules right? so we don't share the last page.

Historically yes, but I don't think we should be counting on it. It makes
sense to kmalloc for small modules, and it's arch specific code.

OTOH, a quick grep shows currently only cris does kmalloc, and that's a config
option.

It might be time to unify this code. If we rename MODULE_START to
MODULE_VADDR on MIPS, then ignoring CRIS there's only two real variants;
vmalloc and __vmalloc.

(I like the idea of trying kmalloc and falling back, simply because it reduces
TLB pressure, but that's probably best done after unification).

Thoughts?
Rusty.

2009-07-12 04:51:09

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

Rusty Russell wrote:
>
> OTOH, a quick grep shows currently only cris does kmalloc, and that's a config
> option.
>
> It might be time to unify this code. If we rename MODULE_START to
> MODULE_VADDR on MIPS, then ignoring CRIS there's only two real variants;
> vmalloc and __vmalloc.
>
> (I like the idea of trying kmalloc and falling back, simply because it reduces
> TLB pressure, but that's probably best done after unification).
>

Instead of plain kmalloc we could also have separate R-- and R-X slabs
(the normal kmalloc slabs already being RW-).

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-07-12 07:44:21

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sun, 12 Jul 2009 14:10:39 +0930
Rusty Russell <[email protected]> wrote:

> On Sun, 12 Jul 2009 01:19:58 am Arjan van de Ven wrote:
> > Rusty Russell <[email protected]> wrote:
> > > I have a question about this patch though: I think it's unsafe in
> > > general to mark the last partial page as NX (we asked for
> > > executable pages, this could remove executable from some
> > > unrelated allocation).
> >
> > we vmalloc / g_f_p modules right? so we don't share the last page.
>
> Historically yes, but I don't think we should be counting on it. It
> makes sense to kmalloc for small modules, and it's arch specific code.
>
> OTOH, a quick grep shows currently only cris does kmalloc, and that's
> a config option.
>
> It might be time to unify this code. If we rename MODULE_START to
> MODULE_VADDR on MIPS, then ignoring CRIS there's only two real
> variants; vmalloc and __vmalloc.
>
> (I like the idea of trying kmalloc and falling back, simply because
> it reduces TLB pressure, but that's probably best done after
> unification).
>

or using a non-power-of-two get_free_pages() thing...

some architectures will need to know that memory needs to be executable
at allocation time so that it can be put in an executable address range
etc...

2009-07-12 09:24:50

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

Rusty Russell <[email protected]> writes:
>
> (I like the idea of trying kmalloc and falling back, simply because it reduces
> TLB pressure,

I implemented this for 32bit in 2.4, but I always had second thoughts
if that was really reducing TLB pressure.

x86 CPUs have separated TLBs for 2MB and 4K and they all have much
more 4K entries. So it might actually be worse to use the 2MB TLBs for
this.

> but that's probably best done after unification).

Trying kmalloc doesn't work on x86-64

-Andi

--
[email protected] -- Speaking for myself only.

2009-07-12 09:25:55

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

Arjan van de Ven <[email protected]> writes:
>
> or using a non-power-of-two get_free_pages() thing...

alloc_pages_exact(). Was developed for exactly this purpose originally.

But it doesn't work on x86-64 again for modules.

-Andi
--
[email protected] -- Speaking for myself only.

2009-07-12 09:58:45

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sun, 12 Jul 2009 05:15:24 pm Arjan van de Ven wrote:
> Rusty Russell <[email protected]> wrote:
> > (I like the idea of trying kmalloc and falling back, simply because
> > it reduces TLB pressure, but that's probably best done after
> > unification).
>
> or using a non-power-of-two get_free_pages() thing...
>
> some architectures will need to know that memory needs to be executable
> at allocation time so that it can be put in an executable address range
> etc...

Yes, maybe that's better than kmalloc. On my laptop I have 105 modules
loaded, with 3778464 total length: I'm wasting 206944 bytes on unused tails of
pages. But that's only 0.06% of my memory.

Thanks,
Rusty.

2009-07-12 15:31:20

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sun, 12 Jul 2009 19:28:27 +0930
Rusty Russell <[email protected]> wrote:

> On Sun, 12 Jul 2009 05:15:24 pm Arjan van de Ven wrote:
> > Rusty Russell <[email protected]> wrote:
> > > (I like the idea of trying kmalloc and falling back, simply
> > > because it reduces TLB pressure, but that's probably best done
> > > after unification).
> >
> > or using a non-power-of-two get_free_pages() thing...
> >
> > some architectures will need to know that memory needs to be
> > executable at allocation time so that it can be put in an
> > executable address range etc...
>
> Yes, maybe that's better than kmalloc. On my laptop I have 105
> modules loaded, with 3778464 total length: I'm wasting 206944 bytes
> on unused tails of pages. But that's only 0.06% of my memory.
>

105 is also a sign that you picked a somewhat suboptimal config...
that's of course your choice but it's a choice that has a small price,
if you don't want to pay that price, changing the config to not be
entirely insane is a good answer as well ;-)



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-07-12 17:34:21

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sun, Jul 12, 2009 at 08:32:27AM -0700, Arjan van de Ven wrote:
> On Sun, 12 Jul 2009 19:28:27 +0930
> Rusty Russell <[email protected]> wrote:
>
> > On Sun, 12 Jul 2009 05:15:24 pm Arjan van de Ven wrote:
> > > Rusty Russell <[email protected]> wrote:
> > > > (I like the idea of trying kmalloc and falling back, simply
> > > > because it reduces TLB pressure, but that's probably best done
> > > > after unification).
> > >
> > > or using a non-power-of-two get_free_pages() thing...
> > >
> > > some architectures will need to know that memory needs to be
> > > executable at allocation time so that it can be put in an
> > > executable address range etc...
> >
> > Yes, maybe that's better than kmalloc. On my laptop I have 105
> > modules loaded, with 3778464 total length: I'm wasting 206944 bytes
> > on unused tails of pages. But that's only 0.06% of my memory.
> >
>
> 105 is also a sign that you picked a somewhat suboptimal config...
> that's of course your choice but it's a choice that has a small price,
> if you don't want to pay that price, changing the config to not be
> entirely insane is a good answer as well ;-)

But this is the "common" case in the world of Linux where the distros
are forced to build everything as modules. So it should be considered
as a very valid case, and not merely dismissed out of hand.

thanks,

greg k-h

2009-07-12 21:56:54

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sun, 12 Jul 2009 10:33:29 -0700
Greg KH <[email protected]> wrote:

> >
> > 105 is also a sign that you picked a somewhat suboptimal config...
> > that's of course your choice but it's a choice that has a small
> > price, if you don't want to pay that price, changing the config to
> > not be entirely insane is a good answer as well ;-)
>
> But this is the "common" case in the world of Linux where the distros
> are forced to build everything as modules.

who's holding the stick?
Really.

I've seen some of these case, where the distro kernel has something as
a module, but the other parts of the distro the unconditionally load
that module always. That makes no sense.
If you have 105 different real devices in your system, that have
different drivers, sure, I'll buy 105. Somehow I doubt Rusty's box has
105 ;)



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-07-12 22:15:23

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sun, Jul 12, 2009 at 02:58:04PM -0700, Arjan van de Ven wrote:
> On Sun, 12 Jul 2009 10:33:29 -0700
> Greg KH <[email protected]> wrote:
>
> > >
> > > 105 is also a sign that you picked a somewhat suboptimal config...
> > > that's of course your choice but it's a choice that has a small
> > > price, if you don't want to pay that price, changing the config to
> > > not be entirely insane is a good answer as well ;-)
> >
> > But this is the "common" case in the world of Linux where the distros
> > are forced to build everything as modules.
>
> who's holding the stick?
> Really.

Stick?

> I've seen some of these case, where the distro kernel has something as
> a module, but the other parts of the distro the unconditionally load
> that module always. That makes no sense.
> If you have 105 different real devices in your system, that have
> different drivers, sure, I'll buy 105. Somehow I doubt Rusty's box has
> 105 ;)

Sure, and I agree that some distros do modularize stuff too much, and am
personally changing this on a distro that I can change. But even then,
I expect a "normal" system will end up with about 20-50 different
modules, due to a variety of good reasons (legacy system and ability to
handle wierd configurations better being both of them.)

thanks,

greg k-h

2009-07-12 23:22:15

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Mon, 13 Jul 2009 01:02:27 am Arjan van de Ven wrote:
> Rusty Russell <[email protected]> wrote:
> > Yes, maybe that's better than kmalloc. On my laptop I have 105
> > modules loaded, with 3778464 total length: I'm wasting 206944 bytes
> > on unused tails of pages. But that's only 0.06% of my memory.
>
> 105 is also a sign that you picked a somewhat suboptimal config...
> that's of course your choice but it's a choice that has a small price,
> if you don't want to pay that price, changing the config to not be
> entirely insane is a good answer as well ;-)

To be clear: I run distro kernels on my laptop (Ubuntu in this case). I think
this is what we should be optimizing for, or we should offer the distros
something better than modules.

Cheers,
Rusty.



2009-07-13 03:09:59

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Mon, 13 Jul 2009 08:51:55 +0930
Rusty Russell <[email protected]> wrote:

> On Mon, 13 Jul 2009 01:02:27 am Arjan van de Ven wrote:
> > Rusty Russell <[email protected]> wrote:
> > > Yes, maybe that's better than kmalloc. On my laptop I have 105
> > > modules loaded, with 3778464 total length: I'm wasting 206944
> > > bytes on unused tails of pages. But that's only 0.06% of my
> > > memory.
> >
> > 105 is also a sign that you picked a somewhat suboptimal config...
> > that's of course your choice but it's a choice that has a small
> > price, if you don't want to pay that price, changing the config to
> > not be entirely insane is a good answer as well ;-)
>
> To be clear: I run distro kernels on my laptop (Ubuntu in this
> case). I think this is what we should be optimizing for, or we
> should offer the distros something better than modules.

working for a distro myself... yes I like modules for drivers.
But I also know the thinking about modules in distros is changing a bit.
It used to be "everything must be a module", but that thinking is
changing to be a bit more balanced. Things that would be loaded always
are being built in now more and more; Greg is doing that for SuSE, Dave
has been doing that for Fedora, I've been doing it for Moblin and I
think Ubuntu is also going in that direction as well.

Modules have some overhead in various places (both load and runtime)
that justifies applying ones brain when making choices for distribution
kernels.

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-07-13 09:02:24

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

Arjan van de Ven <[email protected]> writes:
> I've seen some of these case, where the distro kernel has something as
> a module, but the other parts of the distro the unconditionally load
> that module always. That makes no sense.

One good reason for this is that if something goes wrong with
the module you can still remove/blacklist the module. This can
be very useful in distro deployment, where telling users
"please set flag xyz" is much easier than asking them to get
a special kernel build. It also helps debugging when you're
trying to narrow down where a problem is.

But you can't do that with built-in drivers.

One way to avoid this would be to have a standard way to turn off
drivers/subsystems that are built in on the command line. Right
now that's difficult because the linked kernel doesn't even know
the driver names anymore.

Perhaps we should keep the module names/metadata even in static
kernel? (and make CONFIG_MODULE on the subsystem level disappear?).

IMHO that would be a great cleanup anyways, avoiding one special
case in the driver build testing.

It would be also nice if you could cat some file in sys and it gave
you the module descriptions for example.

-Andi
--
[email protected] -- Speaking for myself only.

2009-07-13 11:02:11

by Jesper Nilsson

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules

On Sun, Jul 12, 2009 at 06:40:39AM +0200, Rusty Russell wrote:
> On Sun, 12 Jul 2009 01:19:58 am Arjan van de Ven wrote:
> > Rusty Russell <[email protected]> wrote:
> > > I have a question about this patch though: I think it's unsafe in
> > > general to mark the last partial page as NX (we asked for executable
> > > pages, this could remove executable from some unrelated allocation).
> >
> > we vmalloc / g_f_p modules right? so we don't share the last page.
>
> Historically yes, but I don't think we should be counting on it. It makes
> sense to kmalloc for small modules, and it's arch specific code.
>
> OTOH, a quick grep shows currently only cris does kmalloc, and that's a config
> option.

...which is a not often used debug option.

> It might be time to unify this code. If we rename MODULE_START to
> MODULE_VADDR on MIPS, then ignoring CRIS there's only two real variants;
> vmalloc and __vmalloc.
>
> (I like the idea of trying kmalloc and falling back, simply because it reduces
> TLB pressure, but that's probably best done after unification).
>
> Thoughts?
> Rusty.

/^JN - Jesper Nilsson
--
Jesper Nilsson -- [email protected]

2009-07-13 16:59:08

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH v5] RO/NX protection for loadable kernel modules


> > (I like the idea of trying kmalloc and falling back, simply because it reduces
> > TLB pressure,
>
> I implemented this for 32bit in 2.4, but I always had second thoughts
> if that was really reducing TLB pressure.

Certainly for non-x86 it can be very worthwhile. A long time ago I
worked on an embedded product that used PowerPC 440, which has only 64
(software-loaded) TLB entries. On PPC 440, Linux has a pinned TLB entry
for the kernel mapping, and modifying how the module loader allocated
space to load modules into that mapping vs. one that had dynamic TLB
entries was worth a factor of 2 in performance -- ie the TLB miss
handling for .text was literally taking half the CPU time of the module
code!

- R.