2002-11-14 22:30:09

by Richard Henderson

[permalink] [raw]
Subject: in-kernel linking issues

So you said you had a userland test harness?

Some problems I've seen browsing the code:

(1) You make no provision for sections to be loaded in any order
except the order they appear in the object file. This is bad.
You *really* need to replicate something akin to obj_load_order_prio
from the 2.4 modutils, lest small data sections be placed incorrectly
wrt the GOT section.

(2) I see no provision for small COMMON symbols to be placed in the
.sbss section rather than in the .bss section. Unless you are
sorting your allocation of COMMON symbols by size, which I also
don't see, this can result in link errors due to SCOMMON symbols
not being reachable from the GP.

These will affect at least Alpha, IA-64, and MIPS.

(3) Alpha and MIPS64 absolutely require that the core and init allocations
are "close" (within 2GB). I don't see how this can be guaranteed with
two different vmalloc calls.

Allocating the two together would also allow us to only have
one flush_icache_range call. Not that I consider module
loading particularly performance critical, but it'd be nice.


That's all I can think of at the moment.


r~


2002-11-15 04:06:30

by Andi Kleen

[permalink] [raw]
Subject: Re: in-kernel linking issues

Richard Henderson <[email protected]> writes:

> These will affect at least Alpha, IA-64, and MIPS.
>
> (3) Alpha and MIPS64 absolutely require that the core and init allocations
> are "close" (within 2GB). I don't see how this can be guaranteed with
> two different vmalloc calls.

In x86-64 (and I think sparc64) the modules (both data and code) also need
to be within 2GB of the main kernel code. This is done to avoid needing
a GOT for calls between main kernel and modules. In the old module code that
is done with a custom module_map() function. I have not looked yet on how
that could be implemented in the new code.

-Andi


2002-11-15 04:14:46

by Richard Henderson

[permalink] [raw]
Subject: Re: in-kernel linking issues

On Fri, Nov 15, 2002 at 05:13:23AM +0100, Andi Kleen wrote:
> Richard Henderson <[email protected]> writes:
> > (3) Alpha and MIPS64 absolutely require that the core and init allocations
> > are "close" (within 2GB). I don't see how this can be guaranteed with
> > two different vmalloc calls.
>
> In x86-64 (and I think sparc64) the modules (both data and code) also need
> to be within 2GB of the main kernel code. This is done to avoid needing
> a GOT for calls between main kernel and modules. In the old module code that
> is done with a custom module_map() function. I have not looked yet on how
> that could be implemented in the new code.

Hmm. I guess that can be done with the two allocation hooks,
which could allocate from a special pool (as is done with the
module_map function at present). And, as far as that goes,
could apply to Alpha and MIPS as well, if the same special
allocation is done.

Consider this point refuted, Rusty.


r~

2002-11-15 08:41:10

by Rusty Russell

[permalink] [raw]
Subject: Re: in-kernel linking issues

In message <[email protected]> you write:
> Richard Henderson <[email protected]> writes:
>
> > These will affect at least Alpha, IA-64, and MIPS.
> >
> > (3) Alpha and MIPS64 absolutely require that the core and init allocations
> > are "close" (within 2GB). I don't see how this can be guaranteed with
> > two different vmalloc calls.
>
> In x86-64 (and I think sparc64) the modules (both data and code) also need
> to be within 2GB of the main kernel code. This is done to avoid needing
> a GOT for calls between main kernel and modules. In the old module code that
> is done with a custom module_map() function. I have not looked yet on how
> that could be implemented in the new code.

Ack, PPC64 same issue. Several solutions:

1) Put everything in module_core and nothing in module_init. Sure,
you lose the discard-init stuff, but it's designed to work.

2) Use a magic allocator a-la Sparc64.

3) Best-effort: allocate both at alloc-core time (store init ptr in
mod_arch_specific) and if they're too far apart, throw that away
and fall back to one big alloc.

4) Implement jump stubs between the two sections.

See my kernel.org page for the PPC64 and ia64 implementations (they
were implemented and tested in userspace, and never actually compiled
as kernel code, so the linking code should be correct but there may be
typos and kernel-related ommissions, etc).

This is why I made sure I had half a dozen archs under my belt, I
still screwed up sparc64 (which needs __this_module to be withing
32-bits, too).

BTW, arch interface tweaked to fix sparc64 problem. Linus hasn't
taken it yet, but it's below (with predecessor patch to alter includes
to avoid pulling in elf.h everywhere which breaks xfs).

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: moduleloader.h
Author: Anders Gustafsson
Status: Experimental
Depends: Module/module-i386.patch.gz

D: Separates the module loading function prototypes (and elf.h) into
D: moduleloader.h. AT_GID in elf.h clashes with xfs.h, but this also
D: makes module.h less cluttered.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5-bk/arch/i386/kernel/module.c working-2.5-bk-noelf/arch/i386/kernel/module.c
--- linux-2.5-bk/arch/i386/kernel/module.c 2002-11-14 15:08:19.000000000 +1100
+++ working-2.5-bk-noelf/arch/i386/kernel/module.c 2002-11-14 16:30:13.000000000 +1100
@@ -15,7 +15,7 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
-#include <linux/module.h>
+#include <linux/moduleloader.h>
#include <linux/elf.h>
#include <linux/vmalloc.h>
#include <linux/fs.h>
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5-bk/arch/sparc/kernel/module.c working-2.5-bk-noelf/arch/sparc/kernel/module.c
--- linux-2.5-bk/arch/sparc/kernel/module.c 2002-11-14 15:08:20.000000000 +1100
+++ working-2.5-bk-noelf/arch/sparc/kernel/module.c 2002-11-14 16:30:55.000000000 +1100
@@ -4,7 +4,7 @@
* Copyright (C) 2002 David S. Miller.
*/

-#include <linux/module.h>
+#include <linux/moduleloader.h>
#include <linux/kernel.h>
#include <linux/elf.h>
#include <linux/vmalloc.h>
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5-bk/arch/sparc64/kernel/module.c working-2.5-bk-noelf/arch/sparc64/kernel/module.c
--- linux-2.5-bk/arch/sparc64/kernel/module.c 2002-11-14 15:08:21.000000000 +1100
+++ working-2.5-bk-noelf/arch/sparc64/kernel/module.c 2002-11-14 16:31:02.000000000 +1100
@@ -4,7 +4,7 @@
* Copyright (C) 2002 David S. Miller.
*/

-#include <linux/module.h>
+#include <linux/moduleloader.h>
#include <linux/kernel.h>
#include <linux/elf.h>
#include <linux/vmalloc.h>
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5-bk/include/linux/module.h working-2.5-bk-noelf/include/linux/module.h
--- linux-2.5-bk/include/linux/module.h 2002-11-14 15:08:25.000000000 +1100
+++ working-2.5-bk-noelf/include/linux/module.h 2002-11-14 16:28:51.000000000 +1100
@@ -10,7 +10,6 @@
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/list.h>
-#include <linux/elf.h>
#include <linux/stat.h>
#include <linux/compiler.h>
#include <linux/cache.h>
@@ -143,53 +142,6 @@ struct module
char args[0];
};

-/* Helper function for arch-specific module loaders */
-unsigned long find_symbol_internal(Elf_Shdr *sechdrs,
- unsigned int symindex,
- const char *strtab,
- const char *name,
- struct module *mod,
- struct kernel_symbol_group **group);
-
-/* These must be implemented by the specific architecture */
-
-/* vmalloc AND zero for the non-releasable code; return ERR_PTR() on error. */
-void *module_core_alloc(const Elf_Ehdr *hdr,
- const Elf_Shdr *sechdrs,
- const char *secstrings,
- struct module *mod);
-
-/* vmalloc and zero (if any) for sections to be freed after init.
- Return ERR_PTR() on error. */
-void *module_init_alloc(const Elf_Ehdr *hdr,
- const Elf_Shdr *sechdrs,
- const char *secstrings,
- struct module *mod);
-
-/* Apply the given relocation to the (simplified) ELF. Return -error
- or 0. */
-int apply_relocate(Elf_Shdr *sechdrs,
- const char *strtab,
- unsigned int symindex,
- unsigned int relsec,
- struct module *mod);
-
-/* Apply the given add relocation to the (simplified) ELF. Return
- -error or 0 */
-int apply_relocate_add(Elf_Shdr *sechdrs,
- const char *strtab,
- unsigned int symindex,
- unsigned int relsec,
- struct module *mod);
-
-/* Any final processing of module before access. Return -error or 0. */
-int module_finalize(const Elf_Ehdr *hdr,
- const Elf_Shdr *sechdrs,
- struct module *mod);
-
-/* Free memory returned from module_core_alloc/module_init_alloc */
-void module_free(struct module *mod, void *module_region);
-
#ifdef CONFIG_MODULE_UNLOAD

void __symbol_put(const char *symbol);
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5-bk/include/linux/moduleloader.h working-2.5-bk-noelf/include/linux/moduleloader.h
--- linux-2.5-bk/include/linux/moduleloader.h 1970-01-01 10:00:00.000000000 +1000
+++ working-2.5-bk-noelf/include/linux/moduleloader.h 2002-11-14 16:29:33.000000000 +1100
@@ -0,0 +1,55 @@
+#ifndef _LINUX_MODULELOADER_H
+#define _LINUX_MODULELOADER_H
+/* The stuff needed for archs to support modules. */
+
+#include <linux/module.h>
+#include <linux/elf.h>
+
+/* Helper function for arch-specific module loaders */
+unsigned long find_symbol_internal(Elf_Shdr *sechdrs,
+ unsigned int symindex,
+ const char *strtab,
+ const char *name,
+ struct module *mod,
+ struct kernel_symbol_group **group);
+
+/* These must be implemented by the specific architecture */
+
+/* vmalloc AND zero for the non-releasable code; return ERR_PTR() on error. */
+void *module_core_alloc(const Elf_Ehdr *hdr,
+ const Elf_Shdr *sechdrs,
+ const char *secstrings,
+ struct module *mod);
+
+/* vmalloc and zero (if any) for sections to be freed after init.
+ Return ERR_PTR() on error. */
+void *module_init_alloc(const Elf_Ehdr *hdr,
+ const Elf_Shdr *sechdrs,
+ const char *secstrings,
+ struct module *mod);
+
+/* Free memory returned from module_core_alloc/module_init_alloc */
+void module_free(struct module *mod, void *module_region);
+
+/* Apply the given relocation to the (simplified) ELF. Return -error
+ or 0. */
+int apply_relocate(Elf_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *mod);
+
+/* Apply the given add relocation to the (simplified) ELF. Return
+ -error or 0 */
+int apply_relocate_add(Elf_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *mod);
+
+/* Any final processing of module before access. Return -error or 0. */
+int module_finalize(const Elf_Ehdr *hdr,
+ const Elf_Shdr *sechdrs,
+ struct module *mod);
+
+#endif
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5-bk/kernel/module.c working-2.5-bk-noelf/kernel/module.c
--- linux-2.5-bk/kernel/module.c 2002-11-14 15:08:25.000000000 +1100
+++ working-2.5-bk-noelf/kernel/module.c 2002-11-14 16:28:04.000000000 +1100
@@ -17,6 +17,7 @@
*/
#include <linux/config.h>
#include <linux/module.h>
+#include <linux/moduleloader.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
Name: Allocate struct module using special allocator
Author: Rusty Russell
Status: Experimental
Depends: Module/moduleloader_h.patch.gz

D: Sparc64 (and probably others) need all the kernel symbols within
D: 32-bits, which includes the manufactured "__this_module" which refers
D: to the struct module *.
D:
D: This changes the interface back to its old style: the arch-specific code
D: manipulates the init and core sizes, and we call module_alloc() ourselves.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.5-bk-moduleloader_h/arch/i386/kernel/module.c working-2.5-bk-sparcfixes/arch/i386/kernel/module.c
--- working-2.5-bk-moduleloader_h/arch/i386/kernel/module.c 2002-11-14 20:47:30.000000000 +1100
+++ working-2.5-bk-sparcfixes/arch/i386/kernel/module.c 2002-11-14 20:47:50.000000000 +1100
@@ -28,22 +28,15 @@
#define DEBUGP(fmt , ...)
#endif

-static void *alloc_and_zero(unsigned long size)
+void *module_alloc(unsigned long size)
{
- void *ret;
-
- /* We handle the zero case fine, unlike vmalloc */
if (size == 0)
return NULL;
-
- ret = vmalloc(size);
- if (!ret) ret = ERR_PTR(-ENOMEM);
- else memset(ret, 0, size);
-
- return ret;
+ return vmalloc(size);
}

-/* Free memory returned from module_core_alloc/module_init_alloc */
+
+/* Free memory returned from module_alloc */
void module_free(struct module *mod, void *module_region)
{
vfree(module_region);
@@ -51,20 +44,21 @@ void module_free(struct module *mod, voi
table entries. */
}

-void *module_core_alloc(const Elf32_Ehdr *hdr,
- const Elf32_Shdr *sechdrs,
- const char *secstrings,
- struct module *module)
+/* We don't need anything special. */
+long module_core_size(const Elf32_Ehdr *hdr,
+ const Elf32_Shdr *sechdrs,
+ const char *secstrings,
+ struct module *module)
{
- return alloc_and_zero(module->core_size);
+ return module->core_size;
}

-void *module_init_alloc(const Elf32_Ehdr *hdr,
- const Elf32_Shdr *sechdrs,
- const char *secstrings,
- struct module *module)
+long module_init_size(const Elf32_Ehdr *hdr,
+ const Elf32_Shdr *sechdrs,
+ const char *secstrings,
+ struct module *module)
{
- return alloc_and_zero(module->init_size);
+ return module->init_size;
}

int apply_relocate(Elf32_Shdr *sechdrs,
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.5-bk-moduleloader_h/include/linux/moduleloader.h working-2.5-bk-sparcfixes/include/linux/moduleloader.h
--- working-2.5-bk-moduleloader_h/include/linux/moduleloader.h 2002-11-14 20:47:30.000000000 +1100
+++ working-2.5-bk-sparcfixes/include/linux/moduleloader.h 2002-11-14 20:47:50.000000000 +1100
@@ -15,20 +15,26 @@ unsigned long find_symbol_internal(Elf_S

/* These must be implemented by the specific architecture */

-/* vmalloc AND zero for the non-releasable code; return ERR_PTR() on error. */
-void *module_core_alloc(const Elf_Ehdr *hdr,
- const Elf_Shdr *sechdrs,
- const char *secstrings,
- struct module *mod);
+/* Total size to allocate for the non-releasable code; return len or
+ -error. mod->core_size is the current generic tally. */
+long module_core_size(const Elf_Ehdr *hdr,
+ const Elf_Shdr *sechdrs,
+ const char *secstrings,
+ struct module *mod);

-/* vmalloc and zero (if any) for sections to be freed after init.
- Return ERR_PTR() on error. */
-void *module_init_alloc(const Elf_Ehdr *hdr,
- const Elf_Shdr *sechdrs,
- const char *secstrings,
- struct module *mod);
+/* Total size of (if any) sections to be freed after init. Return 0
+ for none, len, or -error. mod->init_size is the current generic
+ tally. */
+long module_init_size(const Elf_Ehdr *hdr,
+ const Elf_Shdr *sechdrs,
+ const char *secstrings,
+ struct module *mod);

-/* Free memory returned from module_core_alloc/module_init_alloc */
+/* Allocator used for allocating struct module, core sections and init
+ sections. Returns NULL on failure. */
+void *module_alloc(unsigned long size);
+
+/* Free memory returned from module_alloc. */
void module_free(struct module *mod, void *module_region);

/* Apply the given relocation to the (simplified) ELF. Return -error
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.5-bk-moduleloader_h/kernel/module.c working-2.5-bk-sparcfixes/kernel/module.c
--- working-2.5-bk-moduleloader_h/kernel/module.c 2002-11-14 20:47:30.000000000 +1100
+++ working-2.5-bk-sparcfixes/kernel/module.c 2002-11-14 20:47:50.000000000 +1100
@@ -557,7 +557,7 @@ static void free_module(struct module *m
module_unload_free(mod);

/* Finally, free the module structure */
- kfree(mod);
+ module_free(mod, mod);
}

void *__symbol_get(const char *symbol)
@@ -805,7 +805,7 @@ static struct module *load_module(void *
unsigned long common_length;
struct sizes sizes, used;
struct module *mod;
- int err = 0;
+ long err = 0;
void *ptr = NULL; /* Stops spurious gcc uninitialized warning */

DEBUGP("load_module: umod=%p, len=%lu, uargs=%p\n",
@@ -894,7 +894,7 @@ static struct module *load_module(void *
goto free_hdr;
arglen = err;

- mod = kmalloc(sizeof(*mod) + arglen+1, GFP_KERNEL);
+ mod = module_alloc(sizeof(*mod) + arglen+1);
if (!mod) {
err = -ENOMEM;
goto free_hdr;
@@ -925,20 +925,36 @@ static struct module *load_module(void *
common_length = read_commons(hdr, &sechdrs[symindex]);
sizes.core_size += common_length;

- /* Set these up: arch's can add to them */
+ /* Set these up, and allow archs to manipulate them. */
mod->core_size = sizes.core_size;
mod->init_size = sizes.init_size;

- /* Allocate (this is arch specific) */
- ptr = module_core_alloc(hdr, sechdrs, secstrings, mod);
- if (IS_ERR(ptr))
+ /* Allow archs to add to them. */
+ err = module_init_size(hdr, sechdrs, secstrings, mod);
+ if (err < 0)
+ goto free_mod;
+ mod->init_size = err;
+
+ err = module_core_size(hdr, sechdrs, secstrings, mod);
+ if (err < 0)
goto free_mod;
+ mod->core_size = err;

+ /* Do the allocs. */
+ ptr = module_alloc(mod->core_size);
+ if (!ptr) {
+ err = -ENOMEM;
+ goto free_mod;
+ }
+ memset(ptr, 0, mod->core_size);
mod->module_core = ptr;

- ptr = module_init_alloc(hdr, sechdrs, secstrings, mod);
- if (IS_ERR(ptr))
+ ptr = module_alloc(mod->init_size);
+ if (!ptr) {
+ err = -ENOMEM;
goto free_core;
+ }
+ memset(ptr, 0, mod->init_size);
mod->module_init = ptr;

/* Transfer each section which requires ALLOC, and set sh_offset
@@ -1015,7 +1031,7 @@ static struct module *load_module(void *
free_core:
module_free(mod, mod->module_core);
free_mod:
- kfree(mod);
+ module_free(mod, mod);
free_hdr:
vfree(hdr);
if (err < 0) return ERR_PTR(err);



2002-11-15 10:22:42

by Andi Kleen

[permalink] [raw]
Subject: Re: in-kernel linking issues

> 2) Use a magic allocator a-la Sparc64.

That is what is currently done.

>
> 3) Best-effort: allocate both at alloc-core time (store init ptr in
> mod_arch_specific) and if they're too far apart, throw that away
> and fall back to one big alloc.
>
> 4) Implement jump stubs between the two sections.

Does not work, I need data references in the same range to.
Also frankly I would refuse to cripple my modules just to accomodate the new
code - it should be the other way around.


-Andi

2002-11-15 12:44:55

by Richard Henderson

[permalink] [raw]
Subject: Re: in-kernel linking issues

One more thing:

Are you really REALLY sure you don't want to load ET_DYN or ET_EXEC
files (aka shared libraries or executables) instead of ET_REL files
(aka .o files)?

If you load ET_DYN or ET_EXEC objects, then a lot of the ugliness
wrt linking can be left to the linker, where it belongs. You'd only
need to process the dynamic relocations remaining in the object.
Which would avoid the problems I mentioned earlier today wrt section
layout, and would also avoid the effort to create .got sections and
the like.

The .init bits could be allocated to their own segment via linker
script widgetry. E.g.

PHDRS {
core PT_LOAD;
init PT_LOAD;
rel PT_LOAD;
dyn PT_DYNAMIC;
}

SECTIONS
{
.text : { *(.text) } :core
.rodata : { *(.rodata) *(.rodata.*) } :core
.data : { *(.data) CONSTRUCTORS } :core
.got : { *(.got.plt) *(.got) } :core
.sdata : { *(.sdata) } :core
.sbss : { *(.sbss) *(.scommon) } :core
.bss : { *(.bss) *(.dynbss) *(COMMON) } :core

.init.text ALIGN(PAGE_SIZE) : { *(.init.text) } :init
.init.data : { *(.init.data) } :init

.hash : { *(.hash) } :rel
.dynsym : { *(.dynsym) } :rel
.dynstr : { *(.dynstr) } :rel
.rel.dyn : { *(.rel*) } :rel
.dynamic : { *(.dynamic) } :rel :dyn
}

to be used like so

ld -T z.ld -shared -o z.so z.o

Now we've got things nicely collected into three program headers:

Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0x00000000 0x00000000 0x00020 0x00024 RWE 0x1000
LOAD 0x002000 0x00001000 0x00001000 0x00014 0x00014 RWE 0x1000
LOAD 0x002014 0x00001014 0x00001014 0x002a4 0x002a4 RW 0x1000
DYNAMIC 0x002240 0x00001240 0x00001240 0x00078 0x00078 RW 0x4

Section to Segment mapping:
Segment Sections...
00 .text .got .bss
01 .init.text .init.data
02 .hash .dynsym .dynstr .rel.dyn .dynamic
03 .dynamic

The first segment contains the core sections, as you've got them now.
The second contains the init sections, which can be freed after running
the module init routine, and the third contains all of the dynamic
linking information, which can be discarded almost immediately.
(Though perhaps it's just as easy to discard it with the .init segment.)

This does reduce the freedom to allocate the init sections completely
separately from the core sections, but that seems a small price to pay
for the extreme reduction in complexity for the in-kernel loader.

Thoughts?


r~

2002-11-15 13:09:58

by Russell King

[permalink] [raw]
Subject: Re: in-kernel linking issues

On Fri, Nov 15, 2002 at 04:51:46AM -0800, Richard Henderson wrote:
> to be used like so
>
> ld -T z.ld -shared -o z.so z.o

I'm slightly worried about this. For things like shared libraries to be
relocatable on ARM on current toolchains, you need to build with -fPIC.
This introduces a measurable overhead to all code execution which our
current model does not. (Namely all symbol references go via the GOT
table.)

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-11-15 21:22:48

by Rusty Russell

[permalink] [raw]
Subject: Re: in-kernel linking issues

In message <[email protected]> you write:
> One more thing:
>
> Are you really REALLY sure you don't want to load ET_DYN or ET_EXEC
> files (aka shared libraries or executables) instead of ET_REL files
> (aka .o files)?

AFAICT, that would hurt some archs. Of course, you could say "modules
are meant to be slow" but I don't think that would win you any
friends 8)

I haven't thought about it hard: for some archs it might make perfect
sense, but I'm not sure what more than dropping the hdr->e_type check
would be required.

> This does reduce the freedom to allocate the init sections completely
> separately from the core sections, but that seems a small price to pay
> for the extreme reduction in complexity for the in-kernel loader.

Note: "extreme reduction" is probably overstating. There are only
about 300 lines of linker code in the kernel (x86). The rest of the
1400 lines is refcount manipulation, usage detection, symbol lookup,
system call code, /proc stuff.

But I think you could do this for any arch now by allocating the whole
module in the core alloc, and returning a pointer the the start of the
init section for the second "alloc". The "free" of the init section
then only frees those trailing pages.

I have no problem with the idea, if we can keep the per-arch
flexibility.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-15 22:15:33

by Richard Henderson

[permalink] [raw]
Subject: Re: in-kernel linking issues

On Sat, Nov 16, 2002 at 08:21:32AM +1100, Rusty Russell wrote:
> > Are you really REALLY sure you don't want to load ET_DYN or ET_EXEC
> > files (aka shared libraries or executables) instead of ET_REL files
> > (aka .o files)?
>
> AFAICT, that would hurt some archs. Of course, you could say "modules
> are meant to be slow" but I don't think that would win you any
> friends 8)

Actually, I've yet to come across one that is adversely affected.
Note that we're putting code _not_ compiled with -fpic into this
shared object.

> Note: "extreme reduction" is probably overstating. There are only
> about 300 lines of linker code in the kernel (x86).

Note that x86 is the easiest possible case. You've only got two
relocation types, you don't need to worry about .got, .plt, .opd
allocation, nor sorting sections into a required order, nor
sorting COMMON symbols.


r~

2002-11-15 22:24:47

by Richard Henderson

[permalink] [raw]
Subject: Re: in-kernel linking issues

On Fri, Nov 15, 2002 at 01:16:45PM +0000, Russell King wrote:
> I'm slightly worried about this. For things like shared libraries to be
> relocatable on ARM on current toolchains, you need to build with -fPIC.

Err, no you don't. You only need that if you want to share pages,
which is clearly not an issue with kernel modules. There are no
restrictions of which I am aware that require ARM to build with -fpic.

My test case,

int i;
int foo() { return bar() + i; }
int j __attribute__((section(".init.data")));
int __attribute__((section(".init.text")))
baz() { return i + j; }

works exactly as desired on ARM:

Disassembly of section .text:

00000000 <foo>:
0: e52de004 str lr, [sp, -#4]!
4: ebfffffe bl 4 <foo+0x4>
8: e59f3008 ldr r3, [pc, #8] ; 18 <foo+0x18>
c: e5933000 ldr r3, [r3]
10: e0800003 add r0, r0, r3
14: e49df004 ldr pc, [sp], #4

Disassembly of section .init.text:

00001000 <baz>:
1000: e59f3010 ldr r3, [pc, #16] ; 1018 <baz+0x18>
1004: e5930000 ldr r0, [r3]
1008: e59f300c ldr r3, [pc, #12] ; 101c <baz+0x1c>
100c: e5933000 ldr r3, [r3]
1010: e0800003 add r0, r0, r3
1014: e1a0f00e mov pc, lr

Relocation section '.rel.dyn' at offset 0x11258 contains 4 entries:
Offset Info Type Sym.Value Sym. Name
00000004 00001501 R_ARM_PC24 00000000 bar
00000018 00001202 R_ARM_ABS32 00000040 i
00001018 00001202 R_ARM_ABS32 00000040 i
0000101c 00000f02 R_ARM_ABS32 00001020 j



r~

2002-11-15 22:58:42

by Rusty Russell

[permalink] [raw]
Subject: Re: in-kernel linking issues

In message <[email protected]> you write:
> On Sat, Nov 16, 2002 at 08:21:32AM +1100, Rusty Russell wrote:
> > AFAICT, that would hurt some archs. Of course, you could say "modules
> > are meant to be slow" but I don't think that would win you any
> > friends 8)
>
> Actually, I've yet to come across one that is adversely affected.
> Note that we're putting code _not_ compiled with -fpic into this
> shared object.

Hmm, OK, I'm officially confused: I always connected the two.

> > Note: "extreme reduction" is probably overstating. There are only
> > about 300 lines of linker code in the kernel (x86).
>
> Note that x86 is the easiest possible case.

Of course. And ia64's module.c is about 500 lines (vs 130 for x86).
It's probably the worst case unless Alpha proves to be a complete pig
(note: ia64 might be missing some other stuff, but the linker is
tested).

> You've only got two relocation types, you don't need to worry about
> .got, .plt, .opd allocation, nor sorting sections into a required
> order, nor sorting COMMON symbols.

Hmm, OK, I guess this is where I say "patch welcome"?

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-15 23:40:54

by Richard Henderson

[permalink] [raw]
Subject: Re: in-kernel linking issues

On Sat, Nov 16, 2002 at 09:45:21AM +1100, Rusty Russell wrote:
> > Actually, I've yet to come across one that is adversely affected.
> > Note that we're putting code _not_ compiled with -fpic into this
> > shared object.
>
> Hmm, OK, I'm officially confused: I always connected the two.

I encorage this view. Normally bad things happen when this rule is
not followed in userland. But the kernel can bend the rules a bit.

> Of course. And ia64's module.c is about 500 lines (vs 130 for x86).
> It's probably the worst case unless Alpha proves to be a complete pig
> (note: ia64 might be missing some other stuff, but the linker is
> tested).

The ia64 code you have isn't going to be reliable until the
other points I mentioned wrt section and common symbol sorting
are done. What you have will work until there's a large
variable (32k for alpha/mips, 1MB for ia64) in the data area.

> Hmm, OK, I guess this is where I say "patch welcome"?

I guess this is where I say "patch for what"? Do I have some
amount of buy-in for the shared library approach, or do I start
adding lots of code to your .o linker?

I guess I could work up a proof-of-concept patch for the former
and see what people think...


r~

2002-11-16 05:44:39

by Rusty Russell

[permalink] [raw]
Subject: Re: in-kernel linking issues

On Thu, 14 Nov 2002 14:37:01 -0800
Richard Henderson <[email protected]> wrote:

> So you said you had a userland test harness?

Yes, which is fine for testing basic relocs, but misses some subtle issues.
[ Sorry for the delayed reply, I only got this mail via kernel.org: did you
get a bounce from [email protected]? ]

> Some problems I've seen browsing the code:

Thanks for this. It adds even more weight to your ET_DYN argument as well.
I'll need to play with that linker script some more (on PPC, binfmt_misc.o
is 13000 bytes, binfmt_misc.so becomes 156128 bytes 8)

There's still the issue of PPC and PPC64 which can only jump 24-bits away,
and so currently insert trampolines which have to be allocated with the
module, but that should be no uglier than currently. (They could use a
special allocator, too, but with only 16M, they have to ensure noone else
uses those addresses).

PPC64 also frobs the TOC ptr (r2) in the trampolines: I don't have a
ppc64 box in front of me, but I imagine -shared will do the right thing
there too.

Thanks!
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy

2002-11-16 06:24:35

by Rusty Russell

[permalink] [raw]
Subject: Re: in-kernel linking issues

In message <[email protected]> you write:
> On Sat, Nov 16, 2002 at 09:45:21AM +1100, Rusty Russell wrote:
> > Hmm, OK, I guess this is where I say "patch welcome"?
>
> I guess this is where I say "patch for what"? Do I have some
> amount of buy-in for the shared library approach, or do I start
> adding lots of code to your .o linker?
>
> I guess I could work up a proof-of-concept patch for the former
> and see what people think...

The former. Just hack up a patch for x86 and alpha (skip dropping
init for now) and we can see what it looks like. I can do ppc32, and
when I get back home, ppc64 and Itanium (which IMHO is the real test:
ia64 seems to have one of everything).

It will probably take a week, since I return to .au in two days, and
that always hurts. Sorry 8(

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-16 22:44:10

by Richard Henderson

[permalink] [raw]
Subject: Re: in-kernel linking issues

On Sat, Nov 16, 2002 at 04:47:55PM +1100, Rusty Russell wrote:
> [ Sorry for the delayed reply, I only got this mail via kernel.org: did you
> get a bounce from [email protected]? ]

Nope.

> Thanks for this. It adds even more weight to your ET_DYN argument as well.
> I'll need to play with that linker script some more (on PPC, binfmt_misc.o
> is 13000 bytes, binfmt_misc.so becomes 156128 bytes 8)

I know what this is -- max architectural page size of 64k.
The linker is trying to make the file offset congruent to
the address mod 64k. Which adds a *lot* of padding.

Should be fixable by using the -N option.

> There's still the issue of PPC and PPC64 which can only jump 24-bits away,
> and so currently insert trampolines which have to be allocated with the
> module, but that should be no uglier than currently.

I would hope that this would be handled by the normal .plt
creation. We'll see.



r~

2002-11-17 20:53:04

by Richard Henderson

[permalink] [raw]
Subject: Re: in-kernel linking issues

Another Idea:

Why build our own __ksymtab section, which contains names and
addresses in some random order that requires a linear search,
when instead we can re-use the dynamic symbol table for the
shared library. If we do that, we no longer have a linear search,
but can use the hash table provided by the linker. Plus we don't
have to work so hard to get rid of it. ;-)

Consider

#define EXPORT_SYMBOL(sym) \
asm (".section .exports\n" \
" .asciz \"" #sym "\"\n" \
".previous")

then link with '--version-exports-section ""'. This will result
in a .dynsym section that contains exactly those symbols we asked
to exported from the module (plus the undefineds, but that's obvious).

This has other benefits in that the linker will now know that
the symbols not exported may be able to have their relocations
satisfied completely at link time, which results in fewer dynamic
relocations.

Along that same vein, we should be using the link option -Bsymbolic,
since we do not allow modules to override one another's symbols,
and this describes that fact to the linker. Which will result in
even fewer dynamic relocations.

The problem remaining here is to get the same dynamic symbol table
generated for the kernel itself. This is where we'd really win with
the hashing, since the kernel has by far the largest symbol table.
I'll have to give this some more thought.


r~

2002-11-18 16:39:12

by Kai Germaschewski

[permalink] [raw]
Subject: Re: in-kernel linking issues

On Fri, 15 Nov 2002, Richard Henderson wrote:

> [...]
>
> ld -T z.ld -shared -o z.so z.o

BTW, this reminds me of something various people and me have been thinking
about for some time, i.e. postprocessing the .o files to generate the
actual module object.

It seems there are various reasons to do that, possibly the linker issues
which triggered the above, the new - yet to be introduced - module version
mechanism (I think), and it provides additional benefits like e.g. being
able to add the kernel version info in that step, so that a change in
EXTRAVERSION doesn't cause a rebuild of just about everything.

A related question would be a good suffix for kernel modules, e.g
"foo.mo" or "foo.ko" as in module object or kernel object?

--Kai


2002-11-19 06:21:35

by Rusty Russell

[permalink] [raw]
Subject: Re: in-kernel linking issues

In message <[email protected]> y
ou write:
> On Fri, 15 Nov 2002, Richard Henderson wrote:
>
> > [...]
> >
> > ld -T z.ld -shared -o z.so z.o
>
> BTW, this reminds me of something various people and me have been thinking
> about for some time, i.e. postprocessing the .o files to generate the
> actual module object.

See below, I made it go via ".raw.o" (this code is from the module
alias / device tables patch). Renaming the .o files to ".mod" or
".kso" is probably cleaner and clearer.

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal working-2.5-bk-devicetable-base/scripts/Makefile.build working-2.5-bk-devicetable/scripts/Makefile.build
--- working-2.5-bk-devicetable-base/scripts/Makefile.build 2002-11-13 18:54:56.000000000 +1100
+++ working-2.5-bk-devicetable/scripts/Makefile.build 2002-11-14 03:41:30.000000000 +1100
@@ -91,7 +91,8 @@ cmd_cc_i_c = $(CPP) $(c_flags) -
quiet_cmd_cc_o_c = CC $@
cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $<

-%.o: %.c FORCE
+# Not for modules
+$(sort $(objs-y) $(multi-objs)) %.o: %.c FORCE
$(call if_changed_dep,cc_o_c)

quiet_cmd_cc_lst_c = MKLST $@
@@ -100,6 +101,16 @@ cmd_cc_lst_c = $(CC) $(c_flags) -g
%.lst: %.c FORCE
$(call if_changed_dep,cc_lst_c)

+# Modules need to go via "finishing" step.
+quiet_cmd_modfinish = FINISH $@
+cmd_modfinish = sh scripts/modfinish $< $@
+
+$(patsubst %.o,%.raw.o,$(filter-out $(multi-used-m),$(obj-m))): %.raw.o: %.c
+ $(call if_changed_dep,cc_o_c)
+
+$(obj-m): %.o: %.raw.o
+ $(call cmd,modfinish)
+
# Compile assembler sources (.S)
# ---------------------------------------------------------------------------

@@ -161,7 +172,7 @@ targets += $(L_TARGET)
endif

#
-# Rule to link composite objects
+# Rule to link composite objects (builtin)
#

quiet_cmd_link_multi = LD $@
@@ -174,8 +185,16 @@ cmd_link_multi = $(LD) $(LDFLAGS) $(EXTR
$(multi-used-y) : %.o: $(multi-objs-y) FORCE
$(call if_changed,link_multi)

-$(multi-used-m) : %.o: $(multi-objs-m) FORCE
- $(call if_changed,link_multi)
+#
+# Rule to link composite objects (module)
+#
+
+quiet_cmd_link_mmulti = LD $@
+cmd_link_mmulti = $(LD) $(LDFLAGS) $(EXTRA_LDFLAGS) -r -o $@ $(filter $(addprefix $(obj)/,$($(subst $(obj)/,,$(@:.raw.o=-objs))) $($(subst $(obj)/,,$(@:.raw.o=-y)))),$^)
+
+# Need finishing step
+$(multi-used-m:.o=.raw.o) : %.raw.o: $(multi-objs-m) FORCE
+ $(call if_changed,link_mmulti)

targets += $(multi-used-y) $(multi-used-m)