2009-10-23 10:08:50

by Anton D. Kachalov

Subject: Re: [PATCH 1/2] binfmt_elf: FatELF support in the binary loader.

I have made a very similar patch, but it's quite small and does not require
deep hacks.
It should work with "setarch" too, to force selection of a binary.
There is a tool to merge binaries. Glibc / binutils patches are a work in progress.

Rgds,
Anton


Attachments:
binfmt-truearch-mouse.patch (3.83 kB)

2009-10-23 13:32:11

by Anton D. Kachalov

Subject: Re: [PATCH 1/2] binfmt_elf: FatELF support in the binary loader.

Anton D. Kachalov wrote:
> I have made a very similar patch, but it's quite small and does not require
> deep hacks.
> It should work with "setarch" too, to force selection of a binary.
> There is a tool to merge binaries. Glibc / binutils patches are a work in
> progress.
$ uname -m
x86_64
$ ./truearch hello32 hello64 hellos
$ ./hellos
hello x86_64
$ setarch i386 ./hellos
Hello x86
$ setarch x86_64 ./hellos
hello x86_64
$ file hello32
hello32: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, for GNU/Linux 2.6.15, not stripped
$ file hello64
hello64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically
linked, for GNU/Linux 2.6.15, not stripped

Since I haven't finished the glibc hacks yet, I can only use statically
linked binaries for now.

Rgds,
Anton

2009-10-24 00:20:10

by Ryan C. Gordon

Subject: Re: [PATCH 1/2] binfmt_elf: FatELF support in the binary loader.


> I have made a very similar patch, but it's quite small and does not require
> deep hacks.

Wow, competing ideas! :)

Here are my notes on your idea. Ego compels me to prefer my approach, but
I strove to be objective here, as there is a tradeoff of benefits in each
of our approaches.

> It should work with "setarch" too, to force selection of a binary.

How does setarch work? Does it reorder the file before launching or copy
out one of the ELF records?

If reordering:
What does this do to binaries you can't write to? Regular users couldn't
rewrite /bin/ls before launching, for example.

If copying:
What does this do to programs that rely on the value of argv[0]? If
setarch mangles up argv[0] in its exec*() call to match the original
binary's path, what does this do to programs that rely on /proc/self/exe?


The most compelling feature of this approach is that a "truearch" binary
(is that the correct name?) could work with any existing Linux system, on
the condition that the architecture you want is the first one in the file.
If you put, say, x86 first in the file and you want to run it on an x86_64
system, you're either out of luck or going to be running the 32-bit
version. In this same scenario, if you put x86_64 first, it just won't run
at all on an unpatched x86 box. So, it's a cool trick, but it's not all
that beneficial. We have to assume that either approach requires kernel
patches to be truly useful. For unpatched boxes, FatELF provides a simple
command line app, fatelf-extract, which can be used to get the original
ELF binary you want out of the FatELF file, both for stripping unwanted
bits and as a measure of last resort if the kernel and dynamic loader
can't handle FatELF. I assume setarch works somewhat the same.

I'm concerned about using the padding bits in e_ident, too. A lot of
manpower went into the ELF specification and I felt it was presumptuous
for me to personally change the format. A container around them, like
FatELF, was a safer, more future-proof choice. I'd rather those that
control the ELF spec decide what those padding bits should be used for in
the future.

The truearch method requires the kernel to seek throughout the whole file
to decide if it can use it at all. FatELF uses the 128 bytes at the front
of the file, which binfmt_elf reads anyhow, and then seeks to the right
record from there, so disk bandwidth overhead is extremely small (one
extra read of 128 bytes if we can use the file, zero extra reads if not).
On the other hand, truearch allows for an unlimited number of ELF
binaries to reside in a single file below the four gigabyte mark (which is
really, for all intents and purposes, a LOT of binaries). Then again,
the FatELF limit of 255 records is probably way more than you could
ever hope to reasonably cram into a file, and if it's not, we can raise it
to 64k (we have reserved bits in the header still). FatELF can store ELF
binaries above 4 gigabytes, unlike truearch, but I'm not sure that's
really ever going to be valuable.
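
To make the "one read of 128 bytes" point concrete, here is a rough userspace sketch of a FatELF header lookup. It is not the kernel code from the patch: fatelf_find() and get_le() are hypothetical helper names, and the byte offsets are derived from the Fatelf_hdr / Fatelf_record layout in the patch (8-byte header, 24-byte records, so 5 records fit in 128 bytes). The OS ABI checks the kernel performs are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FATELF_MAGIC      0x1F0E70FAu /* value from the patch's elf.h hunk */
#define FATELF_HDR_SIZE   8  /* magic(4) + version(2) + num_records(1) + reserved(1) */
#define FATELF_REC_SIZE   24 /* machine(2) + six 1-byte fields + offset(8) + size(8) */
#define FATELF_MAX_IN_BUF 5  /* 8 + 5 * 24 == 128 bytes */

/* Decode an n-byte little-endian integer (n <= 8). */
static uint64_t get_le(const unsigned char *p, int n)
{
	uint64_t v = 0;

	while (n-- > 0)
		v = (v << 8) | p[n];
	return v;
}

/*
 * Hypothetical helper: scan the record table at the start of a file for
 * a given machine / word size / byte order. Returns 0 and stores the
 * record's file offset on a match, -1 otherwise.
 */
static int fatelf_find(const unsigned char *buf, size_t len,
		       uint16_t machine, uint8_t word_size,
		       uint8_t byte_order, uint64_t *out_offset)
{
	size_t i, records;

	assert(buf != NULL && out_offset != NULL);
	if (len < FATELF_HDR_SIZE || get_le(buf, 4) != FATELF_MAGIC)
		return -1;	/* not a FatELF file */
	if (get_le(buf + 4, 2) != 1)
		return -1;	/* unrecognized format version */

	records = buf[6];	/* num_records is a single byte */
	if (records > FATELF_MAX_IN_BUF)
		records = FATELF_MAX_IN_BUF;

	for (i = 0; i < records; i++) {
		size_t pos = FATELF_HDR_SIZE + i * FATELF_REC_SIZE;
		const unsigned char *rec;

		if (pos + FATELF_REC_SIZE > len)
			break;	/* record is not fully inside the buffer */
		rec = buf + pos;
		if (get_le(rec, 2) == machine && rec[4] == word_size &&
		    rec[5] == byte_order) {
			*out_offset = get_le(rec + 8, 8);
			return 0;
		}
	}
	return -1;
}
```

A caller would read the first 128 bytes of the file once and pass, for example, machine 62 (EM_X86_64), word size 2 (ELFCLASS64) and byte order 1 (ELFDATA2LSB); no further I/O is needed to decide whether the file is usable.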

Both approaches have zero disk overhead if a normal ELF file is loaded,
which is good.


In terms of this patch itself, I'd be concerned about using gotos for the
retry_* blocks when a loop would be easy enough to incorporate. I saw you
have a test for personality() that I didn't do; I might have to check into
that, but the binfmt_elf_compat code is definitely catching x86 binaries
on x86_64 here, so I'm not sure it's necessary.

Anyhow, I hope this was useful commentary, and not seen as a battle of
egos. I'm glad to see other approaches, though, as it suggests there
really is a genuine desire for this sort of functionality!

--ryan.

2009-10-24 09:00:10

by Anton D. Kachalov

Subject: Re: [PATCH 1/2] binfmt_elf: FatELF support in the binary loader.

Hello, Ryan.

Ryan C. Gordon wrote:
> Wow, competing ideas! :)
>
> Here are my notes on your idea. Ego compels me to prefer my approach, but
> I strove to be objective here, as there is a tradeoff of benefits in each
> of our approaches.
>
Thanks :) It was born out of Apple's concept, utilizing unused space
in ELF headers.
>> It should work with "setarch" too, to force selection of a binary.
>>
>
> How does setarch work? Does it reorder the file before launching or copy
> out one of the ELF records?
>
>
$ man setarch
NAME
setarch - change reported architecture in new program environment and
set personality flags

$ dpkg -S /usr/bin/setarch
util-linux: /usr/bin/setarch

"setarch" just sets the "personality" of the program it runs.

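Roughly, that amounts to the following. This is only a sketch and not the real setarch source: exec_as_linux32() and persona_is_linux32() are hypothetical names, and the real tool also handles other architectures and extra personality flags. personality(2) and execvp(3) are the standard calls; the 0xff mask matches what the kernel's personality() macro extracts.

```c
#include <assert.h>
#include <sys/personality.h>
#include <unistd.h>

/* True if the execution-domain bits of persona select PER_LINUX32. */
static int persona_is_linux32(unsigned long persona)
{
	return (persona & 0xff) == PER_LINUX32;
}

/*
 * Hypothetical stand-in for "setarch i386 prog args...": switch the
 * calling process to the 32-bit personality, then exec the program.
 * The personality is kept across exec, so the binary loader sees it.
 * Only returns on failure (-1, errno set by the failing call).
 */
static int exec_as_linux32(char *const argv[])
{
	assert(argv != NULL && argv[0] != NULL);
	if (personality(PER_LINUX32) == -1)
		return -1;
	execvp(argv[0], argv);
	return -1;
}
```

The file on disk is never touched or copied, so argv[0] and /proc/self/exe still refer to the original multi-arch file.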
[...]
> The most compelling feature of this approach is that a "truearch" binary
> (is that the correct name?) could work with any existing Linux system, on
> the condition that the architecture you want is the first one in the file.
>
Nope, you may put binary files in any order. It's just a linked list of
binaries.
> If you put, say, x86 first in the file and you want to run it on an x86_64
> system, you're either out of luck or going to be running the 32-bit
> version.
As stated previously, you will be able to run the x86_64 one. But you need
to change the order of binfmt and compat_binfmt in built-in.o by changing
the Makefile. Just swap two lines. I don't know why an x86_64 system tries
compat mode first and then native, since that way simply running a native
app takes more CPU cycles on x86_64 than on x86.
> In this same scenario, if you put x86_64 first, it just won't run
> at all on an unpatched x86 box. So, it's a cool trick, but it's not all
> that beneficial. We have to assume that either approach requires kernel
> patches to be truly useful. For unpatched boxes, FatELF provides a simple
> command line app, fatelf-extract, which can be used to get the original
> ELF binary you want out of the FatELF file, both for stripping unwanted
> bits and as a measure of last resort if the kernel and dynamic loader
> can't handle FatELF. I assume setarch works somewhat the same.
>
Which arch will "fatelf-extract" be built for? Let's say I'm running Linux on
PowerPC? x86? =) Only if it is a shell script will it be beneficial for
any arch. I can inject the "offset" portion into a script file too...
> I'm concerned about using the padding bits in e_ident, too. A lot of
> manpower went into the ELF specification and I felt it was presumptuous
> for me to personally change the format. A container around them, like
> FatELF, was a safer, more future-proof choice. I'd rather those that
> control the ELF spec decide what those padding bits should be used for in
> the future.
>
> The truearch method requires the kernel to seek throughout the whole file
>
Nope, it just reads the "offset" field and seeks if needed. So, if the file is
just one-arch, it will read 128 bytes only.
> to decide if it can use it at all. FatELF uses the 128 bytes at the front
> of the file, which binfmt_elf reads anyhow, and then seeks to the right
> record from there, so disk bandwidth overhead is extremely small (one
> extra read of 128 bytes if we can use the file, zero extra reads if not).
>
In my approach, it's just a few seeks more. Just a few additional reads
are not so much compared to overall reads from that file.
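
A rough illustration of such a chained lookup follows. The exact truearch on-disk layout is not shown in this thread, so the layout here is an ASSUMPTION made purely for illustration: a 32-bit little-endian "next image" offset stored in the e_ident padding starting at byte 9, with 0 ending the chain. chain_find() is a hypothetical name, and e_machine is decoded as little-endian for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEXT_FIELD 9	/* ASSUMED location of the next-image offset */
#define MIN_HDR    20	/* enough bytes to reach e_machine at byte 18 */

/*
 * Walk the chain of ELF images inside an in-memory copy of the file.
 * Returns 0 and the offset of the matching image, or -1 if none match.
 * *reads counts how many headers had to be examined.
 */
static int chain_find(const unsigned char *file, size_t len,
		      uint16_t machine, uint8_t elf_class,
		      uint32_t *out_offset, unsigned *reads)
{
	size_t pos = 0;

	assert(file != NULL && out_offset != NULL && reads != NULL);
	*reads = 0;
	while (pos <= len && len - pos >= MIN_HDR) {
		const unsigned char *h = file + pos;
		uint32_t next;

		(*reads)++;
		if (memcmp(h, "\177ELF", 4) != 0)
			return -1;	/* chain points at something that is not ELF */
		if (h[4] == elf_class &&
		    (uint16_t)(h[18] | (h[19] << 8)) == machine) {
			*out_offset = (uint32_t)pos;
			return 0;
		}
		next = (uint32_t)h[NEXT_FIELD] |
		       ((uint32_t)h[NEXT_FIELD + 1] << 8) |
		       ((uint32_t)h[NEXT_FIELD + 2] << 16) |
		       ((uint32_t)h[NEXT_FIELD + 3] << 24);
		if (next <= pos)
			break;	/* 0 ends the chain; never walk backwards */
		pos = next;
	}
	return -1;
}
```

With a single-arch file the loop body runs once, which corresponds to the 128-byte read mentioned above; each further image in the chain costs one more header read.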

[...]
> Both approaches have zero disk overhead if a normal ELF file is loaded,
> which is good.
>
>
> In terms of this patch itself, I'd be concerned about using gotos for the
> retry_* blocks when a loop would be easy enough to incorporate. I saw you
> have a test for personality() that I didn't do; I might have to check into
> that, but the binfmt_elf_compat code is definitely catching x86 binaries
> on x86_64 here, so I'm not sure it's necessary.
>
It's necessary if you would like to use setarch to choose binaries on
biarch systems.
> Anyhow, I hope this was useful commentary, and not seen as a battle of
> egos. I'm glad to see other approaches, though, as it suggests there
> really is a genuine desire for this sort of functionality!
>
:) Agreed

Rgds,
Anton

2009-10-24 18:14:56

by Ryan C. Gordon

Subject: Re: [PATCH 1/2] binfmt_elf: FatELF support in the binary loader.


> > How does setarch work? Does it reorder the file before launching or copy out
> > one of the ELF records?
> >
> $ man setarch

Huh, I guess that explains the personality() thing in your patch. :)

> > The most compelling feature of this approach is that a "truearch" binary (is
> > that the correct name?) could work with any existing Linux system, on the
> > condition that the architecture you want is the first one in the file.
> Nope, you may put binary files in any order. It's just a linked list of
> binaries.

I was talking about an unpatched kernel in this case. Without a kernel
change, it rejects the ELF file if the first architecture isn't the one it
wants. So your approach is slightly better for backwards compatibility
than mine, but not significantly so.

For a full compatibility plan, you'd probably still need to write
something like fatelf-extract.

> As stated previously, you will be able to run the x86_64 one. But you need to
> change the order of binfmt and compat_binfmt in built-in.o by changing the
> Makefile. Just swap two lines. I don't know why an x86_64 system tries compat
> mode first and then native, since that way simply running a native app takes
> more CPU cycles on x86_64 than on x86.

I think it was a bug. It got corrected sometime after the 2.6.28 release
that your patch was based on (I ran into that this week, too);
register_binfmt() adds to the other side of the list now, so the Makefile
doesn't need changing.

> Which arch will "fatelf-extract" be built for? Let's say I'm running Linux on
> PowerPC? x86? =) Only if it is a shell script will it be beneficial for any
> arch. I can inject the "offset" portion into a script file too...

Presumably, it'll be installed system-wide (as either an ELF or FatELF
binary), like setarch, or built by the user if they don't otherwise have
FatELF support. setarch and fatelf-extract are both simple C programs.

http://fedorahosted.org/setarch/browser/setarch.c
http://hg.icculus.org/icculus/fatelf/raw-file/tip/utils/fatelf-extract.c

> Nope, it just reads the "offset" field and seeks if needed. So, if the file is
> just one-arch, it will read 128 bytes only.

But if it's 10 architectures?

> In my approach, it's just a few seeks more. Just a few additional reads are
> not so much compared to overall reads from that file.

Yes, I'm not sure it's a serious concern; even if you put 200 binaries in
there and have to iterate them all, the overhead is probably not serious.
Still, it seems like a better practice to keep these things closer to O(1)
when possible.

> It's necessary if you would like to use setarch to choose binaries on biarch
> systems.

I didn't understand that when I first looked at the patch; sorry for any
confusion I caused. I should probably add a similar test to FatELF.

--ryan.

2009-10-26 06:04:12

by Ryan C. Gordon

Subject: Re: [PATCH 1/2] binfmt_elf: FatELF support in the binary loader.


> > It's necessary if you would like to use setarch to choose binaries on biarch
> > systems.
>
> I didn't understand that when I first looked at the patch; sorry for any
> confusion I caused. I should probably add a similar test to FatELF.

...and now I have.

This is the same as the previous revision of my patch, plus a few lines to
check the personality, so setarch(8) can be used to force a 32-bit record
to be selected. Thanks to Anton for the idea!



From 083ebfa908c1bfbdabf81e921f3ce6b5d430bcb9 Mon Sep 17 00:00:00 2001
From: Ryan C. Gordon <[email protected]>
Date: Mon, 26 Oct 2009 01:50:41 -0400
Subject: [PATCH] binfmt_elf: FatELF support in the binary loader.

This allows the kernel to load a single file that contains multiple ELF
binaries (a "FatELF" file), selecting the correct one for the system.

Details, rationale, tools, and patches for handling FatELF binaries can be
found at http://icculus.org/fatelf/

Signed-off-by: Ryan C. Gordon <[email protected]>
---
arch/ia64/ia32/binfmt_elf32.c | 4 +-
fs/binfmt_elf.c | 159 ++++++++++++++++++++++++++++++++++------
include/linux/elf.h | 23 ++++++
3 files changed, 160 insertions(+), 26 deletions(-)

diff --git a/arch/ia64/ia32/binfmt_elf32.c b/arch/ia64/ia32/binfmt_elf32.c
index c69552b..02d5dd1 100644
--- a/arch/ia64/ia32/binfmt_elf32.c
+++ b/arch/ia64/ia32/binfmt_elf32.c
@@ -223,12 +223,12 @@ elf32_set_personality (void)

static unsigned long
elf32_map(struct file *filep, unsigned long addr, struct elf_phdr *eppnt,
- int prot, int type, unsigned long unused)
+ int prot, int type, unsigned long unused, unsigned long base)
{
unsigned long pgoff = (eppnt->p_vaddr) & ~IA32_PAGE_MASK;

return ia32_do_mmap(filep, (addr & IA32_PAGE_MASK), eppnt->p_filesz + pgoff, prot, type,
- eppnt->p_offset - pgoff);
+ (eppnt->p_offset + base) - pgoff);
}

#define cpu_uses_ia32el() (local_cpu_data->family > 0x1f)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index b9b3bb5..a195024 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -37,8 +37,9 @@

static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs);
static int load_elf_library(struct file *);
-static unsigned long elf_map(struct file *, unsigned long, struct elf_phdr *,
- int, int, unsigned long);
+static unsigned long
+elf_map(struct file *, unsigned long, struct elf_phdr *,
+ int, int, unsigned long, unsigned long);

/*
* If we don't support core dumping, then supply a NULL so we
@@ -319,7 +320,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,

static unsigned long elf_map(struct file *filep, unsigned long addr,
struct elf_phdr *eppnt, int prot, int type,
- unsigned long total_size)
+ unsigned long total_size, unsigned long base_offset)
{
unsigned long map_addr;
unsigned long size = eppnt->p_filesz + ELF_PAGEOFFSET(eppnt->p_vaddr);
@@ -343,11 +344,14 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
*/
if (total_size) {
total_size = ELF_PAGEALIGN(total_size);
- map_addr = do_mmap(filep, addr, total_size, prot, type, off);
+ map_addr = do_mmap(filep, addr, total_size, prot, type,
+ off + base_offset);
if (!BAD_ADDR(map_addr))
do_munmap(current->mm, map_addr+size, total_size-size);
- } else
- map_addr = do_mmap(filep, addr, size, prot, type, off);
+ } else {
+ map_addr = do_mmap(filep, addr, size, prot, type,
+ off + base_offset);
+ }

up_write(&current->mm->mmap_sem);
return(map_addr);
@@ -381,7 +385,7 @@ static unsigned long total_mapping_size(struct elf_phdr *cmds, int nr)

static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
struct file *interpreter, unsigned long *interp_map_addr,
- unsigned long no_base)
+ unsigned long no_base, unsigned long base_offset)
{
struct elf_phdr *elf_phdata;
struct elf_phdr *eppnt;
@@ -419,7 +423,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
if (!elf_phdata)
goto out;

- retval = kernel_read(interpreter, interp_elf_ex->e_phoff,
+ retval = kernel_read(interpreter, interp_elf_ex->e_phoff + base_offset,
(char *)elf_phdata,size);
error = -EIO;
if (retval != size) {
@@ -455,7 +459,8 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
load_addr = -vaddr;

map_addr = elf_map(interpreter, load_addr + vaddr,
- eppnt, elf_prot, elf_type, total_size);
+ eppnt, elf_prot, elf_type, total_size,
+ base_offset);
total_size = 0;
if (!*interp_map_addr)
*interp_map_addr = map_addr;
@@ -560,10 +565,97 @@ static unsigned long randomize_stack_top(unsigned long stack_top)
#endif
}

+
+/*
+ * See if we're a valid FatELF binary, find the right record, and
+ * load (*elf) with the actual ELF header. Sets (*offset) to the
+ * base offset of the chosen ELF binary. Returns 0 on success or a negative
+ * error code.
+ * If we're not a FatELF binary, (*elf) is loaded with the existing contents
+ * of (buf) and 0 is returned.
+ */
+static int examine_fatelf(struct file *file, const char *filename, char *buf,
+ int buflen, unsigned long *offset, struct elfhdr *elf)
+{
+ int records, i, rc;
+ const Fatelf_hdr *fatelf = (Fatelf_hdr *) buf;
+
+ if (likely(le32_to_cpu(fatelf->magic) != FATELF_MAGIC)) {
+ *elf = *((struct elfhdr *)buf); /* treat like normal ELF. */
+ return 0; /* not a FatELF binary; not an error. */
+ }
+
+#if ((BITS_PER_LONG == 64) && (ELF_CLASS == ELFCLASS64) && CONFIG_COMPAT)
+ if (unlikely(personality(current->personality) == PER_LINUX32))
+ return -ENOEXEC; /* want to force 32-bit record! */
+#endif
+
+ if (unlikely(le16_to_cpu(fatelf->version) != 1))
+ return -ENOEXEC; /* Unrecognized format version. */
+
+ /*
+ * In theory, there could be 255 separate records packed into this
+ * binary, but for now, bprm->buf (128 bytes) holds exactly 5
+ * records with the fatelf header, and that seems reasonable for
+ * most uses. We could add the complexity to read more records later
+ * if there's a serious need.
+ */
+ records = (int) fatelf->num_records; /* uint8, no byteswap needed */
+ if (unlikely(records > 5))
+ records = 5; /* clamp, in case we find one we can use. */
+
+ for (i = 0; i < records; i++) {
+ const Fatelf_record *record = &fatelf->records[i];
+ const __u8 osabi = record->osabi;
+ const int abiok = likely( likely(osabi == ELFOSABI_NONE) ||
+ unlikely(osabi == ELFOSABI_LINUX) );
+
+ /* Fill in the data elf_check_arch() might care about. */
+ elf->e_ident[EI_OSABI] = record->osabi;
+ elf->e_ident[EI_CLASS] = record->word_size;
+ elf->e_ident[EI_DATA] = record->byte_order;
+ elf->e_machine = le16_to_cpu(record->machine);
+
+ if (unlikely(elf_check_arch(elf))
+ && likely(abiok)
+ && likely(record->osabi_version == 0)) {
+ /* We can support this ELF arch/abi. */
+ const __u64 rec_offset = le64_to_cpu(record->offset);
+ const __u64 rec_size = le64_to_cpu(record->size);
+ const __u64 end_offset = rec_offset + rec_size;
+ const unsigned long uloff = (unsigned long) rec_offset;
+
+#if BITS_PER_LONG == 32
+ if (unlikely(end_offset > 0xFFFFFFFF))
+ continue;
+#endif
+
+ if (unlikely(end_offset < rec_offset))
+ continue; /* overflow (corrupt file?) */
+ if (unlikely(ELF_PAGEOFFSET(uloff) != 0))
+ continue; /* bad alignment. */
+
+ /* replace the FatELF data with the real ELF header. */
+ rc = kernel_read(file, uloff, (char*) elf, sizeof(*elf));
+ if (unlikely((rc != sizeof(*elf)) && (rc >= 0)))
+ rc = -EIO;
+ else if (likely(rc == sizeof(*elf))) {
+ *offset = uloff;
+ rc = 0;
+ }
+
+ return rc;
+ }
+ }
+
+ return -ENOEXEC; /* no binaries we could use. */
+}
+
+
static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
struct file *interpreter = NULL; /* to shut gcc up */
- unsigned long load_addr = 0, load_bias = 0;
+ unsigned long load_addr = 0, load_bias = 0;
int load_addr_set = 0;
char * elf_interpreter = NULL;
unsigned long error;
@@ -571,6 +663,8 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
unsigned long elf_bss, elf_brk;
int retval, i;
unsigned int size;
+ unsigned long base_offset = 0;
+ unsigned long interp_base_offset = 0;
unsigned long elf_entry;
unsigned long interp_load_addr = 0;
unsigned long start_code, end_code, start_data, end_data;
@@ -587,9 +681,11 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
retval = -ENOMEM;
goto out_ret;
}
-
- /* Get the exec-header */
- loc->elf_ex = *((struct elfhdr *)bprm->buf);
+
+ retval = examine_fatelf(bprm->file, bprm->filename, bprm->buf,
+ BINPRM_BUF_SIZE, &base_offset, &loc->elf_ex);
+ if (unlikely(retval < 0))
+ goto out_ret;

retval = -ENOEXEC;
/* First of all, some simple consistency checks */
@@ -615,7 +711,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
if (!elf_phdata)
goto out;

- retval = kernel_read(bprm->file, loc->elf_ex.e_phoff,
+ retval = kernel_read(bprm->file, loc->elf_ex.e_phoff + base_offset,
(char *)elf_phdata, size);
if (retval != size) {
if (retval >= 0)
@@ -649,7 +745,8 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
if (!elf_interpreter)
goto out_free_ph;

- retval = kernel_read(bprm->file, elf_ppnt->p_offset,
+ retval = kernel_read(bprm->file,
+ elf_ppnt->p_offset + base_offset,
elf_interpreter,
elf_ppnt->p_filesz);
if (retval != elf_ppnt->p_filesz) {
@@ -704,8 +801,13 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
goto out_free_dentry;
}

- /* Get the exec headers */
- loc->interp_elf_ex = *((struct elfhdr *)bprm->buf);
+ retval = examine_fatelf(interpreter, elf_interpreter,
+ bprm->buf, BINPRM_BUF_SIZE,
+ &interp_base_offset,
+ &loc->interp_elf_ex);
+ if (unlikely(retval < 0)) {
+ goto out_free_dentry;
+ }
break;
}
elf_ppnt++;
@@ -830,7 +932,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
}

error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
- elf_prot, elf_flags, 0);
+ elf_prot, elf_flags, 0, base_offset);
if (BAD_ADDR(error)) {
send_sig(SIGKILL, current, 0);
retval = IS_ERR((void *)error) ?
@@ -911,7 +1013,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
elf_entry = load_elf_interp(&loc->interp_elf_ex,
interpreter,
&interp_map_addr,
- load_bias);
+ load_bias, interp_base_offset);
if (!IS_ERR((void *)elf_entry)) {
/*
* load_elf_interp() returns relocation
@@ -1026,11 +1128,19 @@ static int load_elf_library(struct file *file)
unsigned long elf_bss, bss, len;
int retval, error, i, j;
struct elfhdr elf_ex;
+ unsigned long base_offset = 0;
+ char buf[BINPRM_BUF_SIZE];

- error = -ENOEXEC;
- retval = kernel_read(file, 0, (char *)&elf_ex, sizeof(elf_ex));
- if (retval != sizeof(elf_ex))
+ retval = kernel_read(file, 0, buf, sizeof(buf));
+ if (unlikely(retval != sizeof(buf))) {
+ error = (retval >= 0) ? -EIO : retval;
goto out;
+ }
+ error = examine_fatelf(file, NULL, buf, sizeof(buf), &base_offset, &elf_ex);
+ if (unlikely(error < 0)) {
+ goto out;
+ }
+ error = -ENOEXEC;

if (memcmp(elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
goto out;
@@ -1052,7 +1162,8 @@ static int load_elf_library(struct file *file)

eppnt = elf_phdata;
error = -ENOEXEC;
- retval = kernel_read(file, elf_ex.e_phoff, (char *)eppnt, j);
+ retval = kernel_read(file, elf_ex.e_phoff + base_offset,
+ (char *)eppnt, j);
if (retval != j)
goto out_free_ph;

@@ -1074,7 +1185,7 @@ static int load_elf_library(struct file *file)
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
(eppnt->p_offset -
- ELF_PAGEOFFSET(eppnt->p_vaddr)));
+ ELF_PAGEOFFSET(eppnt->p_vaddr)) + base_offset);
up_write(&current->mm->mmap_sem);
if (error != ELF_PAGESTART(eppnt->p_vaddr))
goto out_free_ph;
diff --git a/include/linux/elf.h b/include/linux/elf.h
index 90a4ed0..ded1fb6 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -33,6 +33,29 @@ typedef __u32 Elf64_Word;
typedef __u64 Elf64_Xword;
typedef __s64 Elf64_Sxword;

+/* FatELF (multiple ELF binaries in one file) support */
+#define FATELF_MAGIC (0x1F0E70FA)
+
+typedef struct Fatelf_record {
+ __le16 machine; /* maps to e_machine */
+ __u8 osabi; /* maps to e_ident[EI_OSABI] */
+ __u8 osabi_version; /* maps to e_ident[EI_ABIVERSION] */
+ __u8 word_size; /* maps to e_ident[EI_CLASS] */
+ __u8 byte_order; /* maps to e_ident[EI_DATA] */
+ __u8 reserved0;
+ __u8 reserved1;
+ __le64 offset;
+ __le64 size;
+} Fatelf_record;
+
+typedef struct Fatelf_hdr {
+ __le32 magic;
+ __le16 version;
+ __u8 num_records;
+ __u8 reserved0;
+ Fatelf_record records[];
+} Fatelf_hdr;
+
/* These constants are for the segment types stored in the image headers */
#define PT_NULL 0
#define PT_LOAD 1
--
1.6.0.4


2009-10-30 02:22:30

by Ryan C. Gordon

Subject: Re: [PATCH 1/2] binfmt_elf: FatELF support in the binary loader.


(Hopefully final version, now with checkpatch.pl approval!)

From c6fd97215e9cd1de050bed500195df617f01a291 Mon Sep 17 00:00:00 2001
From: Ryan C. Gordon <[email protected]>
Date: Tue, 27 Oct 2009 13:53:39 -0400
Subject: [PATCH] binfmt_elf: FatELF support in the binary loader.

This allows the kernel to load a single file that contains multiple ELF
binaries (a "FatELF" file), selecting the correct one for the system.

Details, rationale, tools, and patches for handling FatELF binaries can be
found at http://icculus.org/fatelf/

Signed-off-by: Ryan C. Gordon <[email protected]>
---
arch/ia64/ia32/binfmt_elf32.c | 4 +-
fs/binfmt_elf.c | 159 ++++++++++++++++++++++++++++++++++------
include/linux/elf.h | 23 ++++++
3 files changed, 160 insertions(+), 26 deletions(-)

diff --git a/arch/ia64/ia32/binfmt_elf32.c b/arch/ia64/ia32/binfmt_elf32.c
index c69552b..02d5dd1 100644
--- a/arch/ia64/ia32/binfmt_elf32.c
+++ b/arch/ia64/ia32/binfmt_elf32.c
@@ -223,12 +223,12 @@ elf32_set_personality (void)

static unsigned long
elf32_map(struct file *filep, unsigned long addr, struct elf_phdr *eppnt,
- int prot, int type, unsigned long unused)
+ int prot, int type, unsigned long unused, unsigned long base)
{
unsigned long pgoff = (eppnt->p_vaddr) & ~IA32_PAGE_MASK;

return ia32_do_mmap(filep, (addr & IA32_PAGE_MASK), eppnt->p_filesz + pgoff, prot, type,
- eppnt->p_offset - pgoff);
+ (eppnt->p_offset + base) - pgoff);
}

#define cpu_uses_ia32el() (local_cpu_data->family > 0x1f)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index b9b3bb5..2df7d8a 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -37,8 +37,9 @@

static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs);
static int load_elf_library(struct file *);
-static unsigned long elf_map(struct file *, unsigned long, struct elf_phdr *,
- int, int, unsigned long);
+static unsigned long
+elf_map(struct file *, unsigned long, struct elf_phdr *,
+ int, int, unsigned long, unsigned long);

/*
* If we don't support core dumping, then supply a NULL so we
@@ -319,7 +320,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,

static unsigned long elf_map(struct file *filep, unsigned long addr,
struct elf_phdr *eppnt, int prot, int type,
- unsigned long total_size)
+ unsigned long total_size, unsigned long base_offset)
{
unsigned long map_addr;
unsigned long size = eppnt->p_filesz + ELF_PAGEOFFSET(eppnt->p_vaddr);
@@ -343,11 +344,14 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
*/
if (total_size) {
total_size = ELF_PAGEALIGN(total_size);
- map_addr = do_mmap(filep, addr, total_size, prot, type, off);
+ map_addr = do_mmap(filep, addr, total_size, prot, type,
+ off + base_offset);
if (!BAD_ADDR(map_addr))
do_munmap(current->mm, map_addr+size, total_size-size);
- } else
- map_addr = do_mmap(filep, addr, size, prot, type, off);
+ } else {
+ map_addr = do_mmap(filep, addr, size, prot, type,
+ off + base_offset);
+ }

up_write(&current->mm->mmap_sem);
return(map_addr);
@@ -381,7 +385,7 @@ static unsigned long total_mapping_size(struct elf_phdr *cmds, int nr)

static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
struct file *interpreter, unsigned long *interp_map_addr,
- unsigned long no_base)
+ unsigned long no_base, unsigned long base_offset)
{
struct elf_phdr *elf_phdata;
struct elf_phdr *eppnt;
@@ -419,7 +423,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
if (!elf_phdata)
goto out;

- retval = kernel_read(interpreter, interp_elf_ex->e_phoff,
+ retval = kernel_read(interpreter, interp_elf_ex->e_phoff + base_offset,
(char *)elf_phdata,size);
error = -EIO;
if (retval != size) {
@@ -455,7 +459,8 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
load_addr = -vaddr;

map_addr = elf_map(interpreter, load_addr + vaddr,
- eppnt, elf_prot, elf_type, total_size);
+ eppnt, elf_prot, elf_type, total_size,
+ base_offset);
total_size = 0;
if (!*interp_map_addr)
*interp_map_addr = map_addr;
@@ -560,10 +565,98 @@ static unsigned long randomize_stack_top(unsigned long stack_top)
#endif
}

+
+/*
+ * See if we're a valid FatELF binary, find the right record, and
+ * load (*elf) with the actual ELF header. Sets (*offset) to the
+ * base offset of the chosen ELF binary. Returns 0 on success or a negative
+ * error code.
+ * If we're not a FatELF binary, (*elf) is loaded with the existing contents
+ * of (buf) and 0 is returned.
+ */
+static int examine_fatelf(struct file *file, const char *filename, char *buf,
+ int buflen, unsigned long *offset, struct elfhdr *elf)
+{
+ int records, i, rc;
+ const struct Fatelf_hdr *fatelf = (struct Fatelf_hdr *) buf;
+
+ if (likely(le32_to_cpu(fatelf->magic) != FATELF_MAGIC)) {
+ *elf = *((struct elfhdr *)buf); /* treat like normal ELF. */
+ return 0; /* not a FatELF binary; not an error. */
+ }
+
+#if ((BITS_PER_LONG == 64) && (ELF_CLASS == ELFCLASS64) && CONFIG_COMPAT)
+ if (unlikely(personality(current->personality) == PER_LINUX32))
+ return -ENOEXEC; /* want to force 32-bit record! */
+#endif
+
+ if (unlikely(le16_to_cpu(fatelf->version) != 1))
+ return -ENOEXEC; /* Unrecognized format version. */
+
+ /*
+ * In theory, there could be 255 separate records packed into this
+ * binary, but for now, bprm->buf (128 bytes) holds exactly 5
+ * records with the fatelf header, and that seems reasonable for
+ * most uses. We could add the complexity to read more records later
+ * if there's a serious need.
+ */
+ records = (int) fatelf->num_records; /* uint8, no byteswap needed */
+ if (unlikely(records > 5))
+ records = 5; /* clamp, in case we find one we can use. */
+
+ for (i = 0; i < records; i++) {
+ const struct Fatelf_record *record = &fatelf->records[i];
+ const __u8 osabi = record->osabi;
+ const int abiok = likely(likely(osabi == ELFOSABI_NONE) ||
+ unlikely(osabi == ELFOSABI_LINUX));
+
+ /* Fill in the data elf_check_arch() might care about. */
+ elf->e_ident[EI_OSABI] = record->osabi;
+ elf->e_ident[EI_CLASS] = record->word_size;
+ elf->e_ident[EI_DATA] = record->byte_order;
+ elf->e_machine = le16_to_cpu(record->machine);
+
+ if (unlikely(elf_check_arch(elf))
+ && likely(abiok)
+ && likely(record->osabi_version == 0)) {
+ /* We can support this ELF arch/abi. */
+ const __u64 rec_offset = le64_to_cpu(record->offset);
+ const __u64 rec_size = le64_to_cpu(record->size);
+ const __u64 end_offset = rec_offset + rec_size;
+ const unsigned long uloff = (unsigned long) rec_offset;
+
+#if BITS_PER_LONG == 32
+ if (unlikely(end_offset > 0xFFFFFFFF))
+ continue;
+#endif
+
+ if (unlikely(end_offset < rec_offset))
+ continue; /* overflow (corrupt file?) */
+ if (unlikely(ELF_PAGEOFFSET(uloff) != 0))
+ continue; /* bad alignment. */
+
+ /* replace the FatELF data with the real ELF header. */
+ rc = kernel_read(file, uloff, (char *) elf,
+ sizeof(*elf));
+ if (unlikely((rc != sizeof(*elf)) && (rc >= 0)))
+ rc = -EIO;
+ else if (likely(rc == sizeof(*elf))) {
+ *offset = uloff;
+ rc = 0;
+ }
+
+ return rc;
+ }
+ }
+
+ return -ENOEXEC; /* no binaries we could use. */
+}
+
+
static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
struct file *interpreter = NULL; /* to shut gcc up */
- unsigned long load_addr = 0, load_bias = 0;
+ unsigned long load_addr = 0, load_bias = 0;
int load_addr_set = 0;
char * elf_interpreter = NULL;
unsigned long error;
@@ -571,6 +664,8 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
unsigned long elf_bss, elf_brk;
int retval, i;
unsigned int size;
+ unsigned long base_offset = 0;
+ unsigned long interp_base_offset = 0;
unsigned long elf_entry;
unsigned long interp_load_addr = 0;
unsigned long start_code, end_code, start_data, end_data;
@@ -587,9 +682,11 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
retval = -ENOMEM;
goto out_ret;
}
-
- /* Get the exec-header */
- loc->elf_ex = *((struct elfhdr *)bprm->buf);
+
+ retval = examine_fatelf(bprm->file, bprm->filename, bprm->buf,
+ BINPRM_BUF_SIZE, &base_offset, &loc->elf_ex);
+ if (unlikely(retval < 0))
+ goto out_ret;

retval = -ENOEXEC;
/* First of all, some simple consistency checks */
@@ -615,7 +712,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
if (!elf_phdata)
goto out;

- retval = kernel_read(bprm->file, loc->elf_ex.e_phoff,
+ retval = kernel_read(bprm->file, loc->elf_ex.e_phoff + base_offset,
(char *)elf_phdata, size);
if (retval != size) {
if (retval >= 0)
@@ -649,7 +746,8 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
if (!elf_interpreter)
goto out_free_ph;

- retval = kernel_read(bprm->file, elf_ppnt->p_offset,
+ retval = kernel_read(bprm->file,
+ elf_ppnt->p_offset + base_offset,
elf_interpreter,
elf_ppnt->p_filesz);
if (retval != elf_ppnt->p_filesz) {
@@ -704,8 +802,12 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
goto out_free_dentry;
}

- /* Get the exec headers */
- loc->interp_elf_ex = *((struct elfhdr *)bprm->buf);
+ retval = examine_fatelf(interpreter, elf_interpreter,
+ bprm->buf, BINPRM_BUF_SIZE,
+ &interp_base_offset,
+ &loc->interp_elf_ex);
+ if (unlikely(retval < 0))
+ goto out_free_dentry;
break;
}
elf_ppnt++;
@@ -830,7 +932,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
}

error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
- elf_prot, elf_flags, 0);
+ elf_prot, elf_flags, 0, base_offset);
if (BAD_ADDR(error)) {
send_sig(SIGKILL, current, 0);
retval = IS_ERR((void *)error) ?
@@ -911,7 +1013,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
elf_entry = load_elf_interp(&loc->interp_elf_ex,
interpreter,
&interp_map_addr,
- load_bias);
+ load_bias, interp_base_offset);
if (!IS_ERR((void *)elf_entry)) {
/*
* load_elf_interp() returns relocation
@@ -1026,11 +1128,19 @@ static int load_elf_library(struct file *file)
unsigned long elf_bss, bss, len;
int retval, error, i, j;
struct elfhdr elf_ex;
+ unsigned long base_offset = 0;
+ char buf[BINPRM_BUF_SIZE];

- error = -ENOEXEC;
- retval = kernel_read(file, 0, (char *)&elf_ex, sizeof(elf_ex));
- if (retval != sizeof(elf_ex))
+ retval = kernel_read(file, 0, buf, sizeof(buf));
+ if (unlikely(retval != sizeof(buf))) {
+ error = (retval >= 0) ? -EIO : retval;
+ goto out;
+ }
+ error = examine_fatelf(file, 0, buf, sizeof(buf),
+ &base_offset, &elf_ex);
+ if (unlikely(error < 0))
goto out;
+ error = -ENOEXEC;

if (memcmp(elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
goto out;
@@ -1052,7 +1162,8 @@ static int load_elf_library(struct file *file)

eppnt = elf_phdata;
error = -ENOEXEC;
- retval = kernel_read(file, elf_ex.e_phoff, (char *)eppnt, j);
+ retval = kernel_read(file, elf_ex.e_phoff + base_offset,
+ (char *)eppnt, j);
if (retval != j)
goto out_free_ph;

@@ -1074,7 +1185,7 @@ static int load_elf_library(struct file *file)
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
(eppnt->p_offset -
- ELF_PAGEOFFSET(eppnt->p_vaddr)));
+ ELF_PAGEOFFSET(eppnt->p_vaddr)) + base_offset);
up_write(&current->mm->mmap_sem);
if (error != ELF_PAGESTART(eppnt->p_vaddr))
goto out_free_ph;
diff --git a/include/linux/elf.h b/include/linux/elf.h
index 90a4ed0..742bab9 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -33,6 +33,29 @@ typedef __u32 Elf64_Word;
typedef __u64 Elf64_Xword;
typedef __s64 Elf64_Sxword;

+/* FatELF (multiple ELF binaries in one file) support */
+#define FATELF_MAGIC (0x1F0E70FA)
+
+struct Fatelf_record {
+ __le16 machine; /* maps to e_machine */
+ __u8 osabi; /* maps to e_ident[EI_OSABI] */
+ __u8 osabi_version; /* maps to e_ident[EI_ABIVERSION] */
+ __u8 word_size; /* maps to e_ident[EI_CLASS] */
+ __u8 byte_order; /* maps to e_ident[EI_DATA] */
+ __u8 reserved0;
+ __u8 reserved1;
+ __le64 offset;
+ __le64 size;
+};
+
+struct Fatelf_hdr {
+ __le32 magic;
+ __le16 version;
+ __u8 num_records;
+ __u8 reserved0;
+ struct Fatelf_record records[];
+};
+
/* These constants are for the segment types stored in the image headers */
#define PT_NULL 0
#define PT_LOAD 1
--
1.6.0.4