2006-08-04 21:05:46

by Don Zickus

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
>
> I have spent some time and have gotten my relocatable kernel patches
> working against the latest kernels. I intend to push this upstream
> shortly.
>
> Could all of the people who care take a look and test this out
> to make certain that it doesn't just work on my test box?

Is there any reason I would get the following error on x86_64 using your patches?

Filesystem type is ext2fs, partition type 0x83
kernel /bzImage ro root=LABEL=/1 console=ttyS0,115200
earlyprintk=ttyS0,115200
[Linux-bzImage, setup=0x1c00, size=0x24917c]
initrd /initrd-2.6.18-rc3.img
[Linux-initrd @ 0x37e0d000, 0x1e25e7 bytes]

.
Decompressing Linux...

length error

-- System halted


I can get i386 to boot fine. I can't for the life of me figure out what I
am doing wrong..

Cheers,
Don


2006-08-04 21:27:38

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Don Zickus <[email protected]> writes:

> On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
>>
>> I have spent some time and have gotten my relocatable kernel patches
>> working against the latest kernels. I intend to push this upstream
>> shortly.
>>
>> Could all of the people who care take a look and test this out
>> to make certain that it doesn't just work on my test box?
>
> Is there any reason I would get the following error on x86_64 using your patches?

There shouldn't be.

> Filesystem type is ext2fs, partition type 0x83
> kernel /bzImage ro root=LABEL=/1 console=ttyS0,115200
> earlyprintk=ttyS0,115200
> [Linux-bzImage, setup=0x1c00, size=0x24917c]
> initrd /initrd-2.6.18-rc3.img
> [Linux-initrd @ 0x37e0d000, 0x1e25e7 bytes]
>
> .
> Decompressing Linux...
>
> length error
>
> -- System halted
>
>
> I can get i386 to boot fine. I can't for the life of me figure out what I
> am doing wrong..

The length error comes from lib/inflate.c

I think it would be interesting to look at orig_len and bytes_out.

My hunch is that I have tripped over a tool chain bug or a weird
alignment issue.

The error is that the uncompressed length does not match the stored length
of the data from before we compressed it. Now what is
fascinating is that our crc's match (as that check is performed first).
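
For context, a gzip stream ends with an 8-byte trailer: a CRC32 followed
by ISIZE, the uncompressed length modulo 2^32, both little-endian (RFC 1952).
The sketch below shows how those two values could be read back. The helper
parse_gzip_trailer() and struct gzip_trailer are hypothetical names for
illustration only, not part of lib/inflate.c; the example length is the
orig_len value reported later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value, the byte order used by the gzip trailer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical container for the two trailer fields. */
struct gzip_trailer {
    uint32_t crc32;  /* CRC32 of the uncompressed data */
    uint32_t isize;  /* uncompressed length modulo 2^32 */
};

/* Hypothetical helper: decode the last 8 bytes of a gzip member. */
static struct gzip_trailer parse_gzip_trailer(const unsigned char *buf, size_t len)
{
    struct gzip_trailer t;

    assert(len >= 8);                     /* the trailer is always 8 bytes */
    t.crc32 = read_le32(buf + len - 8);
    t.isize = read_le32(buf + len - 4);
    return t;
}
```

A length of 5910532 (0x5A3004) would be stored as the bytes 04 30 5A 00.
Since the CRC and the length are separate fields, it is possible for one
check to pass while the other fails.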

Something is very slightly off and I don't see what it is.

After looking at the state variables I would probably start looking
at the uncompressed data to see if it really was decompressing
properly. If nothing else that is the kind of process that would tend
to spark a clue.

Eric

2006-08-04 23:40:41

by Don Zickus

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

> The length error comes from lib/inflate.c
>
> I think it would be interesting to look at orig_len and bytes_out.
>
> My hunch is that I have tripped over a tool chain bug or a weird
> alignment issue.

I thought so too, but I took vmlinuz images from people (Vivek) who had it
boot on their systems but those images still failed on my two machines.

>
> The error is that the uncompressed length does not match the stored length
> of the data from before we compressed it. Now what is
> fascinating is that our crc's match (as that check is performed first).
>
> Something is very slightly off and I don't see what it is.

I printed out orig_len -> 5910532 (which matches vmlinux.bin)
bytes_out -> 5910531

>
> After looking at the state variables I would probably start looking
> at the uncompressed data to see if it really was decompressing
> properly. If nothing else that is the kind of process that would tend
> to spark a clue.

I am not familiar with the code, so very few sparks are flying. I'll
still dig through though. Thanks for the tips.

Cheers,
Don

2006-08-05 07:50:45

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Don Zickus <[email protected]> writes:

>> The length error comes from lib/inflate.c
>>
>> I think it would be interesting to look at orig_len and bytes_out.
>>
>> My hunch is that I have tripped over a tool chain bug or a weird
>> alignment issue.
>
> I thought so too, but I took vmlinuz images from people (Vivek) who had it
> boot on their systems but those images still failed on my two machines.

Odd. That might narrow things down. This is just booting with grub
so there is no relocation specific weirdness coming into play.

>> The error is that the uncompressed length does not match the stored length
>> of the data from before we compressed it. Now what is
>> fascinating is that our crc's match (as that check is performed first).
>>
>> Something is very slightly off and I don't see what it is.
>
> I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> bytes_out -> 5910531

Is the last byte of vmlinux.bin 0?

One byte off certainly fits my pattern of something slightly off.

>> After looking at the state variables I would probably start looking
>> at the uncompressed data to see if it really was decompressing
>> properly. If nothing else that is the kind of process that would tend
>> to spark a clue.
>
> I am not familiar with the code, so very few sparks are flying. I'll
> still dig through though. Thanks for the tips.

Welcome.

Eric

2006-08-05 16:08:54

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Don Zickus <[email protected]> writes:

>> The length error comes from lib/inflate.c
>>
>> I think it would be interesting to look at orig_len and bytes_out.
>>
>> My hunch is that I have tripped over a tool chain bug or a weird
>> alignment issue.
>
> I thought so too, but I took vmlinuz images from people (Vivek) who had it
> boot on their systems but those images still failed on my two machines.
>
>>
>> The error is that the uncompressed length does not match the stored length
>> of the data from before we compressed it. Now what is
>> fascinating is that our crc's match (as that check is performed first).
>>
>> Something is very slightly off and I don't see what it is.
>
> I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> bytes_out -> 5910531
>
>>
>> After looking at the state variables I would probably start looking
>> at the uncompressed data to see if it really was decompressing
>> properly. If nothing else that is the kind of process that would tend
>> to spark a clue.
>
> I am not familiar with the code, so very few sparks are flying. I'll
> still dig through though. Thanks for the tips.

I guess the interesting thing to do would be to
- Recompute the crc to see if we still match.
- Possibly instrument flush_window.

I have a strange feeling that the uncompressed data is getting corrupted
after we have flushed the window.

Eric


2006-08-07 17:42:07

by Don Zickus

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Sat, Aug 05, 2006 at 10:07:01AM -0600, Eric W. Biederman wrote:
> Don Zickus <[email protected]> writes:
>
> >> The length error comes from lib/inflate.c
> >>
> >> I think it would be interesting to look at orig_len and bytes_out.
> >>
> >> My hunch is that I have tripped over a tool chain bug or a weird
> >> alignment issue.
> >
> > I thought so too, but I took vmlinuz images from people (Vivek) who had it
> > boot on their systems but those images still failed on my two machines.
> >
> >>
> >> The error is that the uncompressed length does not match the stored length
> >> of the data from before we compressed it. Now what is
> >> fascinating is that our crc's match (as that check is performed first).
> >>
> >> Something is very slightly off and I don't see what it is.
> >
> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> > bytes_out -> 5910531
> >
> >>
> >> After looking at the state variables I would probably start looking
> >> at the uncompressed data to see if it really was decompressing
> >> properly. If nothing else that is the kind of process that would tend
> >> to spark a clue.
> >
> > I am not familiar with the code, so very few sparks are flying. I'll
> > still dig through though. Thanks for the tips.
>
> I guess the interesting thing to do would be to
> - Recompute the crc to see if we still match.
> - Possibly instrument flush_window.
>
> I have a strange feeling that the uncompressed data is getting corrupted
> after we have flushed the window.

It seems to be an AMD64 vs EM64T problem. AMD chipsets work but Intel
chipsets don't.

I also blindly incremented bytes_out (as a really cheap hack), it didn't
work until I added some random putstr's below it (timing??). Then the
kernel booted.

Still looking into things.

Cheers,
Don

>
> Eric
>

2006-08-07 18:10:14

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Don Zickus <[email protected]> writes:

> On Sat, Aug 05, 2006 at 10:07:01AM -0600, Eric W. Biederman wrote:
>> Don Zickus <[email protected]> writes:
>>
>> >> The length error comes from lib/inflate.c
>> >>
>> >> I think it would be interesting to look at orig_len and bytes_out.
>> >>
>> >> My hunch is that I have tripped over a tool chain bug or a weird
>> >> alignment issue.
>> >
>> > I thought so too, but I took vmlinuz images from people (Vivek) who had it
>> > boot on their systems but those images still failed on my two machines.
>> >
>> >>
>> >> The error is that the uncompressed length does not match the stored length
>> >> of the data from before we compressed it. Now what is
>> >> fascinating is that our crc's match (as that check is performed first).
>> >>
>> >> Something is very slightly off and I don't see what it is.
>> >
>> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>> > bytes_out -> 5910531
>> >
>> >>
>> >> After looking at the state variables I would probably start looking
>> >> at the uncompressed data to see if it really was decompressing
>> >> properly. If nothing else that is the kind of process that would tend
>> >> to spark a clue.
>> >
>> > I am not familiar with the code, so very few sparks are flying. I'll
>> > still dig through though. Thanks for the tips.
>>
>> I guess the interesting thing to do would be to
>> - Recompute the crc to see if we still match.
>> - Possibly instrument flush_window.
>>
>> I have a strange feeling that the uncompressed data is getting corrupted
>> after we have flushed the window.
>
> It seems to be an AMD64 vs EM64T problem. AMD chipsets work but Intel
> chipsets don't.
>
> I also blindly incremented bytes_out (as a really cheap hack), it didn't
> work until I added some random putstr's below it (timing??). Then the
> kernel booted.
>
> Still looking into things.

Odd. I wonder if I'm missing a serializing instruction somewhere,
to ensure the effects of ``self modifying code'' aren't a problem.
As I read Intel's documentation, if you have a jump before you get
to the code there shouldn't be a problem.

Still that doesn't really explain bytes_out.


Eric

2006-08-07 23:54:52

by Don Zickus

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

> >> >
> >> >>
> >> >> The error is that the uncompressed length does not match the stored length
> >> >> of the data from before we compressed it. Now what is
> >> >> fascinating is that our crc's match (as that check is performed first).
> >> >>
> >> >> Something is very slightly off and I don't see what it is.
> >> >
> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> >> > bytes_out -> 5910531
> >> >
> >> >>
> > It seems to be an AMD64 vs EM64T problem. AMD chipsets work but Intel
> > chipsets don't.
> >
> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
> > work until I added some random putstr's below it (timing??). Then the
> > kernel booted.
> >
> > Still looking into things.
>
> Odd. I wonder if I'm missing a serializing instruction somewhere,
> to ensure the effects of ``self modifying code'' aren't a problem.
> As I read Intel's documentation, if you have a jump before you get
> to the code there shouldn't be a problem.
>
> Still that doesn't really explain bytes_out.
>

So I narrowed down the problem but it isn't obvious to me why this problem
exists. Basically, even though bytes_out is supposed to be initialized to
0, it becomes -1 before entering decompress_kernel(). Of course, the
fallout is that in flush_window() bytes_out winds up being one less than
outcnt and hence my original problem.
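
A minimal model of why a counter that starts at -1 produces exactly this
symptom, assuming that flush_window() adds each chunk's outcnt to bytes_out
while the CRC is computed over the data bytes themselves. The struct and
function names below are made up for illustration and are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the decompressor's byte counter (hypothetical). */
struct inflate_state {
    long bytes_out;  /* running total of bytes written so far */
};

/* Stand-in for flush_window(): account for one flushed chunk. */
static void flush_chunk(struct inflate_state *s, unsigned long outcnt)
{
    s->bytes_out += (long)outcnt;
}

/* Returns 0 on a match, -1 for what would be reported as "length error". */
static int check_length(const struct inflate_state *s, unsigned long orig_len)
{
    return ((unsigned long)s->bytes_out == orig_len) ? 0 : -1;
}

/* Push total_len bytes through in window-sized chunks; return the final count. */
static long run_model(long initial_bytes_out, unsigned long total_len,
                      unsigned long window)
{
    struct inflate_state s = { initial_bytes_out };
    unsigned long left = total_len;

    assert(window > 0);
    while (left > 0) {
        unsigned long n = left < window ? left : window;
        flush_chunk(&s, n);
        left -= n;
    }
    return s.bytes_out;
}
```

With an initial value of 0 the model ends at 5910532; with -1 it ends at
5910531, matching the numbers printed earlier. The data itself is untouched
in this model, which would be consistent with the CRC check still passing.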

Any thoughts on how to debug where this could be getting corrupted?

Cheers,
Don

2006-08-08 05:03:38

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Don Zickus <[email protected]> writes:

>> >> >
>> >> >>
>> >> >> The error is that the uncompressed length does not match the stored length
>> >> >> of the data from before we compressed it. Now what is
>> >> >> fascinating is that our crc's match (as that check is performed first).
>> >> >>
>> >> >> Something is very slightly off and I don't see what it is.
>> >> >
>> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>> >> > bytes_out -> 5910531
>> >> >
>> >> >>
>> > It seems to be an AMD64 vs EM64T problem. AMD chipsets work but Intel
>> > chipsets don't.
>> >
>> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
>> > work until I added some random putstr's below it (timing??). Then the
>> > kernel booted.
>> >
>> > Still looking into things.
>>
>> Odd. I wonder if I'm missing a serializing instruction somewhere,
>> to ensure the effects of ``self modifying code'' aren't a problem.
>> As I read Intel's documentation, if you have a jump before you get
>> to the code there shouldn't be a problem.
>>
>> Still that doesn't really explain bytes_out.
>>
>
> So I narrowed down the problem but it isn't obvious to me why this problem
> exists. Basically, even though bytes_out is supposed to be initialized to
> 0, it becomes -1 before entering decompress_kernel(). Of course, the
> fallout is that in flush_window() bytes_out winds up being one less than
> outcnt and hence my original problem.
>
> Any thoughts on how to debug where this could be getting corrupted?

Looking at my build it appears bytes_out is being placed in the .bss.
A little odd since it is zero initialized but no big deal.
Could you confirm that bytes_out is being placed in the .bss section
by inspecting arch/x86_64/boot/compressed/misc.o and
arch/x86_64/boot/compressed/vmlinux. "readelf -a $file" and then
looking up the section number and looking at the section table to see
which section it is was my technique.

If bytes_out is in the .bss for you then I suspect something is not
correctly zeroing the .bss. Or else the .bss is being stomped.

I'm not certain how rep stosb can be done wrong but some bad pointer
math could have done it.

Eric

2006-08-08 19:34:11

by Don Zickus

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Mon, Aug 07, 2006 at 11:01:53PM -0600, Eric W. Biederman wrote:
> Don Zickus <[email protected]> writes:
>
> >> >> >
> >> >> >>
> >> >> >> The error is that the uncompressed length does not match the stored length
> >> >> >> of the data from before we compressed it. Now what is
> >> >> >> fascinating is that our crc's match (as that check is performed first).
> >> >> >>
> >> >> >> Something is very slightly off and I don't see what it is.
> >> >> >
> >> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> >> >> > bytes_out -> 5910531
> >> >> >
> >> >> >>
> >> > It seems to be an AMD64 vs EM64T problem. AMD chipsets work but Intel
> >> > chipsets don't.
> >> >
> >> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
> >> > work until I added some random putstr's below it (timing??). Then the
> >> > kernel booted.
> >> >
> >> > Still looking into things.
> >>
> >> Odd. I wonder if I'm missing a serializing instruction somewhere,
> >> to ensure the effects of ``self modifying code'' aren't a problem.
> >> As I read Intel's documentation, if you have a jump before you get
> >> to the code there shouldn't be a problem.
> >>
> >> Still that doesn't really explain bytes_out.
> >>
> >
> > So I narrowed down the problem but it isn't obvious to me why this problem
> > exists. Basically, even though bytes_out is supposed to be initialized to
> > 0, it becomes -1 before entering decompress_kernel(). Of course, the
> > fallout is that in flush_window() bytes_out winds up being one less than
> > outcnt and hence my original problem.
> >
> > Any thoughts on how to debug where this could be getting corrupted?
>
> Looking at my build it appears bytes_out is being placed in the .bss.
> A little odd since it is zero initialized but no big deal.
> Could you confirm that bytes_out is being placed in the .bss section
> by inspecting arch/x86_64/boot/compressed/misc.o and
> arch/x86_64/boot/compressed/vmlinux. "readelf -a $file" and then
> looking up the section number and looking at the section table to see
> which section it is was my technique.

Yes bytes_out is in the .bss for both files.

>
> If bytes_out is in the .bss for you then I suspect something is not
> correctly zeroing the .bss. Or else the .bss is being stomped.
>
> I'm not certain how rep stosb can be done wrong but some bad pointer
> math could have done it.

Even worse, from the time the .bss is cleared to the time gunzip() is
called inside decompress_kernel(), there is very little code to do some
stomping.

So I am stuck trying to debug this. This code seems very fragile. The
more debug code I add (ie putstr) the more the length is off (varies from
-32 to +1). Makes me scratch my head as to what is really going on here.

I created a really pathetic patch to get the thing to boot but even that
doesn't make sense.


diff --git a/arch/x86_64/boot/compressed/misc.c b/arch/x86_64/boot/compressed/misc.c
index 0e6c4b7..614416e 100644
--- a/arch/x86_64/boot/compressed/misc.c
+++ b/arch/x86_64/boot/compressed/misc.c
@@ -183,6 +183,7 @@ #define OLD_CL_MAGIC 0xA33F
extern unsigned char input_data[];
extern int input_len;

+static long dummy;
static long bytes_out = 0;

static void *malloc(int size);
@@ -594,6 +595,7 @@ asmlinkage void decompress_kernel(void *
if ((ulg)output >= 0xffffffffffUL)
error("Destination address too large");

+ bytes_out = 0;
makecrc();
putstr(".\nDecompressing Linux...");
gunzip();

And yes, the 'dummy' variable needs to be there.
I am trying to use gdb on vmlinux to fish for clues. But I am at a loss
right now.

Cheers,
Don

>
> Eric

2006-08-08 23:36:51

by Andi Kleen

Subject: Re: [CFT] ELF Relocatable x86 and x86_64 bzImages

Don Zickus <[email protected]> writes:
> >
> > Odd. I wonder if I'm missing a serializing instruction somewhere,
> > to ensure the effects of ``self modifying code'' aren't a problem.
> > As I read Intel's documentation, if you have a jump before you get
> > to the code there shouldn't be a problem.
> >
> > Still that doesn't really explain bytes_out.
> >

Sounds nasty.

>
> So I narrowed down the problem but it isn't obvious to me why this problem
> exists. Basically, even though bytes_out is supposed to be initialized to
> 0, it becomes -1 before entering decompress_kernel(). Of course, the
> fallout is that in flush_window() bytes_out winds up being one less than
> outcnt and hence my original problem.
>
> Any thoughts on how to debug where this could be getting corrupted?

Use a simulator (hopefully you can reproduce it in there) like qemu
or AMD SimNow and set a watch point on the address?

Or try to find someone who has an Intel target probe to help you out.

-Andi

2006-08-09 20:04:22

by Don Zickus

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

> Looking at my build it appears bytes_out is being placed in the .bss.
> A little odd since it is zero initialized but no big deal.
> Could you confirm that bytes_out is being placed in the .bss section
> by inspecting arch/x86_64/boot/compressed/misc.o and
> arch/x86_64/boot/compressed/vmlinux. "readelf -a $file" and then
> looking up the section number and looking at the section table to see
> which section it is was my technique.
>
> If bytes_out is in the .bss for you then I suspect something is not
> correctly zeroing the .bss. Or else the .bss is being stomped.
>
> I'm not certain how rep stosb can be done wrong but some bad pointer
> math could have done it.
>
> Eric

It seems Vivek came up with a solution that works. He sent it to me this
morning. We tested a bunch of machines and things seem to work now. It
looks like it mimics the i386 behaviour now.

Cheers,
Don

Signed-off-by: Vivek Goyal <[email protected]>
---

arch/x86_64/boot/compressed/head.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-bss-clearing-test arch/x86_64/boot/compressed/head.S
--- linux-2.6.18-rc3-1M/arch/x86_64/boot/compressed/head.S~x86_64-bss-clearing-test 2006-08-09 09:43:17.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/x86_64/boot/compressed/head.S 2006-08-09 09:43:34.000000000 -0400
@@ -235,8 +235,8 @@ relocated:
/*
* Clear BSS
*/
- movq $_edata, %rdi
- movq $_end, %rcx
+ leaq _edata(%rbx), %rdi
+ leaq _end(%rbx), %rcx
subq %rdi, %rcx
cld
rep
_
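
The effect of the change above can be modelled in C. This is a sketch under
assumptions: it treats memory as a byte array, treats _edata and _end as
link-time offsets, and assumes %rbx holds the address the decompressor is
actually running from (as the `relocated:` label suggests). The function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the "rep stosb" loop: zero the bytes in [start, end). */
static void clear_range(unsigned char *mem, uintptr_t start, uintptr_t end)
{
    memset(mem + start, 0, end - start);
}

/* Old behaviour (movq $_edata / movq $_end): use the link-time values as-is. */
static void clear_bss_absolute(unsigned char *mem, uintptr_t edata, uintptr_t end)
{
    clear_range(mem, edata, end);
}

/* New behaviour (leaq _edata(%rbx) / leaq _end(%rbx)): add the load base. */
static void clear_bss_relative(unsigned char *mem, uintptr_t base,
                               uintptr_t edata, uintptr_t end)
{
    clear_range(mem, base + edata, base + end);
}
```

In this model, when the image runs at a nonzero base, the absolute variant
zeroes some unrelated region and leaves the real .bss holding whatever was
in memory before, which is one way a "zero-initialized" variable such as
bytes_out could start out as -1.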

2006-08-10 06:10:48

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Don Zickus <[email protected]> writes:

>> Looking at my build it appears bytes_out is being placed in the .bss.
>> A little odd since it is zero initialized but no big deal.
>> Could you confirm that bytes_out is being placed in the .bss section
>> by inspecting arch/x86_64/boot/compressed/misc.o and
>> arch/x86_64/boot/compressed/vmlinux. "readelf -a $file" and then
>> looking up the section number and looking at the section table to see
>> which section it is was my technique.
>>
>> If bytes_out is in the .bss for you then I suspect something is not
>> correctly zeroing the .bss. Or else the .bss is being stomped.
>>
>> I'm not certain how rep stosb can be done wrong but some bad pointer
>> math could have done it.
>>
>> Eric
>
> It seems Vivek came up with a solution that works. He sent it to me this
> morning. We tested a bunch of machines and things seem to work now. It
> looks like it mimics the i386 behaviour now.

Yes, this looks right. It looks like I forgot to make this change when
the logic from i386 was adapted to x86_64, ages ago.

This is exactly the place in the code I would have expected a bug
from the symptoms you were seeing.

Thanks all, I will include this in my version of the patches.

Eric

2006-08-10 13:13:51

by Vivek Goyal

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Thu, Aug 10, 2006 at 12:09:56AM -0600, Eric W. Biederman wrote:
> Don Zickus <[email protected]> writes:
>
> >> Looking at my build it appears bytes_out is being placed in the .bss.
> >> A little odd since it is zero initialized but no big deal.
> >> Could you confirm that bytes_out is being placed in the .bss section
> >> by inspecting arch/x86_64/boot/compressed/misc.o and
> >> arch/x86_64/boot/compressed/vmlinux. "readelf -a $file" and then
> >> looking up the section number and looking at the section table to see
> >> which section it is was my technique.
> >>
> >> If bytes_out is in the .bss for you then I suspect something is not
> >> correctly zeroing the .bss. Or else the .bss is being stomped.
> >>
> >> I'm not certain how rep stosb can be done wrong but some bad pointer
> >> math could have done it.
> >>
> >> Eric
> >
> > It seems Vivek came up with a solution that works. He sent it to me this
> > morning. We tested a bunch of machines and things seem to work now. It
> > looks like it mimics the i386 behaviour now.
>
> Yes, this looks right. It looks like I forgot to make this change when
> the logic from i386 was adapted to x86_64, ages ago.
>
> This is exactly the place in the code I would have expected a bug
> from the symptoms you were seeing.
>
> Thanks all, I will include this in my version of the patches.

Apart from this I think something is still off on x86_64. I have not
been able to make kdump work on x86_64. Second kernel simply hangs.
Two different machines are showing different results.

- On one machine, it seems to be stuck somewhere in decompress_kernel().
Serial console is not behaving properly even with earlyprintk(). Somehow
I feel it is some bss corruption even after my changes.

- The other machine seems to get as far as start_kernel() and even beyond
that (no messages on the console, all serial debugging) and then
either hangs or jumps back to the BIOS.

Will look more into it.

Thanks
Vivek

2006-08-10 17:06:39

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Vivek Goyal <[email protected]> writes:

> Apart from this I think something is still off on x86_64. I have not
> been able to make kdump work on x86_64. Second kernel simply hangs.
> Two different machines are showing different results.
>
> - On one machine, it seems to be stuck somewhere in decompress_kernel().
> Serial console is not behaving properly even with earlyprintk(). Somehow
> I feel it is some bss corruption even after my changes.
>
> - The other machine seems to get as far as start_kernel() and even beyond
> that (no messages on the console, all serial debugging) and then
> either hangs or jumps back to the BIOS.
>
> Will look more into it.

Thanks.

I'm a little disappointed but at this point it isn't a great surprise,
the code is early yet and hasn't had much testing or attention.
I wonder if I have missed something else silly.

As for testing, can you use plain kexec to load the kernel at a
different address? I'm curious to know if it is something related
to the kexec on panic path or if it is just running at a different
location that is the problem.

I'm back on the namespace stuff this week so it will be a while before
I get back to this. It doesn't look like I have time to work the whole
patchset at once. So my current plan is to take as many pieces as
make sense by themselves and push them upstream, until we get down to
just the relocatable kernel patches that are outstanding.

Everything was fairly well received on the round of reviews with some
minor nits that needed to be picked. So I think this is doable.

Eric

2006-08-10 18:18:54

by Vivek Goyal

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Thu, Aug 10, 2006 at 11:05:22AM -0600, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
>
> > Apart from this I think something is still off on x86_64. I have not
> > been able to make kdump work on x86_64. Second kernel simply hangs.
> > Two different machines are showing different results.
> >
> > - On one machine, it seems to be stuck somewhere in decompress_kernel().
> > Serial console is not behaving properly even with earlyprintk(). Somehow
> > I feel it is some bss corruption even after my changes.
> >
> > - The other machine seems to get as far as start_kernel() and even beyond
> > that (no messages on the console, all serial debugging) and then
> > either hangs or jumps back to the BIOS.
> >
> > Will look more into it.
>
> Thanks.
>
> I'm a little disappointed but at this point it isn't a great surprise,
> the code is early yet and hasn't had much testing or attention.
> I wonder if I have missed something else silly.
>
> As for testing, can you use plain kexec to load the kernel at a
> different address? I'm curious to know if it is something related
> to the kexec on panic path or if it is just running at a different
> location that is the problem.

Yes. This seems to be minor stuff. The parameter segment seems to be
getting stomped while I am doing decompression. Most probably this
comes from the extra space calculations (32K etc.) being done at run
time to find out where we should shift the compressed image.

Kexec works because the parameter segment is loaded below the
compressed image and does not get stomped over. :-)

I just reserved memory at a non-2MB-aligned location, 65MB@15MB, so that
the kernel is loaded at 16MB and the other smaller segments below the
compressed image; then I could successfully boot into the kdump kernel.

So basically the kexec on panic path seems to be clean except for the
stomping issue. Maybe the bzImage program header should reflect the right
"MemSize", which takes into account the extra memory space calculations.
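
The arithmetic behind the 65MB@15MB workaround can be sketched as follows,
assuming the kernel must sit on a 2MB boundary. The helper names are
hypothetical and this is not how kexec-tools itself places segments; it only
shows why a region starting at 15MB leaves 1MB free below a kernel at 16MB,
and how an overlap between two segments could be detected.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024UL * 1024UL)

/* Round addr up to the next multiple of align (align must be a power of two). */
static unsigned long align_up(unsigned long addr, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* True if the half-open ranges [a_start, a_end) and [b_start, b_end) overlap. */
static int ranges_overlap(unsigned long a_start, unsigned long a_end,
                          unsigned long b_start, unsigned long b_end)
{
    return a_start < b_end && b_start < a_end;
}
```

With a reserved region starting at 15MB, align_up(15 * MB, 2 * MB) gives
16MB, so the range from 15MB to 16MB is available for the small segments and
is never touched by data growing upward from the kernel image. A segment
placed above the image would need the image's memory size to cover the
decompressor's scratch space for ranges_overlap() to catch the clash.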

Thanks
Vivek

2006-08-10 20:11:11

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Vivek Goyal <[email protected]> writes:

> On Thu, Aug 10, 2006 at 11:05:22AM -0600, Eric W. Biederman wrote:
>> Vivek Goyal <[email protected]> writes:
>>
>> > Apart from this I think something is still off on x86_64. I have not
>> > been able to make kdump work on x86_64. Second kernel simply hangs.
>> > Two different machines are showing different results.
>> >
>> > - On one machine, it seems to be stuck somewhere in decompress_kernel().
>> > Serial console is not behaving properly even with earlyprintk(). Somehow
>> > I feel it is some bss corruption even after my changes.
>> >
>> > - The other machine seems to get as far as start_kernel() and even beyond
>> > that (no messages on the console, all serial debugging) and then
>> > either hangs or jumps back to the BIOS.
>> >
>> > Will look more into it.
>>
>> Thanks.
>>
>> I'm a little disappointed but at this point it isn't a great surprise,
>> the code is early yet and hasn't had much testing or attention.
>> I wonder if I have missed something else silly.
>>
>> As for testing, can you use plain kexec to load the kernel at a
>> different address? I'm curious to know if it is something related
>> to the kexec on panic path or if it is just running at a different
>> location that is the problem.
>
> Yes. This seems to be minor stuff. The parameter segment seems to be
> getting stomped while I am doing decompression. Most probably this
> comes from the extra space calculations (32K etc.) being done at run
> time to find out where we should shift the compressed image.
>
> Kexec works because the parameter segment is loaded below the
> compressed image and does not get stomped over. :-)

Ah. That makes sense.

> I just reserved memory at a non-2MB-aligned location, 65MB@15MB, so that
> the kernel is loaded at 16MB and the other smaller segments below the
> compressed image; then I could successfully boot into the kdump kernel.

:)

> So basically the kexec on panic path seems to be clean except for the
> stomping issue. Maybe the bzImage program header should reflect the right
> "MemSize", which takes into account the extra memory space calculations.

Yes. That sounds like the right thing to do.

I remember trying to compute a good memsize when I created the bzImage
header but it is completely possible I missed some part of the
calculation or assumed that the kernel's .bss section would always be
larger than what I needed for decompression.

Eric

2006-08-11 21:22:58

by Don Zickus

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

> >>
> >> I'm a little disappointed but at this point it isn't a great surprise,
> >> the code is early yet and hasn't had much testing or attention.
> >> I wonder if I have missed something else silly.
> >>
> >> As for testing, can you use plain kexec to load the kernel at a
> >> different address? I'm curious to know if it is something related
> >> to the kexec on panic path or if it is just running at a different
> >> location that is the problem.
> >

I think I have found the 'something silly'. Here is a patch that allows
our Dell em64t boxes to boot. This change matches the original code. The
main difference that caused the problems was the setting of _PAGE_NX bit.
This caused issues in early_io_remap().

Thanks to Larry Woodman for debugging this.

Cheers,
Don


Signed-off-by: Don Zickus <[email protected]>

--- linux-2.6.17.noarch/arch/x86_64/mm/init.c.orig 2006-08-11 12:35:58.000000000 -0400
+++ linux-2.6.17.noarch/arch/x86_64/mm/init.c 2006-08-11 13:14:20.000000000 -0400
@@ -196,7 +196,7 @@
vaddr += addr & ~PMD_MASK;
addr &= PMD_MASK;
for (i = 0; i < pmds; i++, addr += PMD_SIZE)
- set_pmd(pmd + i,__pmd(addr | __PAGE_KERNEL_LARGE));
+ set_pmd(pmd + i,__pmd(addr | _KERNPG_TABLE | _PAGE_PSE));
__flush_tlb();
return (void *)vaddr;
next:

2006-08-12 07:21:33

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Don Zickus <[email protected]> writes:

>> >>
>> >> I'm a little disappointed but at this point it isn't a great surprise,
>> >> the code is early yet and hasn't had much testing or attention.
>> >> I wonder if I have missed something else silly.
>> >>
>> >> As for testing, can you use plain kexec to load the kernel at a
>> >> different address? I'm curious to know if it is something related
>> >> to the kexec on panic path or if it is just running at a different
>> >> location that is the problem.
>> >
>
> I think I have found the 'something silly'. Here is a patch that allows
> our Dell em64t boxes to boot. This change matches the original code. The
> main difference that caused the problems was the setting of _PAGE_NX bit.
> This caused issues in early_io_remap().
>
> Thanks to Larry Woodman for debugging this.

This looks like a different one but looks fairly sane.

Do you know what code had problems with _PAGE_NX being set?
What are we doing with early_ioremap that requires execute
permissions? It doesn't sound right that we would need
this.

Eric

2006-08-12 15:23:33

by Don Zickus

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Sat, Aug 12, 2006 at 01:20:29AM -0600, Eric W. Biederman wrote:
> Don Zickus <[email protected]> writes:
>
> >> >>
> >> >> I'm a little disappointed but at this point it isn't a great surprise,
> >> >> the code is early yet and hasn't had much testing or attention.
> >> >> I wonder if I have missed something else silly.
> >> >>
> >> >> As for testing, can you use plain kexec to load the kernel at a
> >> >> different address? I'm curious to know if it is something related
> >> >> to the kexec on panic path or if it is just running at a different
> >> >> location that is the problem.
> >> >
> >
> > I think I have found the 'something silly'. Here is a patch that allows
> > our Dell em64t boxes to boot. This change matches the original code. The
> > main difference that caused the problems was the setting of _PAGE_NX bit.
> > This caused issues in early_io_remap().
> >
> > Thanks to Larry Woodman for debugging this.
>
> This looks like a different one but looks fairly sane.
>
> Do you know what code had problems having _PAGE_NX set.
> What are we doing with early_ioremap the requires execute
> permissions. It doesn't sound right that we would need
> this.

This fix is only needed for a subset of our em64t boxes, so it could be
just a chipset problem. Supposedly, if I remember the conversation
correctly, when the kernel first boots it reserves about 40MB and about 20
pmds automatically. After decompression, early_io_remap tries to set up
all the memory. The conflict arose when early_io_remap tried to reuse one
of those pmds. This caused the system to crash and reboot.

I'll try to get more info Monday on the specifics.

Cheers,
Don

>
> Eric

2006-08-12 19:42:20

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Don Zickus <[email protected]> writes:

>> This looks like a different one but looks fairly sane.
>>
>> Do you know what code had problems having _PAGE_NX set.
>> What are we doing with early_ioremap the requires execute
>> permissions. It doesn't sound right that we would need
>> this.
>
> This fix is only needed for a subset of our em64t boxes, so it could be
> just a chipset problem. Supposedly, if I remember the conversation
> correctly, when the kernel first boots it reserves about 40MB and about 20
> pmds automatically. After decompression, early_io_remap tries to setup
> all the memory. The conflict arose when early_io_remap tried to reuse one
> of those pmds. This caused the system to crash and reboot.
>
> I'll try to get more info Monday on the specifics.

Thanks.


Eric

2006-08-13 20:06:51

by Andi Kleen

Subject: Re: [CFT] ELF Relocatable x86 and x86_64 bzImages

[email protected] (Eric W. Biederman) writes:
>
> Do you know what code had problems having _PAGE_NX set.
> What are we doing with early_ioremap the requires execute
> permissions. It doesn't sound right that we would need
> this.

The early EM64T CPUs didn't support NX and would GPF when
they hit the bit. That is why you always need to mask
with __supported_pte_mask when using _PAGE_NX.

-Andi
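
The masking described above can be sketched roughly as follows. This is an illustrative user-space sketch, not the kernel's code: the EX_ constants follow the architectural x86_64 page-table bit positions, while ex_supported_pte_mask, ex_disable_nx() and ex_make_large_pmd() are hypothetical names standing in for the __supported_pte_mask handling.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86_64 page-table entry bits (illustrative names). */
#define EX_PAGE_PRESENT  (UINT64_C(1) << 0)
#define EX_PAGE_RW       (UINT64_C(1) << 1)
#define EX_PAGE_ACCESSED (UINT64_C(1) << 5)
#define EX_PAGE_DIRTY    (UINT64_C(1) << 6)
#define EX_PAGE_PSE      (UINT64_C(1) << 7)   /* 2MB page at the PMD level */
#define EX_PAGE_NX       (UINT64_C(1) << 63)  /* no-execute */
#define EX_PMD_SIZE      (UINT64_C(1) << 21)  /* 2MB */

/* Hypothetical stand-in for __supported_pte_mask: every bit allowed at first. */
static uint64_t ex_supported_pte_mask = ~UINT64_C(0);

/* Call when the CPU does not implement NX, so the bit is never written. */
static void ex_disable_nx(void)
{
    ex_supported_pte_mask &= ~EX_PAGE_NX;
}

/* Build a large-page PMD value, dropping any flag the CPU cannot accept. */
static uint64_t ex_make_large_pmd(uint64_t phys_addr, uint64_t flags)
{
    assert((phys_addr & (EX_PMD_SIZE - 1)) == 0); /* must be 2MB aligned */
    return phys_addr | ((flags | EX_PAGE_PSE) & ex_supported_pte_mask);
}
```

With the mask applied, a caller can pass kernel-style flags that include the NX bit and still get a safe entry on a CPU without NX support, provided ex_disable_nx() was called once at startup.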

2006-08-13 21:45:12

by Eric W. Biederman

Subject: Re: [CFT] ELF Relocatable x86 and x86_64 bzImages

Andi Kleen <[email protected]> writes:

> [email protected] (Eric W. Biederman) writes:
>>
>> Do you know what code had problems having _PAGE_NX set.
>> What are we doing with early_ioremap the requires execute
>> permissions. It doesn't sound right that we would need
>> this.
>
> The early EM64T CPUs didn't support NX and would GPF when
> they hit the bit. That is why you always need to mask
> with __supported_pte_mask when using _PAGE_NX.

Ok. Thanks. That explains it.

The NX bit itself causes the GPF, not someone trying to execute
data on a page.

Eric

2006-08-14 16:52:39

by Vivek Goyal

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
> > I just reserved memory at non 2MB aligned location 65MB@15MB so that
> > kernel is loaded at 16MB and other smaller segments below the compressed
> > image, then I can successfully booted into the kdump kernel.
>
> :)
>
> > So basically kexec on panic path seems to be clean except stomping issue.
> > May be bzImage program header should reflect right "MemSize" which
> > takes into account extra memory space calculations.
>
> Yes. That sounds like the right thing to do.
>
> I remember trying to compute a good memsize when I created the bzImage
> header but it is completely possible I missed some part of the
> calculation or assumed that the kernels .bss section would always be
> larger than what I needed for decompression.
>

Hi Eric,

Please find a patch attached to fix the issue. I have added a few things
which might be consuming memory beyond "MemSize", as described in the
misc.c file.

Regarding the decompressor code using the kernel's .bss section area, I
think that might not be possible, as the kernel's .bss is part of the raw
binary being generated (vmlinux.bin). So effectively it becomes part of
the input data and of the compressed output (vmlinux.bin.gz).

I think objcopy generally does not output the bss section in the raw
binary, but in the kernel's case .bss is somewhere in the middle of the
final image and not at the end, and that could be the reason that objcopy
is outputting bss in the raw binary image as well.

In the case of the second objcopy, where we generate vmlinux.bin from the
compressed kernel vmlinux (the vmlinux containing the decompressor code),
the bss section does not seem to be part of the raw binary output. That's
the reason I had to pass another argument to tools/build.c to determine
the exact memory requirements of the compressed vmlinux.

So the decompressor cannot use the kernel's .bss for its execution, and
we should be taking the decompressor's memory requirements into account
while calculating "MemSize", irrespective of the kernel's .bss size. Am
I missing something?

If this seems reasonable, then I can roll out a similar patch for i386
too.

Thanks & Regards
Vivek


Attachments:
(No filename) (2.08 kB)
x86_64-bzImage-mem-size-adjustment-fix.patch (9.67 kB)

2006-08-14 17:05:15

by H. Peter Anvin

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Vivek Goyal wrote:
> On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
>>> I just reserved memory at non 2MB aligned location 65MB@15MB so that
>>> kernel is loaded at 16MB and other smaller segments below the compressed
>>> image, then I can successfully booted into the kdump kernel.
>> :)
>>
>>> So basically kexec on panic path seems to be clean except stomping issue.
>>> May be bzImage program header should reflect right "MemSize" which
>>> takes into account extra memory space calculations.
>> Yes. That sounds like the right thing to do.
>>
>> I remember trying to compute a good memsize when I created the bzImage
>> header but it is completely possible I missed some part of the
>> calculation or assumed that the kernels .bss section would always be
>> larger than what I needed for decompression.
>>

Could someone please describe the intended semantics of this MemSize
header, *and* its intended usage?

-hpa

2006-08-14 18:13:11

by Vivek Goyal

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Mon, Aug 14, 2006 at 10:04:29AM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
> >>>I just reserved memory at non 2MB aligned location 65MB@15MB so that
> >>>kernel is loaded at 16MB and other smaller segments below the compressed
> >>>image, then I can successfully booted into the kdump kernel.
> >>:)
> >>
> >>>So basically kexec on panic path seems to be clean except stomping issue.
> >>>May be bzImage program header should reflect right "MemSize" which
> >>>takes into account extra memory space calculations.
> >>Yes. That sounds like the right thing to do.
> >>
> >>I remember trying to compute a good memsize when I created the bzImage
> >>header but it is completely possible I missed some part of the
> >>calculation or assumed that the kernels .bss section would always be
> >>larger than what I needed for decompression.
> >>
>
> Could someone please describe the intended semantics of this MemSize
> header, *and* its intended usage?
>

Now an ELF header (attached to the bzImage) is being used to describe
the kernel executable. One program header of type PT_LOAD is being
created. The "p_filesz" field of the program header basically
describes the vmlinux file size, and "p_memsz" gives how
much memory will be consumed by the kernel image at load time.

Ideally "p_memsz" should be the sum of the "p_memsz" values of all the
program headers of the vmlinux file, but I guess in this case we are
stretching the ELF specification a little bit and also taking into
account the additional memory which will be used by the decompressor and
decompression logic by the time execution is transferred to the actual
kernel.

The intended usage is currently kexec/kdump. While pre-loading a
kernel in memory, kexec creates multiple segments and puts various
data into them (kernel image, initrd, parameters, etc.). Kexec
needs to know how much memory is being used by the loaded kernel so
that it can place another segment after the kernel at a safe distance.
By reading "p_memsz" from the ELF header, kexec can determine that.

The problem we are currently facing in the kdump case is that the
parameter segment (command line and other bootloader parameters) is being
placed immediately after the kernel, where it gets stomped over by the
decompressor code, and the kernel boot fails.
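
The placement problem can be sketched as a simple range-overlap check. This is only an illustration under assumed numbers, not kexec's actual code: struct ex_range, ex_ranges_overlap() and ex_next_free() are hypothetical names, and the sizes used in the note that follows are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one loaded segment: [start, start + size). */
struct ex_range {
    uint64_t start;
    uint64_t size;
};

/* True when the two half-open ranges share at least one byte. */
static bool ex_ranges_overlap(struct ex_range a, struct ex_range b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}

/* Lowest address at or above the end of 'kernel', rounded up to 'align'. */
static uint64_t ex_next_free(struct ex_range kernel, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    uint64_t end = kernel.start + kernel.size;
    return (end + align - 1) & ~(align - 1);
}
```

For example, with a kernel loaded at 16MB whose file data is 2MB but whose run-time footprint is 6MB, a parameter page placed by ex_next_free() from the 2MB size lands at 0x1200000 and overlaps the 6MB footprint, whereas placing it from the 6MB size moves it to 0x1600000.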

A normal boot never faces this problem, as the parameter segment is always
loaded below where the kernel image is loaded.

Thanks
Vivek

2006-08-14 19:32:56

by H. Peter Anvin

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Vivek Goyal wrote:
> On Mon, Aug 14, 2006 at 10:04:29AM -0700, H. Peter Anvin wrote:
>> Vivek Goyal wrote:
>>> On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
>>>>> I just reserved memory at non 2MB aligned location 65MB@15MB so that
>>>>> kernel is loaded at 16MB and other smaller segments below the compressed
>>>>> image, then I can successfully booted into the kdump kernel.
>>>> :)
>>>>
>>>>> So basically kexec on panic path seems to be clean except stomping issue.
>>>>> May be bzImage program header should reflect right "MemSize" which
>>>>> takes into account extra memory space calculations.
>>>> Yes. That sounds like the right thing to do.
>>>>
>>>> I remember trying to compute a good memsize when I created the bzImage
>>>> header but it is completely possible I missed some part of the
>>>> calculation or assumed that the kernels .bss section would always be
>>>> larger than what I needed for decompression.
>>>>
>> Could someone please describe the intended semantics of this MemSize
>> header, *and* its intended usage?
>>
>
> Now and ELF header(attached to bzImage) is being used to describe
> the kernel executable. One program header of PT_LOAD type is being
> created. The "p_filesz" field of program header is basically
> describing the vmlinux file size and "p_memsz" is giving how
> much memory will be consumed by kernel image at load time.
>
> Ideally "p_memsz" should be "p_memsz" summation of all the program
> headers of vmlinux file but I guess in this case we are stretching the
> ELF specification a little bit and also taking into the account the
> additional memory which will be used by decompressor and decompression
> logic by the time execution is transferred to the actual kernel.
>

What about once the kernel is booted?

-hpa

2006-08-14 19:43:49

by Vivek Goyal

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Mon, Aug 14, 2006 at 12:32:32PM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >On Mon, Aug 14, 2006 at 10:04:29AM -0700, H. Peter Anvin wrote:
> >>Vivek Goyal wrote:
> >>>On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
> >>>>>I just reserved memory at non 2MB aligned location 65MB@15MB so that
> >>>>>kernel is loaded at 16MB and other smaller segments below the compressed
> >>>>>image, then I can successfully booted into the kdump kernel.
> >>>>:)
> >>>>
> >>>>>So basically kexec on panic path seems to be clean except stomping issue.
> >>>>>May be bzImage program header should reflect right "MemSize" which
> >>>>>takes into account extra memory space calculations.
> >>>>Yes. That sounds like the right thing to do.
> >>>>
> >>>>I remember trying to compute a good memsize when I created the bzImage
> >>>>header but it is completely possible I missed some part of the
> >>>>calculation or assumed that the kernels .bss section would always be
> >>>>larger than what I needed for decompression.
> >>>>
> >>Could someone please describe the intended semantics of this MemSize
> >>header, *and* its intended usage?
> >>
> >
> >Now and ELF header(attached to bzImage) is being used to describe
> >the kernel executable. One program header of PT_LOAD type is being
> >created. The "p_filesz" field of program header is basically
> >describing the vmlinux file size and "p_memsz" is giving how
> >much memory will be consumed by kernel image at load time.
> >
> >Ideally "p_memsz" should be "p_memsz" summation of all the program
> >headers of vmlinux file but I guess in this case we are stretching the
> >ELF specification a little bit and also taking into the account the
> >additional memory which will be used by decompressor and decompression
> >logic by the time execution is transferred to the actual kernel.
> >
>
> What about once the kernel is booted?
>

Sorry, I did not understand the question. A few more lines would help.

Thanks
Vivek

2006-08-14 19:46:07

by H. Peter Anvin

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Vivek Goyal wrote:
>>>
>> What about once the kernel is booted?
>
> Sorry did not understand the question. Few more lines will help.
>

Is this field intended to protect any kind of memory during the early
boot phase of the kernel proper, or only the decompressor?

-hpa

2006-08-14 19:57:33

by Vivek Goyal

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Mon, Aug 14, 2006 at 12:45:31PM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >>>
> >>What about once the kernel is booted?
> >
> >Sorry did not understand the question. Few more lines will help.
> >
>
> Is this field intended to protect any kind of memory during the early
> boot phase of the kernel proper, or only the decompressor?
>

I think it should protect against any dynamic memory usage during the early
boot phase too, until we reach a point where the kernel is aware of the
BIOS-provided memory maps and kernel memory usage can be controlled with
the help of BIOS-provided/user-defined memory maps.

In the i386 implementation Eric is already taking into account the memory
used by the bootmem bitmap and the initial page tables. I have not looked
into the x86_64 kernel code to see whether I need to make such adjustments.
It worked for me, so I did not bother much. I will look into it.

Thanks
Vivek

2006-08-14 20:01:15

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

"H. Peter Anvin" <[email protected]> writes:

> Vivek Goyal wrote:
>> On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
>>>> I just reserved memory at non 2MB aligned location 65MB@15MB so that
>>>> kernel is loaded at 16MB and other smaller segments below the compressed
>>>> image, then I can successfully booted into the kdump kernel.
>>> :)
>>>
>>>> So basically kexec on panic path seems to be clean except stomping issue.
>>>> May be bzImage program header should reflect right "MemSize" which
>>>> takes into account extra memory space calculations.
>>> Yes. That sounds like the right thing to do.
>>>
>>> I remember trying to compute a good memsize when I created the bzImage
>>> header but it is completely possible I missed some part of the
>>> calculation or assumed that the kernels .bss section would always be
>>> larger than what I needed for decompression.
>>>
>
> Could someone please describe the intended semantics of this MemSize header,
> *and* its intended usage?

I think Vivek did a decent job. But here is my take.

Currently the ELF header we prepend to the Linux kernel has
exactly one segment.

A segment has several fields: file offset, alignment, type, physical
address, virtual address, file size, and memory size.

The file size parameter describes how much data to pull off of the
disk. The memory size describes how much room the segment will
consume in memory. The difference between file size and memory size
is treated as bss data. Memory size must never be smaller than
file size.
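
A minimal sketch of that file size / memory size relationship, assuming a trimmed-down segment description (struct ex_segment and both helper functions are hypothetical names, not part of any real loader):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down view of one loadable segment. */
struct ex_segment {
    uint64_t file_size; /* bytes copied from the file */
    uint64_t mem_size;  /* bytes occupied once loaded */
};

/* Bytes the loader must zero after the file data (the "bss" part). */
static uint64_t ex_bss_bytes(const struct ex_segment *seg)
{
    assert(seg->mem_size >= seg->file_size);
    return seg->mem_size - seg->file_size;
}

/* Load one segment: copy the file data, then zero-fill the remainder. */
static void ex_load_segment(const struct ex_segment *seg,
                            const unsigned char *file_data,
                            unsigned char *dest)
{
    memcpy(dest, file_data, seg->file_size);
    memset(dest + seg->file_size, 0, ex_bss_bytes(seg));
}
```

The destination buffer has to be at least mem_size bytes long, which is exactly why a loader needs an accurate memory size before it decides where anything else can go.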

In the case of the kernel there is a certain amount of memory that
the kernel uses before it starts reserving things and using the
memory map. The memory that the kernel unconditionally uses should
be described with the memsize parameter.

An accurate description allows your initrd and your parameter segment
to be placed right up next to your kernel without worrying about them
being stomped (we already do this on a couple of other architectures),
or it allows you to detect that there is not enough room to hold your
kernel, initrd and parameters.
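
As a rough illustration of that kind of check, the sketch below packs a kernel, an initrd and a parameter block into one reserved region and reports whether they fit. Everything here is an assumption made for the example: the ex_ names are hypothetical, the 4096-byte alignment for initrd and parameters is arbitrary, and this is not how kexec is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_MB (UINT64_C(1) << 20)

/* Round 'x' up to a power-of-two 'align'. */
static uint64_t ex_align_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical layout check: place the kernel at the first 'kernel_align'
 * boundary in the region, then initrd and parameters page-aligned after it.
 * Returns true when everything fits; the chosen addresses are always stored.
 */
static bool ex_layout(uint64_t region_start, uint64_t region_size,
                      uint64_t kernel_align, uint64_t kernel_memsz,
                      uint64_t initrd_size, uint64_t params_size,
                      uint64_t *kernel_at, uint64_t *initrd_at,
                      uint64_t *params_at)
{
    const uint64_t page = 4096;
    uint64_t region_end = region_start + region_size;

    *kernel_at = ex_align_up(region_start, kernel_align);
    *initrd_at = ex_align_up(*kernel_at + kernel_memsz, page);
    *params_at = ex_align_up(*initrd_at + initrd_size, page);
    return *params_at + params_size <= region_end;
}
```

With a 65MB region starting at 15MB and a 2MB kernel alignment, the kernel lands at 16MB; shrinking the region to 8MB with a 6MB kernel footprint and a 2MB initrd makes the check fail instead of silently overlapping.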

So since we now have the possibility of describing this accurately I
would like to, although the traditional x86 workaround of pushing
everything up as far in memory as we can and the kernel can address
is potentially still an option.

For the kexec on panic case we have a very small reserved chunk of
memory (16MB I think is typical right now). The smaller the chunk we can
successfully run out of, the better, which makes it easy to hit these
kinds of things if we don't have an accurate description of the
kernel.

Eric

2006-08-14 20:11:33

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

"H. Peter Anvin" <[email protected]> writes:

> Vivek Goyal wrote:
>>>>
>>> What about once the kernel is booted?
>> Sorry did not understand the question. Few more lines will help.
>>
>
> Is this field intended to protect any kind of memory during the early boot phase
> of the kernel proper, or only the decompressor?

Yes, the field should account for memory usage until the kernel starts
doing the accounting at run time.

I'm actually surprised that taking into account the .bss was not enough to
cover up anything the decompressor was doing. Usually the kernel's .bss
is more than the extra 32K or so that the decompressor uses.

Eric

2006-08-14 20:59:58

by Vivek Goyal

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

On Mon, Aug 14, 2006 at 02:10:51PM -0600, Eric W. Biederman wrote:
> "H. Peter Anvin" <[email protected]> writes:
>
> > Vivek Goyal wrote:
> >>>>
> >>> What about once the kernel is booted?
> >> Sorry did not understand the question. Few more lines will help.
> >>
> >
> > Is this field intended to protect any kind of memory during the early boot phase
> > of the kernel proper, or only the decompressor?
>
> Yes, the field should account for memory usage until the kernel starts
> doing the accounting at run time.
>
> I'm actually surprised that taking into account the .bss was not enough to
> cover up anything the decompressor was doing. Usually the kernel's .bss
> is more than the extra 32K or so that the decompressor uses.
>

I think the .bss section will act as a buffer for the decompressor only
if .bss is not part of the compressed data; then the decompressor does not
have to go beyond bss and can run very well from the kernel's bss space.

But somehow on my machine it looks like bss is very much part
of the raw binary image and hence part of the compressed data
(vmlinux.bin.gz). The memsz exported in the bzImage is the same as the
size of the raw output binary.

That is probably the reason that we are stomping other segments in my
case, and if my understanding is right then it should happen irrespective
of the kernel's bss size.

Here is what the program headers of the kernel vmlinux file look like.
.bss is mapped by the first program header along with .text.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff80000000 0x0000000000000000
                 0x0000000000546bf8 0x00000000005dbc28  RWE    200000
  LOAD           0x00000000007dc000 0xffffffff805dc000 0x00000000005dc000
                 0x000000000000ede0 0x000000000000ede0  RW     200000
  LOAD           0x0000000000800000 0xffffffffff600000 0x00000000005eb000
                 0x0000000000000c08 0x0000000000000c08  RWE    200000
  LOAD           0x00000000009ec000 0xffffffff805ec000 0x00000000005ec000
                 0x0000000000044004 0x0000000000044004  RWE    200000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    8

 Section to Segment mapping:
  Segment Sections...
   00     .text __ex_table .rodata .pci_fixup __ksymtab __ksymtab_gpl
          __ksymtab_unused __ksymtab_gpl_future __ksymtab_strings __param
          .eh_frame .data .bss
   01     .data.cacheline_aligned .data.read_mostly
   02     .vsyscall_0 .xtime_lock .vxtime .wall_jiffies .sys_tz
          .sysctl_vsyscall .xtime .jiffies .vsyscall_1 .vsyscall_2 .vsyscall_3
   03     .data.init_task .data.page_aligned .smp_altinstructions
          .smp_locks .smp_altinstr_replacement .init.text .init.data .init.setup
          .initcall.init .con_initcall.init .altinstructions .altinstr_replacement
          .exit.text .init.ramfs .data.percpu .data_nosave
   04
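
The numbers in the listing can be checked with a small sketch. The table below copies the PhysAddr, FileSiz and MemSiz values of the four LOAD entries; struct ex_load and ex_mem_extent() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Physical address, file size and memory size of one PT_LOAD entry. */
struct ex_load {
    uint64_t paddr;
    uint64_t filesz;
    uint64_t memsz;
};

/* The four LOAD entries from the listing above. */
static const struct ex_load ex_vmlinux_loads[] = {
    { 0x0000000000000000, 0x0000000000546bf8, 0x00000000005dbc28 },
    { 0x00000000005dc000, 0x000000000000ede0, 0x000000000000ede0 },
    { 0x00000000005eb000, 0x0000000000000c08, 0x0000000000000c08 },
    { 0x00000000005ec000, 0x0000000000044004, 0x0000000000044004 },
};

/* Bytes from the lowest load address to the end of the highest segment. */
static uint64_t ex_mem_extent(const struct ex_load *loads, size_t n)
{
    assert(n > 0);
    uint64_t lo = loads[0].paddr;
    uint64_t hi = loads[0].paddr + loads[0].memsz;
    for (size_t i = 1; i < n; i++) {
        uint64_t end = loads[i].paddr + loads[i].memsz;
        if (loads[i].paddr < lo)
            lo = loads[i].paddr;
        if (end > hi)
            hi = end;
    }
    return hi - lo;
}
```

For these entries the extent is 0x630004 bytes, and the first segment's MemSiz exceeds its FileSiz by 0x95030 bytes. Since the other three segments sit at higher physical addresses, a flat image spanning all four necessarily includes that zero-filled tail of the first segment, which is where the mapping places .bss.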

Thanks
Vivek

2006-08-14 21:17:04

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Vivek Goyal <[email protected]> writes:

> On Mon, Aug 14, 2006 at 02:10:51PM -0600, Eric W. Biederman wrote:
>> "H. Peter Anvin" <[email protected]> writes:
>>
>> > Vivek Goyal wrote:
>> >>>>
>> >>> What about once the kernel is booted?
>> >> Sorry did not understand the question. Few more lines will help.
>> >>
>> >
>> > Is this field intended to protect any kind of memory during the early boot phase
>> > of the kernel proper, or only the decompressor?
>>
>> Yes, the field should account for memory usage until the kernel starts
>> doing the accounting at run time.
>>
>> I'm actually surprised that taking into account the .bss was not enough to
>> cover up anything the decompressor was doing. Usually the kernel's .bss
>> is more than the extra 32K or so that the decompressor uses.
>>
>
> I think .bss section size will act as a buffer for decompressor only if
> .bss is not part of compressed data hence decompressor does not have to
> move beyond bss and it can run very well from kernel bss space.

Agreed.

> But somehow on my machine, it looks like that bss is very much part
> of raw binary image hence part of compressed data (vmlinux.bin.gz).
> memsz exported in bzImage is same as size of raw output binary.
>
> Probably that's the reason that we are stomping other segments in my
> case and if my understanding is right then it should happen irrespective
> of kernel bss size.
>
> Here I am pasting how kernel vmlinux file program headers look like.
> .bss is mapped by first program header along with .text.

Ok. So somehow we have done the insane thing of putting .bss in the middle of
the executable. It might even be sane if it were just the .init sections we put
after it, but no, we are putting .data after the .bss.

Well that easily explains why we had a problem.

Getting the proper accounting in for handling this case is probably reasonable.
It probably also makes sense for someone to take a good hard look at the crazy
ordering of sections on x86_64.

Eric