by Eric W. Biederman

Subject: Re: [CFT] ELF Relocatable x86 and x86_64 bzImages

Andi Kleen <[email protected]> writes:

> [email protected] (Eric W. Biederman) writes:
>>
>> Do you know what code had problems having _PAGE_NX set?
>> What are we doing with early_ioremap that requires execute
>> permissions? It doesn't sound right that we would need
>> this.
>
> The early EM64T CPUs didn't support NX and would GPF when
> they hit the bit. That is why you always need to mask
> with __supported_pte_mask when using _PAGE_NX.

Ok. Thanks. That explains it.

So the NX bit itself causes the GPF, not someone trying to execute
data on a page.

Eric
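
The masking rule Andi describes can be sketched in user-space C. This is
only a minimal model, not kernel code: PTE_NX, supported_mask() and
make_pte() are hypothetical names standing in for _PAGE_NX,
__supported_pte_mask and the kernel's own PTE construction, and the bit
positions are the usual x86_64 ones (present = bit 0, writable = bit 1,
NX = bit 63).

```c
#include <stdint.h>

/* Illustrative x86_64 page-table flag bits (user-space model only). */
#define PTE_PRESENT   (1ULL << 0)
#define PTE_RW        (1ULL << 1)
#define PTE_NX        (1ULL << 63)
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

/*
 * Hypothetical stand-in for __supported_pte_mask: all bits are allowed
 * unless the CPU lacks NX, in which case the NX bit is cleared from the
 * mask so it can never reach a page-table entry.
 */
static uint64_t supported_mask(int cpu_has_nx)
{
    return cpu_has_nx ? ~0ULL : ~PTE_NX;
}

/* Build an entry, filtering the requested flags through the mask. */
static uint64_t make_pte(uint64_t phys, uint64_t flags, uint64_t mask)
{
    return (phys & PTE_ADDR_MASK) | (flags & mask);
}
```

On a CPU without NX, make_pte(0x1000, PTE_PRESENT | PTE_RW | PTE_NX,
supported_mask(0)) yields an entry with bit 63 clear, so the bit that
would fault is never written.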

2006-08-14 16:52:39

On Mon, Aug 14, 2006 at 02:10:51PM -0600, Eric W. Biederman wrote:
> "H. Peter Anvin" <[email protected]> writes:
>
> > Vivek Goyal wrote:
> >>>>
> >>> What about once the kernel is booted?
> >> Sorry did not understand the question. Few more lines will help.
> >>
> >
> > Is this field intended to protect any kind of memory during the early boot phase
> > of the kernel proper, or only the decompressor?
>
> Yes, the field should account for memory usage until the kernel starts
> doing the accounting at run time.
>
> I'm actually surprised that taking into account the .bss was not enough to
> cover up anything the decompressor was doing. Usually the kernel's .bss
> is more than the extra 32K or so that the decompressor uses.
>

I think the .bss section will act as a buffer for the decompressor only
if .bss is not part of the compressed data. In that case the decompressor
does not have to move beyond the bss and can run from the kernel's bss space.

But on my machine it looks like the bss is very much part of the raw
binary image, and hence part of the compressed data (vmlinux.bin.gz).
The memsz exported in the bzImage is the same as the size of the raw
output binary.

That is probably why we are stomping on other segments in my case, and
if my understanding is right it should happen irrespective of the
kernel's bss size.

Here is how the program headers of the kernel vmlinux file look.
.bss is mapped by the first program header, along with .text.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff80000000 0x0000000000000000
                 0x0000000000546bf8 0x00000000005dbc28  RWE    200000
  LOAD           0x00000000007dc000 0xffffffff805dc000 0x00000000005dc000
                 0x000000000000ede0 0x000000000000ede0  RW     200000
  LOAD           0x0000000000800000 0xffffffffff600000 0x00000000005eb000
                 0x0000000000000c08 0x0000000000000c08  RWE    200000
  LOAD           0x00000000009ec000 0xffffffff805ec000 0x00000000005ec000
                 0x0000000000044004 0x0000000000044004  RWE    200000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    8

 Section to Segment mapping:
  Segment Sections...
   00     .text __ex_table .rodata .pci_fixup __ksymtab __ksymtab_gpl
          __ksymtab_unused __ksymtab_gpl_future __ksymtab_strings __param
          .eh_frame .data .bss
   01     .data.cacheline_aligned .data.read_mostly
   02     .vsyscall_0 .xtime_lock .vxtime .wall_jiffies .sys_tz
          .sysctl_vsyscall .xtime .jiffies .vsyscall_1 .vsyscall_2 .vsyscall_3
   03     .data.init_task .data.page_aligned .smp_altinstructions
          .smp_locks .smp_altinstr_replacement .init.text .init.data .init.setup
          .initcall.init .con_initcall.init .altinstructions .altinstr_replacement
          .exit.text .init.ramfs .data.percpu .data_nosave
   04

Thanks
Vivek
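
The numbers in the program headers above can be checked with a small C
sketch. It is only an illustration: struct load_seg and the helper names
are hypothetical, and the table simply copies the PhysAddr, FileSiz and
MemSiz values of the four LOAD entries from the readelf output. The
zero-filled part of a segment (where .bss lives) is MemSiz minus FileSiz,
and the question is whether file-backed data is loaded above it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one PT_LOAD program header. */
struct load_seg {
    uint64_t paddr;   /* PhysAddr */
    uint64_t filesz;  /* FileSiz: bytes present in the file */
    uint64_t memsz;   /* MemSiz: bytes occupied once loaded */
};

/* The four LOAD entries copied from the readelf output in this mail. */
static const struct load_seg segs[] = {
    { 0x0000000000000000ULL, 0x546bf8ULL, 0x5dbc28ULL },
    { 0x00000000005dc000ULL, 0x00ede0ULL, 0x00ede0ULL },
    { 0x00000000005eb000ULL, 0x000c08ULL, 0x000c08ULL },
    { 0x00000000005ec000ULL, 0x044004ULL, 0x044004ULL },
};
#define NSEGS (sizeof(segs) / sizeof(segs[0]))

/* Zero-filled bytes at the end of a segment (the .bss portion). */
static uint64_t zero_fill(const struct load_seg *s)
{
    return s->memsz - s->filesz;
}

/* Highest physical address (exclusive) touched by any segment. */
static uint64_t image_end(const struct load_seg *s, size_t n)
{
    uint64_t end = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i].paddr + s[i].memsz > end)
            end = s[i].paddr + s[i].memsz;
    return end;
}

/* Returns 1 if another segment loads file data at or above the end of segment i. */
static int data_loaded_above(const struct load_seg *s, size_t n, size_t i)
{
    uint64_t end = s[i].paddr + s[i].memsz;
    for (size_t j = 0; j < n; j++)
        if (j != i && s[j].filesz > 0 && s[j].paddr >= end)
            return 1;
    return 0;
}
```

With these values the first segment has 0x95030 bytes of zero fill, the
image ends at 0x630004, and data_loaded_above(segs, NSEGS, 0) is true,
which matches the observation that .bss is not the last thing in the image.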

2006-08-14 21:17:04

by Eric W. Biederman

Subject: Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

Vivek Goyal <[email protected]> writes:

> On Mon, Aug 14, 2006 at 02:10:51PM -0600, Eric W. Biederman wrote:
>> "H. Peter Anvin" <[email protected]> writes:
>>
>> > Vivek Goyal wrote:
>> >>>>
>> >>> What about once the kernel is booted?
>> >> Sorry did not understand the question. Few more lines will help.
>> >>
>> >
>> > Is this field intended to protect any kind of memory during the early boot phase
>> > of the kernel proper, or only the decompressor?
>>
>> Yes, the field should account for memory usage until the kernel starts
>> doing the accounting at run time.
>>
>> I'm actually surprised that taking into account the .bss was not enough to
>> cover up anything the decompressor was doing. Usually the kernel's .bss
>> is more than the extra 32K or so that the decompressor uses.
>>
>
> I think the .bss section will act as a buffer for the decompressor only
> if .bss is not part of the compressed data. In that case the decompressor
> does not have to move beyond the bss and can run from the kernel's bss space.

Agreed.

> But on my machine it looks like the bss is very much part of the raw
> binary image, and hence part of the compressed data (vmlinux.bin.gz).
> The memsz exported in the bzImage is the same as the size of the raw
> output binary.
>
> That is probably why we are stomping on other segments in my case, and
> if my understanding is right it should happen irrespective of the
> kernel's bss size.
>
> Here is how the program headers of the kernel vmlinux file look.
> .bss is mapped by the first program header, along with .text.

Ok. So somehow we have done the insane thing of putting .bss in the middle of
the executable. It might even be sane if it were just the .init sections we put
after it, but no, we are putting .data after the .bss.

Well that easily explains why we had a problem.

Getting the proper accounting in for handling this case is probably reasonable.
It probably also makes sense for someone to take a good hard look at the crazy
ordering of sections on x86_64.

Eric
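
One rough way to picture the accounting discussed in this thread is the
C sketch below. It is a simplified model under stated assumptions, not
the bzImage code: struct load_seg, trailing_zero_fill() and
extra_reserve() are hypothetical names, the 32K scratch figure is the
estimate quoted earlier in the thread, and the model assumes that only
zero-filled memory at the very end of the image is free for the
decompressor to use. The first layout uses the PhysAddr, FileSiz and
MemSiz values from Vivek's readelf output; the second is an invented
layout with .bss last, for contrast.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one PT_LOAD program header. */
struct load_seg {
    uint64_t paddr, filesz, memsz;
};

/* Layout reported in this thread: file data is loaded above the .bss. */
static const struct load_seg thread_layout[] = {
    { 0x000000ULL, 0x546bf8ULL, 0x5dbc28ULL },
    { 0x5dc000ULL, 0x00ede0ULL, 0x00ede0ULL },
    { 0x5eb000ULL, 0x000c08ULL, 0x000c08ULL },
    { 0x5ec000ULL, 0x044004ULL, 0x044004ULL },
};

/* Invented contrast case: a single segment whose .bss ends the image. */
static const struct load_seg bss_last_layout[] = {
    { 0x000000ULL, 0x546bf8ULL, 0x5dbc28ULL },
};

/* Zero-filled bytes between the last file-backed byte and the image end. */
static uint64_t trailing_zero_fill(const struct load_seg *s, size_t n)
{
    uint64_t file_end = 0, mem_end = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].filesz && s[i].paddr + s[i].filesz > file_end)
            file_end = s[i].paddr + s[i].filesz;
        if (s[i].paddr + s[i].memsz > mem_end)
            mem_end = s[i].paddr + s[i].memsz;
    }
    return mem_end - file_end;  /* memsz >= filesz, so this cannot underflow */
}

/* Extra bytes to reserve beyond the image so the scratch area fits. */
static uint64_t extra_reserve(const struct load_seg *s, size_t n,
                              uint64_t scratch)
{
    uint64_t slack = trailing_zero_fill(s, n);
    return scratch > slack ? scratch - slack : 0;
}
```

In this model the thread's layout has no trailing zero fill, so the whole
scratch area has to be reserved on top of the image, while the bss-last
layout needs nothing extra because its .bss already covers it.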