2012-10-01 15:40:01

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Sun, Sep 30, 2012 at 05:21:16PM -0600, Jason Gunthorpe wrote:
> The standard linux asm-generic/vmlinux.lds.h already supports this,
> and it seems other architectures do as well.
>
> The goal is to create an ELF file that has correct program headers. We
> want to see the VirtAddr be the runtime address of the kernel with the
> MMU turned on, and PhysAddr be the physical load address for the section
> with no MMU.
>
> This allows ELF based boot loaders to properly load vmlinux:
>
> $ readelf -l vmlinux
> Entry point 0x8000
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> LOAD 0x008000 0xc0008000 0x00008000 0x372244 0x3a4310 RWE 0x8000

Not related directly to your patch, but I wonder why we don't we see
separate r-x and rw- segments?

Perhaps the linker script needs tidyup, or there are wrong attributes on
some sections somewhere.

> Signed-off-by: Jason Gunthorpe <[email protected]>
> ---
> arch/arm/include/asm/memory.h | 2 +-
> arch/arm/kernel/vmlinux.lds.S | 47 ++++++++++++++++++++++++----------------
> 2 files changed, 29 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
> index 5f6ddcc..4ce5b6d 100644
> --- a/arch/arm/include/asm/memory.h
> +++ b/arch/arm/include/asm/memory.h
> @@ -283,7 +283,7 @@ static inline __deprecated void *bus_to_virt(unsigned long x)
> #define arch_is_coherent() 0
> #endif
>
> -#endif
> +#endif /* __ASSEMBLY__ */
>
> #include <asm-generic/memory_model.h>
>
> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> index 36ff15b..07942b6 100644
> --- a/arch/arm/kernel/vmlinux.lds.S
> +++ b/arch/arm/kernel/vmlinux.lds.S
> @@ -3,6 +3,13 @@
> * Written by Martin Mares <[email protected]>
> */
>
> +/* If we have a known, fixed physical load address then set LOAD_OFFSET
> + and generate an ELF that has the physical load address in the program
> + headers. */
> +#ifndef CONFIG_ARM_PATCH_PHYS_VIRT
> +#define LOAD_OFFSET (PAGE_OFFSET - PHYS_OFFSET)
> +#endif
> +

What happens if CONFIG_ARM_PATCH_PHYS_VIRT=y? This will be used
increasingly, especially for multiplatform kernels.

Currently, it looks like we will just fail to link the kernel with
ARM_PATCH_PHYS_VIRT... or have I overlooked something?

If the kernel is intended to be loadable at a physical address which is
not statically known, no ELF loader that does not ignore the ELF phdr
address fields can work. There is no correct way to represent this
situation using an ET_EXEC image, unless we specify explicitly that
the addresses must be ignored as part of our boot protocol. (We ran into
the same situation with uImages, which bake the load address into the
image. The eventual answer was to fix U-Boot.)


Really, LOAD_OFFSET (in your terminology) is something that has to be
inferred by the loader in the general case, and in that case the ELF
paddr might as well be the same as the vaddr.


> #include <asm-generic/vmlinux.lds.h>
> #include <asm/cache.h>
> #include <asm/thread_info.h>
> @@ -39,7 +46,7 @@
> #endif
>
> OUTPUT_ARCH(arm)
> -ENTRY(stext)
> +ENTRY(phys_start)

This is debatable. In fact, stext has the property that its virtual
(runtime) and load addresses are the same. To represent this correctly
in the linker scripts, the position-independent head.S code should be
split out into a separate section to which LOAD_OFFSET is not applied.

This may cause confusion elsewhere though, since _text probably needs
to coincide with the virtual image of stext.


Setting vaddr and paddr to PAGE_OFFSET (as we do now) and having the
loader choose the appropriate board-specific place to load the kernel
image makes this irrelevant, if I've understood the situation correctly.

Cheers
---Dave

>
> #ifndef __ARMEB__
> jiffies = jiffies_64;
> @@ -86,11 +93,13 @@ SECTIONS
> #else
> . = PAGE_OFFSET + TEXT_OFFSET;
> #endif
> - .head.text : {
> + .head.text : AT(ADDR(.head.text) - LOAD_OFFSET) {
> _text = .;
> + phys_start = . - LOAD_OFFSET;
> HEAD_TEXT
> }
> - .text : { /* Real text segment */
> + /* Real text segment */
> + .text : AT(ADDR(.text) - LOAD_OFFSET) {
> _stext = .; /* Text and read-only data */
> __exception_text_start = .;
> *(.exception.text)
> @@ -119,12 +128,12 @@ SECTIONS
> * Stack unwinding tables
> */
> . = ALIGN(8);
> - .ARM.unwind_idx : {
> + .ARM.unwind_idx : AT(ADDR(.ARM.unwind_idx) - LOAD_OFFSET) {
> __start_unwind_idx = .;
> *(.ARM.exidx*)
> __stop_unwind_idx = .;
> }
> - .ARM.unwind_tab : {
> + .ARM.unwind_tab : AT(ADDR(.ARM.unwind_tab) - LOAD_OFFSET) {
> __start_unwind_tab = .;
> *(.ARM.extab*)
> __stop_unwind_tab = .;
> @@ -139,35 +148,35 @@ SECTIONS
> #endif
>
> INIT_TEXT_SECTION(8)
> - .exit.text : {
> + .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
> ARM_EXIT_KEEP(EXIT_TEXT)
> }
> - .init.proc.info : {
> + .init.proc.info : AT(ADDR(.init.proc.info) - LOAD_OFFSET) {
> ARM_CPU_DISCARD(PROC_INFO)
> }
> - .init.arch.info : {
> + .init.arch.info : AT(ADDR(.init.arch.info) - LOAD_OFFSET) {
> __arch_info_begin = .;
> *(.arch.info.init)
> __arch_info_end = .;
> }
> - .init.tagtable : {
> + .init.tagtable : AT(ADDR(.init.tagtable) - LOAD_OFFSET) {
> __tagtable_begin = .;
> *(.taglist.init)
> __tagtable_end = .;
> }
> #ifdef CONFIG_SMP_ON_UP
> - .init.smpalt : {
> + .init.smpalt : AT(ADDR(.init.smpalt) - LOAD_OFFSET) {
> __smpalt_begin = .;
> *(.alt.smp.init)
> __smpalt_end = .;
> }
> #endif
> - .init.pv_table : {
> + .init.pv_table : AT(ADDR(.init.pv_table) - LOAD_OFFSET) {
> __pv_table_begin = .;
> *(.pv_table)
> __pv_table_end = .;
> }
> - .init.data : {
> + .init.data : AT(ADDR(.init.data) - LOAD_OFFSET) {
> #ifndef CONFIG_XIP_KERNEL
> INIT_DATA
> #endif
> @@ -178,7 +187,7 @@ SECTIONS
> INIT_RAM_FS
> }
> #ifndef CONFIG_XIP_KERNEL
> - .exit.data : {
> + .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) {
> ARM_EXIT_KEEP(EXIT_DATA)
> }
> #endif
> @@ -196,7 +205,7 @@ SECTIONS
> __data_loc = .;
> #endif
>
> - .data : AT(__data_loc) {
> + .data : AT(__data_loc - LOAD_OFFSET) {
> _data = .; /* address in memory */
> _sdata = .;
>
> @@ -245,7 +254,7 @@ SECTIONS
> * free it after init has commenced and TCM contents have
> * been copied to its destination.
> */
> - .tcm_start : {
> + .tcm_start : AT(ADDR(.tcm_start) - LOAD_OFFSET) {
> . = ALIGN(PAGE_SIZE);
> __tcm_start = .;
> __itcm_start = .;
> @@ -257,7 +266,7 @@ SECTIONS
> * and we'll upload the contents from RAM to TCM and free
> * the used RAM after that.
> */
> - .text_itcm ITCM_OFFSET : AT(__itcm_start)
> + .text_itcm ITCM_OFFSET : AT(__itcm_start - LOAD_OFFSET)
> {
> __sitcm_text = .;
> *(.tcm.text)
> @@ -272,12 +281,12 @@ SECTIONS
> */
> . = ADDR(.tcm_start) + SIZEOF(.tcm_start) + SIZEOF(.text_itcm);
>
> - .dtcm_start : {
> + .dtcm_start : AT(ADDR(.dtcm_start) - LOAD_OFFSET) {
> __dtcm_start = .;
> }
>
> /* TODO: add remainder of ITCM as well, that can be used for data! */
> - .data_dtcm DTCM_OFFSET : AT(__dtcm_start)
> + .data_dtcm DTCM_OFFSET : AT(__dtcm_start - LOAD_OFFSET)
> {
> . = ALIGN(4);
> __sdtcm_data = .;
> @@ -290,7 +299,7 @@ SECTIONS
> . = ADDR(.dtcm_start) + SIZEOF(.data_dtcm);
>
> /* End marker for freeing TCM copy in linked object */
> - .tcm_end : AT(ADDR(.dtcm_start) + SIZEOF(.data_dtcm)){
> + .tcm_end : AT(ADDR(.dtcm_start) + SIZEOF(.data_dtcm) - LOAD_OFFSET){
> . = ALIGN(PAGE_SIZE);
> __tcm_end = .;
> }
> --
> 1.7.4.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


2012-10-01 16:06:46

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Mon, Oct 01, 2012 at 04:39:53PM +0100, Dave Martin wrote:

> > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> > LOAD 0x008000 0xc0008000 0x00008000 0x372244 0x3a4310 RWE 0x8000
>
> Not related directly to your patch, but I wonder why we don't we see
> separate r-x and rw- segments?

I think this is because the sections are not aligned when the
protections change, and the sections are not sorted by protection
type.

[ 1] .head.text PROGBITS c0008000 008000 0001c0 00 AX 0 0 32
[ 2] .text PROGBITS c00081c0 0081c0 1dc880 00 AX 0 0 32
[ 3] .rodata PROGBITS c01e5000 1e5000 060a80 00 A 0 0 32
[ 4] __bug_table PROGBITS c0245a80 245a80 002a78 00 A 0 0 1
[ 5] .pci_fixup PROGBITS c02484f8 2484f8 000090 00 A 0 0 4
[ 6] __param PROGBITS c0248588 248588 000280 00 A 0 0 4
[ 7] __modver PROGBITS c0248808 248808 0007f8 00 A 0 0 4
[ 8] .init.text PROGBITS c0249000 249000 01af20 00 AX 0 0 32
[ 9] .exit.text PROGBITS c0263f20 263f20 000c0c 00 AX 0 0 4
[10] .init.proc.info PROGBITS c0264b2c 264b2c 00009c 00 AX 0 0 1
[11] .init.arch.info PROGBITS c0264bc8 264bc8 000044 00 A 0 0 4
[12] .init.tagtable PROGBITS c0264c0c 264c0c 000040 00 A 0 0 4
[13] .init.data PROGBITS c0264c60 264c60 0f481c 00 WA 0 0 32
[14] .data PROGBITS c035a000 35a000 020220 00 WA 0 0 32
[15] .notes NOTE c037a220 37a220 000024 00 AX 0 0 4
[16] .bss NOBITS c037a260 37a244 0320b0 00 WA 0 0 32
[17] .comment PROGBITS 00000000 37a244 000021 01 MS 0 0 1


> > +/* If we have a known, fixed physical load address then set LOAD_OFFSET
> > + and generate an ELF that has the physical load address in the program
> > + headers. */
> > +#ifndef CONFIG_ARM_PATCH_PHYS_VIRT
> > +#define LOAD_OFFSET (PAGE_OFFSET - PHYS_OFFSET)
> > +#endif
> > +
>
> What happens if CONFIG_ARM_PATCH_PHYS_VIRT=y? This will be used
> increasingly, especially for multiplatform kernels.

Then LOAD_OFFSET is unset and include/asm-generic/vmlinux.lds.h does:

#ifndef LOAD_OFFSET
#define LOAD_OFFSET 0
#endif

Which is exactly the same case we have today. LOAD_OFFSET is not a
name I invented, it is the standard kernel name for the functionality.

> If the kernel is intended to be loadable at a physical address which is
> not statically known, no ELF loader that does not ignore the ELF
> phdr

In this case you can't really use a standard ELF loader to load the
kernel so, LOAD_OFFSET = 0 is fine. My case is using an ELF loader,
and I have set options for a static physical load address.

> > OUTPUT_ARCH(arm)
> > -ENTRY(stext)
> > +ENTRY(phys_start)
>
> This is debatable. In fact, stext has the property that its virtual
> (runtime) and load addresses are the same.

This is what other arches using LOAD_OFFSET do (eg see ia64), and is
sensible. ENTRY is *only* used by the ELF loader and specifies where
the loader should jump. It must be a physical address since the loader
only works with physical addresse. I don't understand your comment
that .stext has the same load and virtual address, that is clearly not
true for the ELF images my build is producing:

c0008000 T _text
c0008000 T stext

The physical address of that symbol is 0x8000.

> To represent this correctly in the linker scripts, the
> position-independent head.S code should be split out into a separate
> section to which LOAD_OFFSET is not applied.

I'm not sure, the linker script should use addresses that are correct
during runtime, and if the PIC code does not care about its PC then it
is fine to keep it at the virtual address to minimize confusion once
the MMU is on.

> Setting vaddr and paddr to PAGE_OFFSET (as we do now) and having the
> loader choose the appropriate board-specific place to load the kernel
> image makes this irrelevant, if I've understood the situation correctly.

Yes, if you use more loader stages then the load headers are ignored.
Our boot loaders on our boards boot straight ELF vmlinux.gz so they
need correct load headers.

Jason

2012-10-01 17:56:54

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Mon, Oct 01, 2012 at 10:06:39AM -0600, Jason Gunthorpe wrote:
> On Mon, Oct 01, 2012 at 04:39:53PM +0100, Dave Martin wrote:
>
> > > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> > > LOAD 0x008000 0xc0008000 0x00008000 0x372244 0x3a4310 RWE 0x8000
> >
> > Not related directly to your patch, but I wonder why we don't we see
> > separate r-x and rw- segments?
>
> I think this is because the sections are not aligned when the
> protections change, and the sections are not sorted by protection
> type.
>
> [ 1] .head.text PROGBITS c0008000 008000 0001c0 00 AX 0 0 32
> [ 2] .text PROGBITS c00081c0 0081c0 1dc880 00 AX 0 0 32
> [ 3] .rodata PROGBITS c01e5000 1e5000 060a80 00 A 0 0 32
> [ 4] __bug_table PROGBITS c0245a80 245a80 002a78 00 A 0 0 1
> [ 5] .pci_fixup PROGBITS c02484f8 2484f8 000090 00 A 0 0 4
> [ 6] __param PROGBITS c0248588 248588 000280 00 A 0 0 4
> [ 7] __modver PROGBITS c0248808 248808 0007f8 00 A 0 0 4
> [ 8] .init.text PROGBITS c0249000 249000 01af20 00 AX 0 0 32
> [ 9] .exit.text PROGBITS c0263f20 263f20 000c0c 00 AX 0 0 4
> [10] .init.proc.info PROGBITS c0264b2c 264b2c 00009c 00 AX 0 0 1
> [11] .init.arch.info PROGBITS c0264bc8 264bc8 000044 00 A 0 0 4
> [12] .init.tagtable PROGBITS c0264c0c 264c0c 000040 00 A 0 0 4
> [13] .init.data PROGBITS c0264c60 264c60 0f481c 00 WA 0 0 32
> [14] .data PROGBITS c035a000 35a000 020220 00 WA 0 0 32
> [15] .notes NOTE c037a220 37a220 000024 00 AX 0 0 4
> [16] .bss NOBITS c037a260 37a244 0320b0 00 WA 0 0 32
> [17] .comment PROGBITS 00000000 37a244 000021 01 MS 0 0 1
>
>
> > > +/* If we have a known, fixed physical load address then set LOAD_OFFSET
> > > + and generate an ELF that has the physical load address in the program
> > > + headers. */
> > > +#ifndef CONFIG_ARM_PATCH_PHYS_VIRT
> > > +#define LOAD_OFFSET (PAGE_OFFSET - PHYS_OFFSET)
> > > +#endif
> > > +
> >
> > What happens if CONFIG_ARM_PATCH_PHYS_VIRT=y? This will be used
> > increasingly, especially for multiplatform kernels.
>
> Then LOAD_OFFSET is unset and include/asm-generic/vmlinux.lds.h does:
>
> #ifndef LOAD_OFFSET
> #define LOAD_OFFSET 0
> #endif

OK, good.

> Which is exactly the same case we have today. LOAD_OFFSET is not a
> name I invented, it is the standard kernel name for the functionality.
>
> > If the kernel is intended to be loadable at a physical address which is
> > not statically known, no ELF loader that does not ignore the ELF
> > phdr
>
> In this case you can't really use a standard ELF loader to load the
> kernel so, LOAD_OFFSET = 0 is fine. My case is using an ELF loader,
> and I have set options for a static physical load address.

Generally, people should try to be compatible with the single kernel
image effort unless there's a really compelling reason not to.

Wouldn't your firmware be incapable of loading a multiplatform kernel?

> > > OUTPUT_ARCH(arm)
> > > -ENTRY(stext)
> > > +ENTRY(phys_start)
> >
> > This is debatable. In fact, stext has the property that its virtual
> > (runtime) and load addresses are the same.
>
> This is what other arches using LOAD_OFFSET do (eg see ia64), and is
> sensible. ENTRY is *only* used by the ELF loader and specifies where
> the loader should jump. It must be a physical address since the loader
> only works with physical addresse. I don't understand your comment
> that .stext has the same load and virtual address, that is clearly not
> true for the ELF images my build is producing:
>
> c0008000 T _text
> c0008000 T stext
>
> The physical address of that symbol is 0x8000.

Well, that was a bit of a pedantic point I admit, but there are
conflicting definitions of what "virtual address" really means in these
situations. The original SYSV ABI spec explicitly specifies that
e_entry is a virtual address, but is also rather vague about how the
paddr fields should be interpreted.


All that AT(ADDR(blah) - LOAD_OFFSET) stuff is cumbersome, but if it's
at least consistent with other architectures then it may not such a
disaster. It's not universal though: less than 50% of the arches in
the kernel currently seem to use this.

An alternative, cleaner approach might be to specify segment load
addresses directly using PHDRS { }. This should at leat allow the
load address to be specified once per phdr, rather than once per section.

> > To represent this correctly in the linker scripts, the
> > position-independent head.S code should be split out into a separate
> > section to which LOAD_OFFSET is not applied.
>
> I'm not sure, the linker script should use addresses that are correct
> during runtime, and if the PIC code does not care about its PC then it
> is fine to keep it at the virtual address to minimize confusion once
> the MMU is on.
> > Setting vaddr and paddr to PAGE_OFFSET (as we do now) and having the
> > loader choose the appropriate board-specific place to load the kernel
> > image makes this irrelevant, if I've understood the situation correctly.
>
> Yes, if you use more loader stages then the load headers are ignored.
> Our boot loaders on our boards boot straight ELF vmlinux.gz so they
> need correct load headers.

If your image is compressed anyway though, why are you not using zImage?

Cheers
---Dave

2012-10-01 18:35:55

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Mon, Oct 01, 2012 at 06:56:47PM +0100, Dave Martin wrote:

> > > If the kernel is intended to be loadable at a physical address which is
> > > not statically known, no ELF loader that does not ignore the ELF
> > > phdr
> >
> > In this case you can't really use a standard ELF loader to load the
> > kernel so, LOAD_OFFSET = 0 is fine. My case is using an ELF loader,
> > and I have set options for a static physical load address.
>
> Generally, people should try to be compatible with the single kernel
> image effort unless there's a really compelling reason not to.

Well, all the embedded kernels we use are always custom built and
minimized for the target. So as long as there are options to minimize
the kernel size/increase performance by taking out the relocation
stuff, we are going to use them.

All our boards on PPC and ARM use DT kernels now, and we try to flow
back all the generic stuff as best we can. As someone that makes
custom boards, I really like DT, it makes things much easier :)

> Wouldn't your firmware be incapable of loading a multiplatform kernel?

Well, no, it boots ELFs, so it can boot anything, with any memory
layout. A 2nd stage loader would be required to boot standard kernels,
that loader would be an ELF with 1 section for the 2nd stage, 1
section for the zImage and 1 section for the initrd, with proper load
headers.

Creating such a system is a lot of annoyance, so we never have - it is
*so much* easier to just boot vmlinux ELF directly.

> Well, that was a bit of a pedantic point I admit, but there are
> conflicting definitions of what "virtual address" really means in these
> situations. The original SYSV ABI spec explicitly specifies that
> e_entry is a virtual address, but is also rather vague about how the
> paddr fields should be interpreted.

Granted the spec is vauge, but convention for loaders seems to be that
it is a physical address these days.

It could be a virtual address, and the loader could translate it by
looking at the phdrs, but I don't see any other arches doing that?

> All that AT(ADDR(blah) - LOAD_OFFSET) stuff is cumbersome, but if it's
> at least consistent with other architectures then it may not such a
> disaster. It's not universal though: less than 50% of the arches in
> the kernel currently seem to use this.

I agree it is not nice, but I once did try to make PHDRS work as you
described, but was never successful. IIRC there were serious linker
bugs) As you note the AT method is consistent with other arches, and
the generic vmlinux.lds.h

> > Yes, if you use more loader stages then the load headers are ignored.
> > Our boot loaders on our boards boot straight ELF vmlinux.gz so they
> > need correct load headers.

> If your image is compressed anyway though, why are you not using zImage?

We store the kernel in a CRAMFS, the loader pulls it out and
decompresses it, processes the ELF sections 'on the fly' and jumps to
it. Using zImage would result in double-decompression, and reallly has
no benefits to us.

Jason

2012-10-02 10:23:59

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Mon, Oct 01, 2012 at 12:35:43PM -0600, Jason Gunthorpe wrote:
> On Mon, Oct 01, 2012 at 06:56:47PM +0100, Dave Martin wrote:
>
> > > > If the kernel is intended to be loadable at a physical address which is
> > > > not statically known, no ELF loader that does not ignore the ELF
> > > > phdr
> > >
> > > In this case you can't really use a standard ELF loader to load the
> > > kernel so, LOAD_OFFSET = 0 is fine. My case is using an ELF loader,
> > > and I have set options for a static physical load address.
> >
> > Generally, people should try to be compatible with the single kernel
> > image effort unless there's a really compelling reason not to.
>
> Well, all the embedded kernels we use are always custom built and
> minimized for the target. So as long as there are options to minimize
> the kernel size/increase performance by taking out the relocation
> stuff, we are going to use them.
>
> All our boards on PPC and ARM use DT kernels now, and we try to flow
> back all the generic stuff as best we can. As someone that makes
> custom boards, I really like DT, it makes things much easier :)
>
> > Wouldn't your firmware be incapable of loading a multiplatform kernel?
>
> Well, no, it boots ELFs, so it can boot anything, with any memory
> layout. A 2nd stage loader would be required to boot standard kernels,
> that loader would be an ELF with 1 section for the 2nd stage, 1
> section for the zImage and 1 section for the initrd, with proper load
> headers.

Don't you already have to treat Linux as a special case though? How
do you know where to load ATAGs, DT and/or initramfs, and how to
initalise the registers? None of that is part of any ELF specification,
and would be inappropriate if you boot any non-linux images.

> Creating such a system is a lot of annoyance, so we never have - it is
> *so much* easier to just boot vmlinux ELF directly.
>
> > Well, that was a bit of a pedantic point I admit, but there are
> > conflicting definitions of what "virtual address" really means in these
> > situations. The original SYSV ABI spec explicitly specifies that
> > e_entry is a virtual address, but is also rather vague about how the
> > paddr fields should be interpreted.
>
> Granted the spec is vauge, but convention for loaders seems to be that
> it is a physical address these days.
>
> It could be a virtual address, and the loader could translate it by
> looking at the phdrs, but I don't see any other arches doing that?

You would just give .head.text a virtual address matching its load
address. But as you say, no other arches bother with this, and it's
not obviously worthwhile.

> > All that AT(ADDR(blah) - LOAD_OFFSET) stuff is cumbersome, but if it's
> > at least consistent with other architectures then it may not such a
> > disaster. It's not universal though: less than 50% of the arches in
> > the kernel currently seem to use this.
>
> I agree it is not nice, but I once did try to make PHDRS work as you
> described, but was never successful. IIRC there were serious linker
> bugs) As you note the AT method is consistent with other arches, and
> the generic vmlinux.lds.h

A quick experiment shows that

PHDRS {
kernel PT_LOAD AT(PHYS_OFFSET + LOAD_OFFSET);
}

/* ... */

SECTIONS {
.head.text {
/* ... */
} :kernel

/* ... */
}

can produce a sensible-looking vmlinux at least with my version of the
tools.

As you observe, GNU ld behaviour in this area tends to be rather patchily
specified, buggy or both. That does argue in favour of reusing the
same techniques already used for other arches, though.


A question does occur to me: do your changes work with XIP_KERNEL?
I'm not very familiar with XIP_KERNEL myself, so I'm currently not
clear on whether there's an impact here.

Beyond this, I think the approach doesn't look unreasonable.

> > > Yes, if you use more loader stages then the load headers are ignored.
> > > Our boot loaders on our boards boot straight ELF vmlinux.gz so they
> > > need correct load headers.
>
> > If your image is compressed anyway though, why are you not using zImage?
>
> We store the kernel in a CRAMFS, the loader pulls it out and
> decompresses it, processes the ELF sections 'on the fly' and jumps to
> it. Using zImage would result in double-decompression, and reallly has
> no benefits to us.

You store vmlinux.gz in a cramfs? Is that a typo, or have you already
compressed the kernel twice?


Cheers
---Dave

2012-10-02 17:48:10

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Tue, Oct 02, 2012 at 11:23:46AM +0100, Dave Martin wrote:

> > Well, no, it boots ELFs, so it can boot anything, with any memory
> > layout. A 2nd stage loader would be required to boot standard kernels,
> > that loader would be an ELF with 1 section for the 2nd stage, 1
> > section for the zImage and 1 section for the initrd, with proper load
> > headers.
>
> Don't you already have to treat Linux as a special case though? How
> do you know where to load ATAGs, DT and/or initramfs, and how to
> initalise the registers? None of that is part of any ELF specification,
> and would be inappropriate if you boot any non-linux images.

Experience with our PPC systems has been that linking the DTB into
vmlinux is the way to go - so we have a trivally small,
non-upstreamble change (for PPC and ARM) to put the DTB(s) in vmlinux
and capture the CPU registers values (entry call function arguments)
that are set from the bootloader.

Some DTB fixup code (supported by the OF kernel routines) in vmlinux
edits the DTB chosen node to include the information from the
bootloader.

>From there any other DTB fixups (eg fetching a MAC address from I2C)
plus other register initializations are done in Linux (typically by
the upstream drivers, though not 100%), with the MMU on, with full
kernel services. This is a much easier development environment :)

All in all it is basically about 100 lines of code to do what I've
described in vmlinux. This is dramatically less code than would be
required if we tried to conform to the standard vmlinux boot protocol.

> As you observe, GNU ld behaviour in this area tends to be rather patchily
> specified, buggy or both. That does argue in favour of reusing the
> same techniques already used for other arches, though.

It is good to know PHDRS seems to be working better now, maybe things
can migrate someday!

> A question does occur to me: do your changes work with XIP_KERNEL?
> I'm not very familiar with XIP_KERNEL myself, so I'm currently not
> clear on whether there's an impact here.

I looked at the XIP defines and they didn't seem to conflict. Since
this change only effects the values in the LOAD headers, not the
actual image I doubt it has any impact.

> Beyond this, I think the approach doesn't look unreasonable.

Great, should I do anything further to get this patch into mainline.

> You store vmlinux.gz in a cramfs? Is that a typo, or have you already
> compressed the kernel twice?

The compression can either be intrisic to cramfs or a raw elf that has
been gzip'd.

Jason

2012-10-03 10:43:50

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

[Nicolas, do you have a view on this thread with regard to XIP?]

Hi,

On Tue, Oct 02, 2012 at 11:47:59AM -0600, Jason Gunthorpe wrote:
> On Tue, Oct 02, 2012 at 11:23:46AM +0100, Dave Martin wrote:
>
> > > Well, no, it boots ELFs, so it can boot anything, with any memory
> > > layout. A 2nd stage loader would be required to boot standard kernels,
> > > that loader would be an ELF with 1 section for the 2nd stage, 1
> > > section for the zImage and 1 section for the initrd, with proper load
> > > headers.
> >
> > Don't you already have to treat Linux as a special case though? How
> > do you know where to load ATAGs, DT and/or initramfs, and how to
> > initalise the registers? None of that is part of any ELF specification,
> > and would be inappropriate if you boot any non-linux images.
>
> Experience with our PPC systems has been that linking the DTB into
> vmlinux is the way to go - so we have a trivally small,
> non-upstreamble change (for PPC and ARM) to put the DTB(s) in vmlinux
> and capture the CPU registers values (entry call function arguments)
> that are set from the bootloader.
>
> Some DTB fixup code (supported by the OF kernel routines) in vmlinux
> edits the DTB chosen node to include the information from the
> bootloader.
>
> From there any other DTB fixups (eg fetching a MAC address from I2C)
> plus other register initializations are done in Linux (typically by
> the upstream drivers, though not 100%), with the MMU on, with full
> kernel services. This is a much easier development environment :)
>
> All in all it is basically about 100 lines of code to do what I've
> described in vmlinux. This is dramatically less code than would be
> required if we tried to conform to the standard vmlinux boot protocol.

[1]

I'm not sure exactly what you mean by linking the DTB into vmlinux.
I don't think this is supported upstream, at least for ARM. It could
be done externally by post-processing vmlinux to add extra sections
and some boot shim code which initialises the registers appropriately
for kernel entry ... but you're really inventing a bootloader if you
start to do that.

For ARM, I believe that bootloaders that do not pass the dtb properly are
considered considered legacy, or broken (in the case of platforms which
are DT-based). We don't really attempt to support this.

ARM_APPENDED_DTB is the workaround for that, but that is only available
as part of the zImage loader. This is meant to ease migration, not as
a permanent solution... but it will probably be available for quite
a while yet.


Making a simple bootloader DT-aware is not actually that hard: libfdt
is intentionally pretty straightforward to integrate.

For an example of how this might look in a simple scenario, you could
take a look at the zImage loader implementation, or

git://git.linaro.org/arm/models/boot-wrapper.git

(Particularly update_fdt() and load_kernel() in semi_loader.c)

It's more than 100 lines, but not _that_ much more, and contains
some implementation features you probably don't need.

There may still be an argument against this if your bootloader is
exteremely space-constrained.

> > As you observe, GNU ld behaviour in this area tends to be rather patchily
> > specified, buggy or both. That does argue in favour of reusing the
> > same techniques already used for other arches, though.
>
> It is good to know PHDRS seems to be working better now, maybe things
> can migrate someday!

I guess we shouldn't rush to do that unnecessarily. Old tools hang
around for a long, long time...

> > A question does occur to me: do your changes work with XIP_KERNEL?
> > I'm not very familiar with XIP_KERNEL myself, so I'm currently not
> > clear on whether there's an impact here.
>
> I looked at the XIP defines and they didn't seem to conflict. Since
> this change only effects the values in the LOAD headers, not the
> actual image I doubt it has any impact.
>
> > Beyond this, I think the approach doesn't look unreasonable.
>
> Great, should I do anything further to get this patch into mainline.

I think that someone more familiar with XIP should at least comment on
it.

Since you admit [1] that you are deliberately not following the boot
protocol proper, you may get some pushback about whether the change
is justified.

>From my side, I think that the fact this takes arch/arm/vmlinux.lds.S
closer to some other common arches means that this is low risk and should
be mostly harmless, but I will take another look at the patch first.

> > You store vmlinux.gz in a cramfs? Is that a typo, or have you already
> > compressed the kernel twice?
>
> The compression can either be intrisic to cramfs or a raw elf that has
> been gzip'd.

If you can have files with intrinsic compression such that they don't
get double-compressed, why can you not do this for zImage?

Am I still misunderstanding something?

Cheers
---Dave

2012-10-03 18:44:41

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Wed, Oct 03, 2012 at 11:43:35AM +0100, Dave Martin wrote:

> I'm not sure exactly what you mean by linking the DTB into vmlinux.
> I don't think this is supported upstream, at least for ARM. It could
> be done externally by post-processing vmlinux to add extra sections
> and some boot shim code which initialises the registers appropriately
> for kernel entry ... but you're really inventing a bootloader if you
> start to do that.

We use the existing build infrastructure (see cmd_dt_S_dtb) and add a
4 line patch to head.S to put the dtb address into the right
register. I think it would be great to have a CONFIG_ARM_BUILT_IN_DTB
for applications like mine, but it is really no problem to carry the
head.S patch. The normal build process produces a vmlinux that is self
contained, no post processing required.

> Making a simple bootloader DT-aware is not actually that hard: libfdt
> is intentionally pretty straightforward to integrate.

Sure, but we have several major issues with that entire idea:
1) Our bootloader (on PPC and ARM) is in redundant NOR flash and
is limited to 16K. That is just enough to do gzip, elf, cramfs
and tftpv6 - there is no space left for dtb or fdt code.
2) Experience using DT with PPC has shown that the vmlinux is very
sensitive to the DTB. Every DTB/vmlinux combination must be tested
to ensure it works. By far the best way to manage this *for us* is
to guarentee that the vmlinux sees only the DTB it was designed
for by including it in vmlinux.
3) The DTB *changes*. We change it because we've changed things in the
programmable part of our environment and we change it because the
kernel has shifted. Planning for this inevitable change is
necessary.
4) On our ARM platforms updating the boot loader in the field is more
challenging than updating the vmlinux, not impossible, but not
something we want to do every day. Our PPC boot loader has
survived without change since 2007, and has booted kernels from
2.6.12 to 3.6 with and without DT. This is a time tested,
field proven approach.

Fundamentally we treat the DTB like the .config - an integral,
internal, part of vmlinux.

As far as I can see, the *only* argument to put the DT in the boot
loader is to support standard, distribution kernels.

> It's more than 100 lines, but not _that_ much more, and contains
> some implementation features you probably don't need.

But that is not enough, variations of our platforms use bit bang I2C,
SPI and NOR FLASH to store things like the MAC address and other
DT-relevant information. So, swaths of that code would need to go into
the boot loader/zImage as well.

At that point we may as well take our chances with uBoot. :)

> From my side, I think that the fact this takes arch/arm/vmlinux.lds.S
> closer to some other common arches means that this is low risk and should
> be mostly harmless, but I will take another look at the patch first.

This matches my opinion..

> > > You store vmlinux.gz in a cramfs? Is that a typo, or have you already
> > > compressed the kernel twice?
> >
> > The compression can either be intrisic to cramfs or a raw elf that has
> > been gzip'd.
>
> If you can have files with intrinsic compression such that they don't
> get double-compressed, why can you not do this for zImage?
>
> Am I still misunderstanding something?

We have two boot modes, one boots ELF.gz (for development), the other
boots ELF in CRAMFS (for deployment). Both are compressed, one is
compressed using gzip and one is compressed using zlib in CRAMFS. It
is hard to see how to store a zImage in CRAMFS and avoid double
de-compression. We never store an ELF.gz in CRAMFS.

Since the entire point of zImage is to introduce compression it is
hard to see *why* we'd want to store a zImage in a CRAMFS.

Jason

2012-10-04 11:36:54

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Wed, Oct 03, 2012 at 12:44:38PM -0600, Jason Gunthorpe wrote:
> On Wed, Oct 03, 2012 at 11:43:35AM +0100, Dave Martin wrote:
>
> > I'm not sure exactly what you mean by linking the DTB into vmlinux.
> > I don't think this is supported upstream, at least for ARM. It could
> > be done externally by post-processing vmlinux to add extra sections
> > and some boot shim code which initialises the registers appropriately
> > for kernel entry ... but you're really inventing a bootloader if you
> > start to do that.
>
> We use the existing build infrastructure (see cmd_dt_S_dtb) and add a
> 4 line patch to head.S to put the dtb address into the right
> register. I think it would be great to have a CONFIG_ARM_BUILT_IN_DTB
> for applications like mine, but it is really no problem to carry the
> head.S patch. The normal build process produces a vmlinux that is self
> contained, no post processing required.

OK, so it is supported, but not for ARM, yet. I'm not sure that such
a patch would be rejected, since building in a DTB is not really that
different from building in a command-line.

The post-processing that would otherwise be needed is likely to be
pretty trivial, though (see example below).

> > Making a simple bootloader DT-aware is not actually that hard: libfdt
> > is intentionally pretty straightforward to integrate.
>
> Sure, but we have several major issues with that entire idea:
> 1) Our bootloader (on PPC and ARM) is in redundant NOR flash and
> is limited to 16K. That is just enough to do gzip, elf, cramfs
> and tftpv6 - there is no space left for dtb or fdt code.
> 2) Experience using DT with PPC has shown that the vmlinux is very
> sensitive to the DTB. Every DTB/vmlinux combination must be tested
> to ensure it works. By far the best way to manage this *for us* is
> to guarentee that the vmlinux sees only the DTB it was designed
> for by including it in vmlinux.
> 3) The DTB *changes*. We change it because we've changed things in the
> programmable part of our environment and we change it because the
> kernel has shifted. Planning for this inevitable change is
> necessary.
> 4) On our ARM platforms updating the boot loader in the field is more
> challenging than updating the vmlinux, not impossible, but not
> something we want to do every day. Our PPC boot loader has
> survived without change since 2007, and has booted kernels from
> 2.6.12 to 3.6 with and without DT. This is a time tested,
> field proven approach.
>
> Fundamentally we treat the DTB like the .config - an integral,
> internal, part of vmlinux.
>
> As far as I can see, the *only* argument to put the DT in the boot
> loader is to support standard, distribution kernels.

DT-aware does not have to mean that the DTB is part of the bootloader.
Generally, that's considered to be a bad idea for the reasons you
describe (among others).

The other solution to this problem is to distribute the dtb alongside the
kernel as a supplementary image (similar to initramfs), with the
bootloader doing the bare minimum of modifications to it.

Naturally, if zero modifications are needed then the bootloader does not
need libfdt (or equivalent code) at all.

> > It's more than 100 lines, but not _that_ much more, and contains
> > some implementation features you probably don't need.
>
> But that is not enough, variations of our platforms use bit bang I2C,
> SPI and NOR FLASH to store things like the MAC address and other
> DT-relevant information. So, swaths of that code would need to go into
> the boot loader/zImage as well.
>
> At that point we may as well take our chances with uBoot. :)
>
> > From my side, I think that the fact this takes arch/arm/vmlinux.lds.S
> > closer to some other common arches means that this is low risk and should
> > be mostly harmless, but I will take another look at the patch first.
>
> This matches my opinion..
>
> > > > You store vmlinux.gz in a cramfs? Is that a typo, or have you already
> > > > compressed the kernel twice?
> > >
> > > The compression can either be intrisic to cramfs or a raw elf that has
> > > been gzip'd.
> >
> > If you can have files with intrinsic compression such that they don't
> > get double-compressed, why can you not do this for zImage?
> >
> > Am I still misunderstanding something?
>
> We have two boot modes, one boots ELF.gz (for development), the other
> boots ELF in CRAMFS (for deployment). Both are compressed, one is
> compressed using gzip and one is compressed using zlib in CRAMFS. It
> is hard to see how to store a zImage in CRAMFS and avoid double
> de-compression. We never store an ELF.gz in CRAMFS.

OK, now I understand.

> Since the entire point of zImage is to introduce compression it is
> hard to see *why* we'd want to store a zImage in a CRAMFS.

Because zImage has grown one or two extra features (which in fact have
nothing to do with compression). But I can see why you're not keen
on the idea. In principle we could have an uncompressed zImage, but
for now this doesn't exist.


One other option open to you would be to provide an ELF image loadable
by your bootloader via a simple wrapper. This would probably be a more
robust approach than carrying patches against head.S, since it removes
any intimate dependency on the kernel code itself.

---8<--- boot.lds.S ---

OUTPUT_FORMAT(elf32-littlearm)

TARGET(binary)
INPUT(Image)
INPUT(dtb)

TARGET(elf32-littlearm)

ENTRY(__entry)

SECTIONS {
.text PHYS_OFFSET : {
boot.o(.text)
}

kernel PHYS_OFFSET + TEXT_OFFSET : {
__kernel_entry = .;
Image(*)

payload PHYS_OFFSET + PAYLOAD_OFFSET : {
__dtb_start = .;
dtb(*)
}

/DISCARD/ : {
*(*)
}
}

--- boot.S ---

.globl __entry
.type __entry, %function
__entry:
mov r0, #0
mov r1, #0xFFFFFFFF
ldr r2, =__dtb_start
b __kernel_entry

--->8---

Where PHYS_OFFSET and TEXT_OFFSET are the appropriate definitions
for your platform, and PAYLOAD_OFFSET is an appropriate safe
location for the dtb.

The resulting image can be a bit smaller than vmlinux, too (but only
slightly).


---Dave

2012-10-04 17:59:14

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Thu, Oct 04, 2012 at 12:36:37PM +0100, Dave Martin wrote:

> OK, so it is supported, but not for ARM, yet. I'm not sure that
> such a patch would be rejected, since building in a DTB is not
> really that different from building in a command-line.

Maybe I will be able to look at this in a few months..

> The other solution to this problem is to distribute the dtb
> alongside the kernel as a supplementary image (similar to
> initramfs), with the bootloader doing the bare minimum of
> modifications to it.

Yes, but we still need rely on complex code like I2C/MTD to create a
correct DTB, which again puts us back to patching the kernel for that
functionality.

> Naturally, if zero modifications are needed then the bootloader does not
> need libfdt (or equivalent code) at all.

If only :)

> One other option open to you would be to provide an ELF image loadable
> by your bootloader via a simple wrapper. This would probably be a more
> robust approach than carrying patches against head.S, since it removes
> any intimate dependency on the kernel code itself.

This is what I ment by having a 2nd stage loader, but this example is
not complete because at this point you also have to patch into the DTB
the initrd address (passed in r0/r1), and the various MAC addresses
(getting them is the hard part..).

The nice thing about the head.S patch is that it lives right after
_entry, so there actually are no dependencies on the kernel code, it
executes in the same environment as the example you presented.

Anyhow, exploring a CONFIG_ARM_BUILT_IN_DTB.. I think I'd add
something like this to head.S:

#ifdef CONFIG_ARM_BUILT_IN_DTB
adr r3,bootloader_r0
stmia r3,{r0,r1,r2}
mov r2, #0
mov r1, #0
mov r0, #0
#else
bl __vet_atags
#endif

And something like this to early_dt_fixup:

#ifdef CONFIG_ARM_BUILT_IN_DTB
for_each_dt_provider(dtp) {
devtree = dtp->get_dtb();
if (devtree)
break;
}
#else
devtree = phys_to_virt(dt_phys);
#endif

Where the 'dev tree provider' would use the stored bootloader
registers and any other information to return the proper DTB.

Jason

2012-10-05 08:45:23

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Mon, Oct 01, 2012 at 10:06:39AM -0600, Jason Gunthorpe wrote:
> On Mon, Oct 01, 2012 at 04:39:53PM +0100, Dave Martin wrote:
> > > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> > > LOAD 0x008000 0xc0008000 0x00008000 0x372244 0x3a4310 RWE 0x8000
> >
> > Not related directly to your patch, but I wonder why we don't we see
> > separate r-x and rw- segments?
>
> I think this is because the sections are not aligned when the
> protections change, and the sections are not sorted by protection
> type.

They aren't sorted by protection type, they're ordered according to what's
required for the kernel - which is to have the init sections together as
one complete block so that it can be freed, and to place all of the kernel
text as close to the beginning of the image as possible.

2012-10-08 10:24:23

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Fri, Oct 05, 2012 at 09:45:00AM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 01, 2012 at 10:06:39AM -0600, Jason Gunthorpe wrote:
> > On Mon, Oct 01, 2012 at 04:39:53PM +0100, Dave Martin wrote:
> > > > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> > > > LOAD 0x008000 0xc0008000 0x00008000 0x372244 0x3a4310 RWE 0x8000
> > >
> > > Not related directly to your patch, but I wonder why we don't we see
> > > separate r-x and rw- segments?
> >
> > I think this is because the sections are not aligned when the
> > protections change, and the sections are not sorted by protection
> > type.
>
> They aren't sorted by protection type, they're ordered according to what's
> required for the kernel - which is to have the init sections together as
> one complete block so that it can be freed, and to place all of the kernel
> text as close to the beginning of the image as possible.

Ah, right.

Partly this came from some side speculation about whether we could do
things like privileged read-only permissions on newer CPUs, for preventing
unintended or undesired writes to the kernel's code or read-only data.

There would be various things to solve in order to make that work, so I
guess there's no great urgency for it right now.

Cheers
---Dave

2012-10-08 10:46:59

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Thu, Oct 04, 2012 at 11:59:07AM -0600, Jason Gunthorpe wrote:
> On Thu, Oct 04, 2012 at 12:36:37PM +0100, Dave Martin wrote:
>
> > OK, so it is supported, but not for ARM, yet. I'm not sure that
> > such a patch would be rejected, since building in a DTB is not
> > really that different from building in a command-line.
>
> Maybe I will be able to look at this in a few months..
>
> > The other solution to this problem is to distribute the dtb
> > alongside the kernel as a supplementary image (similar to
> > initramfs), with the bootloader doing the bare minimum of
> > modifications to it.
>
> Yes, but we still need rely on complex code like I2C/MTD to create a
> correct DTB, which again puts us back to patching the kernel for that
> functionality.

I'm still confused as to where this complexity is coming from.

If you need to run some complex I2C and MTD code to generate the DT, what
is that code doing? If it is probing to get the information, then can you
avoid putting the info in the DT at all? Primarily the DT is to describe
those aspects of the hardware which can't be probed.

Otherwise, you that have a few static configurations: you could have one
pre-baked DT per hardware platform, and choose the correct one once you
have detected the platform.

> > Naturally, if zero modifications are needed then the bootloader does not
> > need libfdt (or equivalent code) at all.
>
> If only :)
>
> > One other option open to you would be to provide an ELF image loadable
> > by your bootloader via a simple wrapper. This would probably be a more
> > robust approach than carrying patches against head.S, since it removes
> > any intimate dependency on the kernel code itself.
>
> This is what I ment by having a 2nd stage loader, but this example is
> not complete because at this point you also have to patch into the DTB
> the initrd address (passed in r0/r1), and the various MAC addresses
> (getting them is the hard part..).
>
> The nice thing about the head.S patch is that it lives right after
> _entry, so there actually are no dependencies on the kernel code, it
> executes in the same environment as the example you presented.

Well, logically there's little difference unless your patch were to be
merged. An out-of-tree patch against head.S could be more effort to
maintain in some situations, but that code doesn't evolve rapidly.

> Anyhow, exploring a CONFIG_ARM_BUILT_IN_DTB.. I think I'd add
> something like this to head.S:
>
> #ifdef CONFIG_ARM_BUILT_IN_DTB
> adr r3,bootloader_r0
> stmia r3,{r0,r1,r2}
> mov r2, #0
> mov r1, #0
> mov r0, #0
> #else
> bl __vet_atags
> #endif
>
> And something like this to early_dt_fixup:
>
> #ifdef CONFIG_ARM_BUILT_IN_DTB
> for_each_dt_provider(dtp) {
> devtree = dtp->get_dtb();
> if (devtree)
> break;
> }
> #else
> devtree = phys_to_virt(dt_phys);
> #endif
>
> Where the 'dev tree provider' would use the stored bootloader
> registers and any other information to return the proper DTB.

It would need developing a bit, but something like that might be
possible -- it should probably be discussed via devicetree-discuss.

If it is doing anything less trivial than picking a pre-baked DT, the
rationale would need to be carefully argued.

Cheers
---Dave

2012-10-09 17:37:20

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Mon, Oct 08, 2012 at 11:24:13AM +0100, Dave Martin wrote:

> Partly this came from some side speculation about whether we could do
> things like privileged read-only permissions on newer CPUs, for preventing
> unintended or undesired writes to the kernel's code or read-only data.

Some other arches page protect the kernel, but that tends to be at
odds with the desire to use huge pages for the kernel mapping, and
independent of the load headers..

Jason

2012-10-09 18:25:23

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Mon, Oct 08, 2012 at 11:46:49AM +0100, Dave Martin wrote:

> > Yes, but we still need rely on complex code like I2C/MTD to create a
> > correct DTB, which again puts us back to patching the kernel for that
> > functionality.
>
> I'm still confused as to where this complexity is coming from.
>
> If you need to run some complex I2C and MTD code to generate the DT, what
> is that code doing? If it is probing to get the information, then can you
> avoid putting the info in the DT at all? Primarily the DT is to describe
> those aspects of the hardware which can't be probed.

At manufacturing the unit is programmed with a small datastructure that
contains all the unique ID's (MAC addresses, etc), manufacturing
variations (ie vendor A or B was used for a socket) and other
ancillary data.

So a fair amount of I2C and MTD stack is required just to fetch this
structure - a full CFI probe, for instance, is needed in the NOR flash
case just to locate the structure.

Once read, things like MAC addresses are copied into the DTB, and
certain sections of the DTB are NOP'd out - we have stanzas for chip
vendor A and vendor B, the one that was not put on the board is
replaced with NOP.

Similar to the A/B fix, a further fixup is done based on a runtime
probe of the programmable devices to learn their current
configuration/memory map.

It seems desirable to present a complete/correct DTB to the kernel,
it doesn't seem there are great places to hook in custom discovery
procedures.

>From a maintenance perspective we already have to test/etc the kernel
code for all of this, we don't want to do that twice by duplicating
this stuff outside the kernel.

> Otherwise, you that have a few static configurations: you could have one
> pre-baked DT per hardware platform, and choose the correct one once you
> have detected the platform.

We do that too, for instance the PPC kernel we build supports 4
different circuit boards, each served by a separate DTB, that needs a
fixup pass.

I think the biggest DTB describes about 49 devices..

> > Where the 'dev tree provider' would use the stored bootloader
> > registers and any other information to return the proper DTB.
>
> It would need developing a bit, but something like that might be
> possible -- it should probably be discussed via devicetree-discuss.
>
> If it is doing anything less trivial than picking a pre-baked DT, the
> rationale would need to be carefully argued.

I'm not sure there is a great interest in this? What are other folks
working on production embedded stuff doing? I suppose that will be
more clear as device tree is rolled out.

Jason

2012-10-10 09:18:45

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Tue, Oct 09, 2012 at 11:37:06AM -0600, Jason Gunthorpe wrote:
> On Mon, Oct 08, 2012 at 11:24:13AM +0100, Dave Martin wrote:
>
> > Partly this came from some side speculation about whether we could do
> > things like privileged read-only permissions on newer CPUs, for preventing
> > unintended or undesired writes to the kernel's code or read-only data.
>
> Some other arches page protect the kernel, but that tends to be at
> odds with the desire to use huge pages for the kernel mapping, and
> independent of the load headers..

This wasn't so much about that headers themselves as about fragmentation
of the page permissions which makes it difficult to map everything using
huge pages / sections. But as you say, there are conflicting concerns
here, and it seems not to be a priority.

Privileged write-protect is nice to have if non-disruptive, but not
essential.

Cheers
---Dave

2012-10-10 09:55:53

by Dave Martin

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Tue, Oct 09, 2012 at 12:25:14PM -0600, Jason Gunthorpe wrote:
> On Mon, Oct 08, 2012 at 11:46:49AM +0100, Dave Martin wrote:
>
> > > Yes, but we still need rely on complex code like I2C/MTD to create a
> > > correct DTB, which again puts us back to patching the kernel for that
> > > functionality.
> >
> > I'm still confused as to where this complexity is coming from.
> >
> > If you need to run some complex I2C and MTD code to generate the DT, what
> > is that code doing? If it is probing to get the information, then can you
> > avoid putting the info in the DT at all? Primarily the DT is to describe
> > those aspects of the hardware which can't be probed.
>
> At manufacturing the unit is programmed with a small datastructure that
> contains all the unique ID's (MAC addresses, etc), manufacturing
> variations (ie vendor A or B was used for a socket) and other
> ancillary data.

As a side comment, some things such as MAC addresses can be probed and
set from userspace after kernel boot, assuming that you have a way
to fetch the device description blob from userspace. If it's accessible
via MTD I'm guessing you may have some chance of doing that.

Identifying the board variant is harder to defer though.

> So a fair amount of I2C and MTD stack is required just to fetch this
> structure - a full CFI probe, for instance, is needed in the NOR flash
> case just to locate the structure.
>
> Once read, things like MAC addresses are copied into the DTB, and
> certain sections of the DTB are NOP'd out - we have stanzas for chip
> vendor A and vendor B, the one that was not put on the board is
> replaced with NOP.
>
> Similar to the A/B fix, a further fixup is done based on a runtime
> probe of the programmable devices to learn their current
> configuration/memory map.
>
> It seems desirable to present a complete/correct DTB to the kernel,
> it doesn't seem there are great places to hook in custom discovery
> procedures.
>
> From a maintenance perspective we already have to test/etc the kernel
> code for all of this, we don't want to do that twice by duplicating
> this stuff outside the kernel.

OK, that gives me a good understanding of what you're trying to achieve
here.

However, I worry that if you have to run hardware-dependent code in
order to fetch or generate parts of the DT, then we have a chicken-
and-egg problem with no guaranteed solution with the frameworks that
exist today -- although I am not familiar with how DT gets used on
all other arches, so you might have counterexamples I'm not aware of.

At least in ARM-land, the DT is inherently monolithic and static: the
DT is not expected to change once you enter the kernel.

Of course, more or less anything can be done with local patches, but
merging such functionality in a clean way might be a challenge.

> > Otherwise, you that have a few static configurations: you could have one
> > pre-baked DT per hardware platform, and choose the correct one once you
> > have detected the platform.
>
> We do that too, for instance the PPC kernel we build supports 4
> different circuit boards, each served by a separate DTB, that needs a
> fixup pass.
>
> I think the biggest DTB describes about 49 devices..
>
> > > Where the 'dev tree provider' would use the stored bootloader
> > > registers and any other information to return the proper DTB.
> >
> > It would need developing a bit, but something like that might be
> > possible -- it should probably be discussed via devicetree-discuss.
> >
> > If it is doing anything less trivial than picking a pre-baked DT, the
> > rationale would need to be carefully argued.
>
> I'm not sure there is a great interest in this? What are other folks
> working on production embedded stuff doing? I suppose that will be
> more clear as device tree is rolled out.

For now, the architecturally simplest solution still seems to me to be
to write your own boot shim which customises the DT before booting the
kernel. As discussed, this can continue to look like a simple ELF image
from the bootloader's point of view -- but I appreciate that it will
involve effort and some duplication of some low-level driver components.

If we could do dynamic DT properly (either fully dynamic, or dynamic
in a more constrained way as in your suggested patch), that would
provided an architected way to solve this problem from within the kernel.

This could certainly be worth raising on devicetree-discuss.

Your problem is unlikely to affect people outside the embedded space
too much, but it doesn't sound likely to be unique.

Cheers
---Dave

2012-10-12 21:24:15

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] [ARM] Use AT() in the linker script to create correct program headers

On Wed, Oct 10, 2012 at 10:55:44AM +0100, Dave Martin wrote:

> As a side comment, some things such as MAC addresses can be probed and
> set from userspace after kernel boot, assuming that you have a way
> to fetch the device description blob from userspace. If it's accessible
> via MTD I'm guessing you may have some chance of doing that.

Yes, there is a whole wack of other data in this structure (serial
numbers, etc) that are captured and used by userspace, so access is
always made possible.

Setting the MAC via userspace is interesting, I will bear that in
mind. However, we have several development-support boot scenarios that
enable the ethernet before the userspace code to fetch the data
structure is loaded - one of interest has the kernel's initramfs
perform a NBD mount of the actual application filesystem. So having
the kernel tend to this is very convenient.

> However, I worry that if you have to run hardware-dependent code in
> order to fetch or generate parts of the DT, then we have a chicken-
> and-egg problem with no guaranteed solution with the frameworks that
> exist today -- although I am not familiar with how DT gets used on
> all other arches, so you might have counterexamples I'm not aware of.

Certainly, in some cases, (like what I have done) such kernels are
non-generic and could only work with the hardware they are intended
for.

However, if you assume the bootloader provides a dtb (even an
incomplete one would do) then fixups/replacements/whatever could be
run based on tags in the dtb without creating a chicken-and-egg
problem.

> At least in ARM-land, the DT is inherently monolithic and static: the
> DT is not expected to change once you enter the kernel.

Sort of, it is the same on all arches, the DT block is expected to
remain valid, but all the of_node pointers go directly into the block
so you can alter values but not the structure without breaking any
invariants. As an example our DTBs all have mac address blocks, with a
0 value. The fixup simply overwrites with the correct value before the
ethernet driver reads from it.

> > I'm not sure there is a great interest in this? What are other folks
> > working on production embedded stuff doing? I suppose that will be
> > more clear as device tree is rolled out.
>
> For now, the architecturally simplest solution still seems to me to be
> to write your own boot shim which customises the DT before booting the
> kernel. As discussed, this can continue to look like a simple ELF image
> from the bootloader's point of view -- but I appreciate that it will
> involve effort and some duplication of some low-level driver components.

I admit, I am a bit conflicted. Viewing from the kernel perspective, I
definitely think this kind of very low level extremely board specific
stuff should not be in the kernel. That could be a maintenance
nightmare. However, as an OEM, I need the smallest, simplest boot
loader possible so I want to push this into the kernel to lighten my
development load.

> Your problem is unlikely to affect people outside the embedded space
> too much, but it doesn't sound likely to be unique.

Agreed..

This thread has left the original topic, what do you think I should do
with the linker patch?

Jason