2013-06-24 14:00:33

by Arnd Bergmann

[permalink] [raw]
Subject: scripts/kallsyms: Avoid ARM veneer symbols

When building ARM kernels that exceed a certain size, the linker
will add "veneers" that allow long branches. Unfortunately,
using a longer kallsyms section may lead to the extra veneers
being needed, which leads to inconsistent kallsyms data with the
message

Inconsistent kallsyms data
Try make KALLSYMS_EXTRA_PASS=1 as a workaround

In some case, not just one but up to three extra passes were
needed before getting to a state that needed no extra veneers.
The easiest solution is to skip veneers in kallsyms in the
same way we already skip a couple of other symbols.

Signed-off-by: Arnd Bergmann <[email protected]>
---
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 487ac6f..53ec0bb 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -69,14 +69,32 @@ static void usage(void)
exit(1);
}

-/*
- * This ignores the intensely annoying "mapping symbols" found
- * in ARM ELF files: $a, $t and $d.
- */
static inline int is_arm_mapping_symbol(const char *str)
{
- return str[0] == '$' && strchr("atd", str[1])
- && (str[2] == '\0' || str[2] == '.');
+ size_t len;
+
+ /*
+ * This ignores the intensely annoying "mapping symbols" found
+ * in ARM ELF files: $a, $t and $d.
+ */
+ if (str[0] == '$' && strchr("atd", str[1])
+ && (str[2] == '\0' || str[2] == '.'))
+ return 1;
+
+ len = strlen(str);
+ if (len < 10)
+ return 0;
+
+ /*
+ * This ignores any __.*_veneer symbol, which get
+ * inserted for large kernels, in order to avoid
+ * inconsistent data.
+ */
+ if (str[0] == '_' && str[1] == '_' &&
+ strcmp(str + len - 7, "_veneer") == 0)
+ return 1;
+
+ return 0;
}

static int read_symbol_tr(const char *sym, unsigned long long addr)


2013-06-25 15:09:51

by Dave Martin

[permalink] [raw]
Subject: Re: scripts/kallsyms: Avoid ARM veneer symbols

On Mon, Jun 24, 2013 at 04:01:43PM +0200, Arnd Bergmann wrote:
> When building ARM kernels that exceed a certain size, the linker
> will add "veneers" that allow long branches. Unfortunately,
> using a longer kallsyms section may lead to the extra veneers
> being needed, which leads to inconsistent kallsyms data with the
> message
>
> Inconsistent kallsyms data
> Try make KALLSYMS_EXTRA_PASS=1 as a workaround
>
> In some case, not just one but up to three extra passes were
> needed before getting to a state that needed no extra veneers.

Do you understand why this was happening? It sounds like there
must have been branches crossing from one side of the kallsyms
data to the other, triggering veneer insertion any time the
kallsyms data grows.

Branches between .{init,exit}.text and .text (crossing .rodata) seem the
likeliest culprits, but that's guesswork on my part.


If that's whats going on, multiple kallsyms passes is actually a correct
approach here: I think they should terminate after a number of steps
roughly proportional to log(number of branches across .rodata). We
could come up with a heuristic which allows us to choose the right
limit with a high degree of reliability, since branch density in
compiled C code is likely to be roughly.

But including the veneer symbols in kallsyms is arguably not all
that useful.

The main potential effect is that profiling might occasionally
sample the PC as being in a completely bogus function which it
never passed through at all, because of the way kallsyms tracks
only symbol locations and not sizes (if I remember right) --
so a veneer will actually get accounted against some arbitrary
adjacent function.

I don't know how much this matters.

> The easiest solution is to skip veneers in kallsyms in the
> same way we already skip a couple of other symbols.

The other symbols are not stripped for the purpose of making
kallsyms terminate quickly. The mapping symbols are rather
different: masses of symbols with duplicate names which are
not very interesting for most people.

>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 487ac6f..53ec0bb 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -69,14 +69,32 @@ static void usage(void)
> exit(1);
> }
>
> -/*
> - * This ignores the intensely annoying "mapping symbols" found
> - * in ARM ELF files: $a, $t and $d.
> - */
> static inline int is_arm_mapping_symbol(const char *str)

The function's name is now wrong. Should it be renamed or split up?

Cheers
---Dave

> {
> - return str[0] == '$' && strchr("atd", str[1])
> - && (str[2] == '\0' || str[2] == '.');
> + size_t len;
> +
> + /*
> + * This ignores the intensely annoying "mapping symbols" found
> + * in ARM ELF files: $a, $t and $d.
> + */
> + if (str[0] == '$' && strchr("atd", str[1])
> + && (str[2] == '\0' || str[2] == '.'))
> + return 1;
> +
> + len = strlen(str);
> + if (len < 10)
> + return 0;
> +
> + /*
> + * This ignores any __.*_veneer symbol, which get
> + * inserted for large kernels, in order to avoid
> + * inconsistent data.
> + */
> + if (str[0] == '_' && str[1] == '_' &&
> + strcmp(str + len - 7, "_veneer") == 0)
> + return 1;
> +
> + return 0;
> }
>
> static int read_symbol_tr(const char *sym, unsigned long long addr)
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

2013-07-03 16:03:18

by Arnd Bergmann

[permalink] [raw]
Subject: Re: scripts/kallsyms: Avoid ARM veneer symbols

On Tuesday 25 June 2013, Dave Martin wrote:
> On Mon, Jun 24, 2013 at 04:01:43PM +0200, Arnd Bergmann wrote:
> > When building ARM kernels that exceed a certain size, the linker
> > will add "veneers" that allow long branches. Unfortunately,
> > using a longer kallsyms section may lead to the extra veneers
> > being needed, which leads to inconsistent kallsyms data with the
> > message
> >
> > Inconsistent kallsyms data
> > Try make KALLSYMS_EXTRA_PASS=1 as a workaround
> >
> > In some case, not just one but up to three extra passes were
> > needed before getting to a state that needed no extra veneers.
>
> Do you understand why this was happening? It sounds like there
> must have been branches crossing from one side of the kallsyms
> data to the other, triggering veneer insertion any time the
> kallsyms data grows.
>
> Branches between .{init,exit}.text and .text (crossing .rodata) seem the
> likeliest culprits, but that's guesswork on my part.

kallsyms is part of rodata, which is between .text and .init.text.
I understand this is intentional, and we already use two passes
of kallsyms to make sure the size is right. However the assumption
is that only the data of the kallsyms table changes between runs,
not the size of it.

> If that's whats going on, multiple kallsyms passes is actually a correct
> approach here: I think they should terminate after a number of steps
> roughly proportional to log(number of branches across .rodata). We
> could come up with a heuristic which allows us to choose the right
> limit with a high degree of reliability, since branch density in
> compiled C code is likely to be roughly.

It also seems to be a part of the design to do exactly two passes
and treat anything beyond that as bugs, doing it the way you
suggest would significantly increase the build time since the
kallsyms+linker stage cannot be done in parallel or sped up
using ccache.

> But including the veneer symbols in kallsyms is arguably not all
> that useful.
>
> The main potential effect is that profiling might occasionally
> sample the PC as being in a completely bogus function which it
> never passed through at all, because of the way kallsyms tracks
> only symbol locations and not sizes (if I remember right) --
> so a veneer will actually get accounted against some arbitrary
> adjacent function.
>
> I don't know how much this matters.

Interesting point. No idea how often that happens. All the veneers
for one section are in one place though, so we could in theory
add a kallsyms entry for that section as long as can identify
it.

> > The easiest solution is to skip veneers in kallsyms in the
> > same way we already skip a couple of other symbols.
>
> The other symbols are not stripped for the purpose of making
> kallsyms terminate quickly. The mapping symbols are rather
> different: masses of symbols with duplicate names which are
> not very interesting for most people.

Right.

> > Signed-off-by: Arnd Bergmann <[email protected]>
> > ---
> > diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> > index 487ac6f..53ec0bb 100644
> > --- a/scripts/kallsyms.c
> > +++ b/scripts/kallsyms.c
> > @@ -69,14 +69,32 @@ static void usage(void)
> > exit(1);
> > }
> >
> > -/*
> > - * This ignores the intensely annoying "mapping symbols" found
> > - * in ARM ELF files: $a, $t and $d.
> > - */
> > static inline int is_arm_mapping_symbol(const char *str)
>
> The function's name is now wrong. Should it be renamed or split up?

Sure I can rename it. Any suggestions?

Arnd

2013-07-05 14:51:44

by Dave Martin

[permalink] [raw]
Subject: Re: scripts/kallsyms: Avoid ARM veneer symbols

On Wed, Jul 03, 2013 at 06:03:04PM +0200, Arnd Bergmann wrote:
> On Tuesday 25 June 2013, Dave Martin wrote:
> > On Mon, Jun 24, 2013 at 04:01:43PM +0200, Arnd Bergmann wrote:
> > > When building ARM kernels that exceed a certain size, the linker
> > > will add "veneers" that allow long branches. Unfortunately,
> > > using a longer kallsyms section may lead to the extra veneers
> > > being needed, which leads to inconsistent kallsyms data with the
> > > message
> > >
> > > Inconsistent kallsyms data
> > > Try make KALLSYMS_EXTRA_PASS=1 as a workaround
> > >
> > > In some case, not just one but up to three extra passes were
> > > needed before getting to a state that needed no extra veneers.
> >
> > Do you understand why this was happening? It sounds like there
> > must have been branches crossing from one side of the kallsyms
> > data to the other, triggering veneer insertion any time the
> > kallsyms data grows.
> >
> > Branches between .{init,exit}.text and .text (crossing .rodata) seem the
> > likeliest culprits, but that's guesswork on my part.
>
> kallsyms is part of rodata, which is between .text and .init.text.
> I understand this is intentional, and we already use two passes
> of kallsyms to make sure the size is right. However the assumption
> is that only the data of the kallsyms table changes between runs,
> not the size of it.

I guess sticking the init stuff at the end makes it tidier to
discard without leaving a gap in the middle of the kernel image.
There might be other reasons.

> > If that's whats going on, multiple kallsyms passes is actually a correct
> > approach here: I think they should terminate after a number of steps
> > roughly proportional to log(number of branches across .rodata). We
> > could come up with a heuristic which allows us to choose the right
> > limit with a high degree of reliability, since branch density in
> > compiled C code is likely to be roughly.
>
> It also seems to be a part of the design to do exactly two passes
> and treat anything beyond that as bugs, doing it the way you

Sure -- I was that making the point the implementation is based on
an assumption that isn't true here.

> suggest would significantly increase the build time since the
> kallsyms+linker stage cannot be done in parallel or sped up
> using ccache.
>
> > But including the veneer symbols in kallsyms is arguably not all
> > that useful.
> >
> > The main potential effect is that profiling might occasionally
> > sample the PC as being in a completely bogus function which it
> > never passed through at all, because of the way kallsyms tracks
> > only symbol locations and not sizes (if I remember right) --
> > so a veneer will actually get accounted against some arbitrary
> > adjacent function.
> >
> > I don't know how much this matters.
>
> Interesting point. No idea how often that happens. All the veneers
> for one section are in one place though, so we could in theory
> add a kallsyms entry for that section as long as can identify
> it.

We could collapse any contiguous sequence of __veneer_* symbols down
to a single symbol to mark those holes.

However, many kallsyms passes could still be needed: each pass might add
extra veneer blocks, unless the size of kallsyms data is identical to
that in the previous pass. The expected convergence rate is faster,
though.

> > > The easiest solution is to skip veneers in kallsyms in the
> > > same way we already skip a couple of other symbols.

Your suggestion of omitting the symbols completely seems to be the only
way to ensure convergence in 2 passes, so far as I can see.


Adding size information to every entry in kallsyms would make it possible
to identify the "holes" without potentially requiring many kallsyms passes,
but it would bloat the table. The extra info would be interesting only
for a tiny subset of the symbols. I expect people aren't going to like
that much.

So I guess your original suggestion may be the best thing to do for now,
after all...

There is no proper reserved symbol namespace for linker-generated symbols,
so a real symbol could have the name __*_veneer, at which point things
start to get really confusing. But hopefully that won't happen much.
I don't see any such symbol right now, and hopefully nobody will be so
silly as to add one...

If we eventually need to fix the bogus symbol resolution problem, I can't
see an alternative to adding size info to every symbol. But we should
leave that for now. It doesn't sound like a serious problem.

> > The other symbols are not stripped for the purpose of making
> > kallsyms terminate quickly. The mapping symbols are rather
> > different: masses of symbols with duplicate names which are
> > not very interesting for most people.
>
> Right.
>
> > > Signed-off-by: Arnd Bergmann <[email protected]>
> > > ---
> > > diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> > > index 487ac6f..53ec0bb 100644
> > > --- a/scripts/kallsyms.c
> > > +++ b/scripts/kallsyms.c
> > > @@ -69,14 +69,32 @@ static void usage(void)
> > > exit(1);
> > > }
> > >
> > > -/*
> > > - * This ignores the intensely annoying "mapping symbols" found
> > > - * in ARM ELF files: $a, $t and $d.
> > > - */
> > > static inline int is_arm_mapping_symbol(const char *str)
> >
> > The function's name is now wrong. Should it be renamed or split up?
>
> Sure I can rename it. Any suggestions?

Maybe just something more generic like is_arm_special_symbol()?

Cheers
---Dave

2013-07-05 16:43:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: scripts/kallsyms: Avoid ARM veneer symbols

On Friday 05 July 2013, Dave P Martin wrote:
> On Wed, Jul 03, 2013 at 06:03:04PM +0200, Arnd Bergmann wrote:
> > On Tuesday 25 June 2013, Dave Martin wrote:
> > suggest would significantly increase the build time since the
> > kallsyms+linker stage cannot be done in parallel or sped up
> > using ccache.
> >
> > > But including the veneer symbols in kallsyms is arguably not all
> > > that useful.
> > >
> > > The main potential effect is that profiling might occasionally
> > > sample the PC as being in a completely bogus function which it
> > > never passed through at all, because of the way kallsyms tracks
> > > only symbol locations and not sizes (if I remember right) --
> > > so a veneer will actually get accounted against some arbitrary
> > > adjacent function.
> > >
> > > I don't know how much this matters.
> >
> > Interesting point. No idea how often that happens. All the veneers
> > for one section are in one place though, so we could in theory
> > add a kallsyms entry for that section as long as can identify
> > it.
>
> We could collapse any contiguous sequence of __veneer_* symbols down
> to a single symbol to mark those holes.
>
> However, many kallsyms passes could still be needed: each pass might add
> extra veneer blocks, unless the size of kallsyms data is identical to
> that in the previous pass. The expected convergence rate is faster,
> though.

Right, it wouldn't guarantee to converge after two passes, just make
it very likely.

> > > > The easiest solution is to skip veneers in kallsyms in the
> > > > same way we already skip a couple of other symbols.
>
> Your suggestion of omitting the symbols completely seems to be the only
> way to ensure convergence in 2 passes, so far as I can see.
>
>
> Adding size information to every entry in kallsyms would make it possible
> to identify the "holes" without potentially requiring many kallsyms passes,
> but it would bloat the table. The extra info would be interesting only
> for a tiny subset of the symbols. I expect people aren't going to like
> that much.
>
> So I guess your original suggestion may be the best thing to do for now,
> after all...
>
> There is no proper reserved symbol namespace for linker-generated symbols,
> so a real symbol could have the name __*_veneer, at which point things
> start to get really confusing. But hopefully that won't happen much.
> I don't see any such symbol right now, and hopefully nobody will be so
> silly as to add one...
>
> If we eventually need to fix the bogus symbol resolution problem, I can't
> see an alternative to adding size info to every symbol. But we should
> leave that for now. It doesn't sound like a serious problem.

Unfortunately I have run into additional problems now after doing a few
hundred more builds:

* not all veneers end in _veneer, some also end in _from_arm or _from_thumb.
This is easy enough to check for in the same way I did for _veneer.

* There are actually symbols without a name on ARM, which screws up the
kallsyms.c parser. These also seem to be veneers, but attached to some
random function:

$ nm obj-tmp/.tmp_vmlinux1 | head
c09e8db1 t
c09e8db5 t
c09e8db9 t # <==========
c09e8dbd t
c0abfc29 t
c0008000 t $a
c0f7b640 t $a

$ objdump -Dr obj-tmp/.tmp_vmlinux1 | grep -C 30 c09e8db.
c0851fcc <wlc_phy_edcrs_lock>:
c0851fcc: b538 push {r3, r4, r5, lr}
c0851fce: b500 push {lr}
c0851fd0: f7bb d8dc bl c000d18c <__gnu_mcount_nc>
c0851fd4: f240 456b movw r5, #1131 ; 0x46b
c0851fd8: 4604 mov r4, r0
c0851fda: f880 14d5 strb.w r1, [r0, #1237] ; 0x4d5
c0851fde: 462a mov r2, r5
c0851fe0: f44f 710b mov.w r1, #556 ; 0x22c
c0851fe4: f7ff fe6d bl c0851cc2 <write_phy_reg>
c0851fe8: 4620 mov r0, r4
c0851fea: 462a mov r2, r5
c0851fec: f240 212d movw r1, #557 ; 0x22d
c0851ff0: f7ff fe67 bl c0851cc2 <write_phy_reg>
c0851ff4: 4620 mov r0, r4
c0851ff6: f240 212e movw r1, #558 ; 0x22e
c0851ffa: f44f 7270 mov.w r2, #960 ; 0x3c0
c0851ffe: f196 fedb bl c09e8db8 <tpci200_free_irq+0x78> # <===========
c0852002: 4620 mov r0, r4
c0852004: f240 212f movw r1, #559 ; 0x22f
c0852008: f44f 7270 mov.w r2, #960 ; 0x3c0
c085200c: e8bd 4038 ldmia.w sp!, {r3, r4, r5, lr}
c0852010: f7ff be57 b.w c0851cc2 <write_phy_reg>


... # in tpci200_free_irq:
c09e8d9e: e003 b.n c09e8da8 <tpci200_free_irq+0x68>
c09e8da0: f06f 0415 mvn.w r4, #21
c09e8da4: e000 b.n c09e8da8 <tpci200_free_irq+0x68>
c09e8da6: 4c01 ldr r4, [pc, #4] ; (c09e8dac <tpci200_free_irq+0x6c>)
c09e8da8: 4620 mov r0, r4
c09e8daa: bdf8 pop {r3, r4, r5, r6, r7, pc}
c09e8dac: fffffe00 ; <UNDEFINED> instruction: 0xfffffe00
c09e8db0: f4cf b814 b.w c06b7ddc <bna_enet_sm_chld_stop_wait_entry>
c09e8db4: f53e bed8 b.w c0727b68 <gem_do_stop>
c09e8db8: f668 bf83 b.w c0851cc2 <write_phy_reg> # <==========
c09e8dbc: d101 bne.n c09e8dc2 <tpci200_free_irq+0x82>
c09e8dbe: f435 b920 b.w c061e002 <twl_reset_sequence+0x34c>

It makes no sense to me at all that a function in one driver can just call
write_phy_reg a couple of times, but need a veneer in the middle, and put
that veneer in a totally unrelated function in another driver!

If this is a binutils bug or gcc bug, we should probably just fix it, but it
might be easier to work around it by changing kallsyms.c some more.

> > > > -/*
> > > > - * This ignores the intensely annoying "mapping symbols" found
> > > > - * in ARM ELF files: $a, $t and $d.
> > > > - */
> > > > static inline int is_arm_mapping_symbol(const char *str)
> > >
> > > The function's name is now wrong. Should it be renamed or split up?
> >
> > Sure I can rename it. Any suggestions?
>
> Maybe just something more generic like is_arm_special_symbol()?

Ok, I can do that.

Arnd

2013-07-05 17:27:39

by Dave Martin

[permalink] [raw]
Subject: Re: scripts/kallsyms: Avoid ARM veneer symbols

On Fri, Jul 05, 2013 at 05:42:44PM +0100, Arnd Bergmann wrote:
> On Friday 05 July 2013, Dave P Martin wrote:
> > On Wed, Jul 03, 2013 at 06:03:04PM +0200, Arnd Bergmann wrote:
> > > On Tuesday 25 June 2013, Dave Martin wrote:
> > > suggest would significantly increase the build time since the
> > > kallsyms+linker stage cannot be done in parallel or sped up
> > > using ccache.
> > >
> > > > But including the veneer symbols in kallsyms is arguably not all
> > > > that useful.
> > > >
> > > > The main potential effect is that profiling might occasionally
> > > > sample the PC as being in a completely bogus function which it
> > > > never passed through at all, because of the way kallsyms tracks
> > > > only symbol locations and not sizes (if I remember right) --
> > > > so a veneer will actually get accounted against some arbitrary
> > > > adjacent function.
> > > >
> > > > I don't know how much this matters.
> > >
> > > Interesting point. No idea how often that happens. All the veneers
> > > for one section are in one place though, so we could in theory
> > > add a kallsyms entry for that section as long as can identify
> > > it.
> >
> > We could collapse any contiguous sequence of __veneer_* symbols down
> > to a single symbol to mark those holes.
> >
> > However, many kallsyms passes could still be needed: each pass might add
> > extra veneer blocks, unless the size of kallsyms data is identical to
> > that in the previous pass. The expected convergence rate is faster,
> > though.
>
> Right, it wouldn't guarantee to converge after two passes, just make
> it very likely.
>
> > > > > The easiest solution is to skip veneers in kallsyms in the
> > > > > same way we already skip a couple of other symbols.
> >
> > Your suggestion of omitting the symbols completely seems to be the only
> > way to ensure convergence in 2 passes, so far as I can see.
> >
> >
> > Adding size information to every entry in kallsyms would make it possible
> > to identify the "holes" without potentially requiring many kallsyms passes,
> > but it would bloat the table. The extra info would be interesting only
> > for a tiny subset of the symbols. I expect people aren't going to like
> > that much.
> >
> > So I guess your original suggestion may be the best thing to do for now,
> > after all...
> >
> > There is no proper reserved symbol namespace for linker-generated symbols,
> > so a real symbol could have the name __*_veneer, at which point things
> > start to get really confusing. But hopefully that won't happen much.
> > I don't see any such symbol right now, and hopefully nobody will be so
> > silly as to add one...
> >
> > If we eventually need to fix the bogus symbol resolution problem, I can't
> > see an alternative to adding size info to every symbol. But we should
> > leave that for now. It doesn't sound like a serious problem.
>
> Unfortunately I have run into additional problems now after doing a few
> hundred more builds:
>
> * not all veneers end in _veneer, some also end in _from_arm or _from_thumb.
> This is easy enough to check for in the same way I did for _veneer.

I think there are a small number of patterns to check for.

__*_veneer, __*_from_arm and __*_from_thumb should cover most cases.

> * There are actually symbols without a name on ARM, which screws up the
> kallsyms.c parser. These also seem to be veneers, but attached to some
> random function:

Hmmm, I don't what those are. By default, we should probably ignore those
too. Maybe they have something to do with link-time relocation processing.

>
> $ nm obj-tmp/.tmp_vmlinux1 | head
> c09e8db1 t
> c09e8db5 t
> c09e8db9 t # <==========
> c09e8dbd t
> c0abfc29 t
> c0008000 t $a
> c0f7b640 t $a
>
> $ objdump -Dr obj-tmp/.tmp_vmlinux1 | grep -C 30 c09e8db.
> c0851fcc <wlc_phy_edcrs_lock>:
> c0851fcc: b538 push {r3, r4, r5, lr}
> c0851fce: b500 push {lr}
> c0851fd0: f7bb d8dc bl c000d18c <__gnu_mcount_nc>
> c0851fd4: f240 456b movw r5, #1131 ; 0x46b
> c0851fd8: 4604 mov r4, r0
> c0851fda: f880 14d5 strb.w r1, [r0, #1237] ; 0x4d5
> c0851fde: 462a mov r2, r5
> c0851fe0: f44f 710b mov.w r1, #556 ; 0x22c
> c0851fe4: f7ff fe6d bl c0851cc2 <write_phy_reg>
> c0851fe8: 4620 mov r0, r4
> c0851fea: 462a mov r2, r5
> c0851fec: f240 212d movw r1, #557 ; 0x22d
> c0851ff0: f7ff fe67 bl c0851cc2 <write_phy_reg>
> c0851ff4: 4620 mov r0, r4
> c0851ff6: f240 212e movw r1, #558 ; 0x22e
> c0851ffa: f44f 7270 mov.w r2, #960 ; 0x3c0
> c0851ffe: f196 fedb bl c09e8db8 <tpci200_free_irq+0x78> # <===========
> c0852002: 4620 mov r0, r4
> c0852004: f240 212f movw r1, #559 ; 0x22f
> c0852008: f44f 7270 mov.w r2, #960 ; 0x3c0
> c085200c: e8bd 4038 ldmia.w sp!, {r3, r4, r5, lr}
> c0852010: f7ff be57 b.w c0851cc2 <write_phy_reg>
>
>
> ... # in tpci200_free_irq:
> c09e8d9e: e003 b.n c09e8da8 <tpci200_free_irq+0x68>
> c09e8da0: f06f 0415 mvn.w r4, #21
> c09e8da4: e000 b.n c09e8da8 <tpci200_free_irq+0x68>
> c09e8da6: 4c01 ldr r4, [pc, #4] ; (c09e8dac <tpci200_free_irq+0x6c>)
> c09e8da8: 4620 mov r0, r4
> c09e8daa: bdf8 pop {r3, r4, r5, r6, r7, pc}
> c09e8dac: fffffe00 ; <UNDEFINED> instruction: 0xfffffe00
> c09e8db0: f4cf b814 b.w c06b7ddc <bna_enet_sm_chld_stop_wait_entry>
> c09e8db4: f53e bed8 b.w c0727b68 <gem_do_stop>
> c09e8db8: f668 bf83 b.w c0851cc2 <write_phy_reg> # <==========
> c09e8dbc: d101 bne.n c09e8dc2 <tpci200_free_irq+0x82>
> c09e8dbe: f435 b920 b.w c061e002 <twl_reset_sequence+0x34c>
>
> It makes no sense to me at all that a function in one driver can just call
> write_phy_reg a couple of times, but need a veneer in the middle, and put
> that veneer in a totally unrelated function in another driver!

I think that if ld inserts a veneer for a function anywhere, branches
from any object in the link to that target symbol can reuse the same
veneer as a trampoline, effectively appearing to branch through an
unrelated location to reach the destination.

ld inserts veneers between individual input sections, but I don't
think they have to go next to the same section the branch originates
from. In the above code, it looks like that series of unconditional
branches after the end of tpci200_free_irq might be a common veneer pool
for many different destinations.

LTO may also make the expected compilation unit boundaries disappear
completely. Anything could end up almost anywhere in that case.
Files could get intermingled, inlined and generally spread all over the
place.

Even so, veneers shouldn't be needed in the common case where we're not
jumping across .rodata.

>
> If this is a binutils bug or gcc bug, we should probably just fix it, but it
> might be easier to work around it by changing kallsyms.c some more.

I haven't found a trivial way to reproduce those nameless symbols.
I don't know whether they're a bug or not...

Making kallsyms robust against this might be a good idea anyway.

> > > > > -/*
> > > > > - * This ignores the intensely annoying "mapping symbols" found
> > > > > - * in ARM ELF files: $a, $t and $d.
> > > > > - */
> > > > > static inline int is_arm_mapping_symbol(const char *str)
> > > >
> > > > The function's name is now wrong. Should it be renamed or split up?
> > >
> > > Sure I can rename it. Any suggestions?
> >
> > Maybe just something more generic like is_arm_special_symbol()?
>
> Ok, I can do that.

Cheers
---Dave

2013-07-05 23:35:11

by Arnd Bergmann

[permalink] [raw]
Subject: Re: scripts/kallsyms: Avoid ARM veneer symbols

On Friday 05 July 2013, Dave P Martin wrote:
> On Fri, Jul 05, 2013 at 05:42:44PM +0100, Arnd Bergmann wrote:
> > On Friday 05 July 2013, Dave P Martin wrote:
> > > On Wed, Jul 03, 2013 at 06:03:04PM +0200, Arnd Bergmann wrote:
>
> I think there are a small number of patterns to check for.
>
> __*_veneer, __*_from_arm and __*_from_thumb should cover most cases.

Ok.

> > * There are actually symbols without a name on ARM, which screws up the
> > kallsyms.c parser. These also seem to be veneers, but attached to some
> > random function:
>
> Hmmm, I don't what those are. By default, we should probably ignore those
> too. Maybe they have something to do with link-time relocation processing.

Definitely link-time. It only shows up after the final link, and only
with ld.bfd not with ld.gold as I found out now.

> > $ nm obj-tmp/.tmp_vmlinux1 | head
> > c09e8db1 t
> > c09e8db5 t
> > c09e8db9 t # <==========
> > c09e8dbd t
> > c0abfc29 t
> > c0008000 t $a
> > c0f7b640 t $a
> >
> > $ objdump -Dr obj-tmp/.tmp_vmlinux1 | grep -C 30 c09e8db.
> > c0851fcc <wlc_phy_edcrs_lock>:
> > c0851fcc: b538 push {r3, r4, r5, lr}
> > c0851fce: b500 push {lr}
> > c0851fd0: f7bb d8dc bl c000d18c <__gnu_mcount_nc>
> > c0851fd4: f240 456b movw r5, #1131 ; 0x46b
> > c0851fd8: 4604 mov r4, r0
> > c0851fda: f880 14d5 strb.w r1, [r0, #1237] ; 0x4d5
> > c0851fde: 462a mov r2, r5
> > c0851fe0: f44f 710b mov.w r1, #556 ; 0x22c
> > c0851fe4: f7ff fe6d bl c0851cc2 <write_phy_reg>
> > c0851fe8: 4620 mov r0, r4
> > c0851fea: 462a mov r2, r5
> > c0851fec: f240 212d movw r1, #557 ; 0x22d
> > c0851ff0: f7ff fe67 bl c0851cc2 <write_phy_reg>
> > c0851ff4: 4620 mov r0, r4
> > c0851ff6: f240 212e movw r1, #558 ; 0x22e
> > c0851ffa: f44f 7270 mov.w r2, #960 ; 0x3c0
> > c0851ffe: f196 fedb bl c09e8db8 <tpci200_free_irq+0x78> # <===========
> > c0852002: 4620 mov r0, r4
> > c0852004: f240 212f movw r1, #559 ; 0x22f
> > c0852008: f44f 7270 mov.w r2, #960 ; 0x3c0
> > c085200c: e8bd 4038 ldmia.w sp!, {r3, r4, r5, lr}
> > c0852010: f7ff be57 b.w c0851cc2 <write_phy_reg>
> >
> >
> > ... # in tpci200_free_irq:
> > c09e8d9e: e003 b.n c09e8da8 <tpci200_free_irq+0x68>
> > c09e8da0: f06f 0415 mvn.w r4, #21
> > c09e8da4: e000 b.n c09e8da8 <tpci200_free_irq+0x68>
> > c09e8da6: 4c01 ldr r4, [pc, #4] ; (c09e8dac <tpci200_free_irq+0x6c>)
> > c09e8da8: 4620 mov r0, r4
> > c09e8daa: bdf8 pop {r3, r4, r5, r6, r7, pc}
> > c09e8dac: fffffe00 ; <UNDEFINED> instruction: 0xfffffe00
> > c09e8db0: f4cf b814 b.w c06b7ddc <bna_enet_sm_chld_stop_wait_entry>
> > c09e8db4: f53e bed8 b.w c0727b68 <gem_do_stop>
> > c09e8db8: f668 bf83 b.w c0851cc2 <write_phy_reg> # <==========
> > c09e8dbc: d101 bne.n c09e8dc2 <tpci200_free_irq+0x82>
> > c09e8dbe: f435 b920 b.w c061e002 <twl_reset_sequence+0x34c>
> >
> > It makes no sense to me at all that a function in one driver can just call
> > write_phy_reg a couple of times, but need a veneer in the middle, and put
> > that veneer in a totally unrelated function in another driver!
>
> I think that if ld inserts a veneer for a function anywhere, branches
> from any object in the link to that target symbol can reuse the same
> veneer as a trampoline, effectively appearing to branch through an
> unrelated location to reach the destination.

That part makes sense, but it doesn't explain why ld would do that just
for the third out of four identical function calls in the example above.

> ld inserts veneers between individual input sections, but I don't
> think they have to go next to the same section the branch originates
> from. In the above code, it looks like that series of unconditional
> branches after the end of tpci200_free_irq might be a common veneer pool
> for many different destinations.

Yes, exactly. In this build I had six of these nameless symbols, and five
of them were in this one function.

> LTO may also make the expected compilation unit boundaries disappear
> completely. Anything could end up almost anywhere in that case.
> Files could get intermingled, inlined and generally spread all over the
> place.

I'm not sure we actually want to enable that in the kernel ;-)

In particular in combination with kallsyms, it would make the kallsyms
information rather useless when we can no longer infer a function name
from an address.

> Even so, veneers shouldn't be needed in the common case where we're not
> jumping across .rodata.
>
> >
> > If this is a binutils bug or gcc bug, we should probably just fix it, but it
> > might be easier to work around it by changing kallsyms.c some more.
>
> I haven't found a trivial way to reproduce those nameless symbols.
> I don't know whether they're a bug or not...
>
> Making kallsyms robust against this might be a good idea anyway.

Maybe we can find a binutils expert next week at Linaro connect to take a
look at the data. I can prepare a test case.

Arnd

2013-07-08 10:00:23

by Dave Martin

[permalink] [raw]
Subject: Re: scripts/kallsyms: Avoid ARM veneer symbols

On Sat, Jul 06, 2013 at 01:34:56AM +0200, Arnd Bergmann wrote:
> On Friday 05 July 2013, Dave P Martin wrote:
> > On Fri, Jul 05, 2013 at 05:42:44PM +0100, Arnd Bergmann wrote:
> > > On Friday 05 July 2013, Dave P Martin wrote:
> > > > On Wed, Jul 03, 2013 at 06:03:04PM +0200, Arnd Bergmann wrote:
> >
> > I think there are a small number of patterns to check for.
> >
> > __*_veneer, __*_from_arm and __*_from_thumb should cover most cases.
>
> Ok.
>
> > > * There are actually symbols without a name on ARM, which screws up the
> > > kallsyms.c parser. These also seem to be veneers, but attached to some
> > > random function:
> >
> > Hmmm, I don't what those are. By default, we should probably ignore those
> > too. Maybe they have something to do with link-time relocation processing.
>
> Definitely link-time. It only shows up after the final link, and only
> with ld.bfd not with ld.gold as I found out now.
>
> > > $ nm obj-tmp/.tmp_vmlinux1 | head
> > > c09e8db1 t
> > > c09e8db5 t
> > > c09e8db9 t # <==========
> > > c09e8dbd t
> > > c0abfc29 t
> > > c0008000 t $a
> > > c0f7b640 t $a
> > >
> > > $ objdump -Dr obj-tmp/.tmp_vmlinux1 | grep -C 30 c09e8db.
> > > c0851fcc <wlc_phy_edcrs_lock>:
> > > c0851fcc: b538 push {r3, r4, r5, lr}
> > > c0851fce: b500 push {lr}
> > > c0851fd0: f7bb d8dc bl c000d18c <__gnu_mcount_nc>
> > > c0851fd4: f240 456b movw r5, #1131 ; 0x46b
> > > c0851fd8: 4604 mov r4, r0
> > > c0851fda: f880 14d5 strb.w r1, [r0, #1237] ; 0x4d5
> > > c0851fde: 462a mov r2, r5
> > > c0851fe0: f44f 710b mov.w r1, #556 ; 0x22c
> > > c0851fe4: f7ff fe6d bl c0851cc2 <write_phy_reg>
> > > c0851fe8: 4620 mov r0, r4
> > > c0851fea: 462a mov r2, r5
> > > c0851fec: f240 212d movw r1, #557 ; 0x22d
> > > c0851ff0: f7ff fe67 bl c0851cc2 <write_phy_reg>
> > > c0851ff4: 4620 mov r0, r4
> > > c0851ff6: f240 212e movw r1, #558 ; 0x22e
> > > c0851ffa: f44f 7270 mov.w r2, #960 ; 0x3c0
> > > c0851ffe: f196 fedb bl c09e8db8 <tpci200_free_irq+0x78> # <===========
> > > c0852002: 4620 mov r0, r4
> > > c0852004: f240 212f movw r1, #559 ; 0x22f
> > > c0852008: f44f 7270 mov.w r2, #960 ; 0x3c0
> > > c085200c: e8bd 4038 ldmia.w sp!, {r3, r4, r5, lr}
> > > c0852010: f7ff be57 b.w c0851cc2 <write_phy_reg>
> > >
> > >
> > > ... # in tpci200_free_irq:
> > > c09e8d9e: e003 b.n c09e8da8 <tpci200_free_irq+0x68>
> > > c09e8da0: f06f 0415 mvn.w r4, #21
> > > c09e8da4: e000 b.n c09e8da8 <tpci200_free_irq+0x68>
> > > c09e8da6: 4c01 ldr r4, [pc, #4] ; (c09e8dac <tpci200_free_irq+0x6c>)
> > > c09e8da8: 4620 mov r0, r4
> > > c09e8daa: bdf8 pop {r3, r4, r5, r6, r7, pc}
> > > c09e8dac: fffffe00 ; <UNDEFINED> instruction: 0xfffffe00
> > > c09e8db0: f4cf b814 b.w c06b7ddc <bna_enet_sm_chld_stop_wait_entry>
> > > c09e8db4: f53e bed8 b.w c0727b68 <gem_do_stop>
> > > c09e8db8: f668 bf83 b.w c0851cc2 <write_phy_reg> # <==========
> > > c09e8dbc: d101 bne.n c09e8dc2 <tpci200_free_irq+0x82>
> > > c09e8dbe: f435 b920 b.w c061e002 <twl_reset_sequence+0x34c>
> > >
> > > It makes no sense to me at all that a function in one driver can just call
> > > write_phy_reg a couple of times, but need a veneer in the middle, and put
> > > that veneer in a totally unrelated function in another driver!
> >
> > I think that if ld inserts a veneer for a function anywhere, branches
> > from any object in the link to that target symbol can reuse the same
> > veneer as a trampoline, effectively appearing to branch through an
> > unrelated location to reach the destination.
>
> That part makes sense, but it doesn't explain why ld would do that just
> for the third out of four identical function calls in the example above.
>
> > ld inserts veneers between individual input sections, but I don't
> > think they have to go next to the same section the branch originates
> > from. In the above code, it looks like that series of unconditional
> > branches after the end of tpci200_free_irq might be a common veneer pool
> > for many different destinations.
>
> Yes, exactly. In this build I had six of these nameless symbols, and five
> of them were in this one function.
>
> > LTO may also make the expected compilation unit boundaries disappear
> > completely. Anything could end up almost anywhere in that case.
> > Files could get intermingled, inlined and generally spread all over the
> > place.
>
> I'm not sure we actually want to enable that in the kernel ;-)
>
> In particular in combination with kallsyms, it would make the kallsyms
> information rather useless when we can no longer infer a function name
> from an address.

Well, indeed. But that's a separate discussion -- I don't think we want
to block it needlessly just due to an obscure feature of the ARM toolchain.

Ignoring veneers and nameless symbols in kallsyms sounds like a reasonable
approach for now, even if it's not a perfect.

> > Even so, veneers shouldn't be needed in the common case where we're not
> > jumping across .rodata.
> >
> > >
> > > If this is a binutils bug or gcc bug, we should probably just fix it, but it
> > > might be easier to work around it by changing kallsyms.c some more.
> >
> > I haven't found a trivial way to reproduce those nameless symbols.
> > I don't know whether they're a bug or not...
> >
> > Making kallsyms robust against this might be a good idea anyway.
>
> Maybe we can find a binutils expert next week at Linaro connect to take a
> look at the data. I can prepare a test case.

Sure, that could be worth a try.

Cheers
---Dave