2010-08-03 11:49:27

by Dave Martin

Subject: [PATCH 0/2] perf: symbol offset breakage with separated debug

Hi all,

I've hit some problems in trying to get perf record/report to
work correctly on systems with separated debug images (Ubuntu,
in particular). I worked on some patches to ensure that
separate debug images do actually get loaded, when present --
these commits have now been merged in linux-2.6-tip/master.
(See below for a list of the commits.)

Now that these are in place, I'm observing a new problem which
can mess up symbol locations -- though I think there might be a
practical workaround, I can't see a trivial solution, so I'm
keen to hear if anyone has any ideas.


The problem:

perf makes some incorrect assumptions, which mean that the
symbol locations seen by perf report and friends can be
incorrect.

My analysis:

a) perf represents symbols as offsets from the start of
the mmap'd code region which contains each
symbol.

b) For shared libs (ET_DYN objects), the desired offset
is (usually) equal to the VMA of each given
symbol in the image. Perf assumes that these
offsets are always equal, and this normally
works.

Atypical link configurations could cause this
to fail, but this would probably be unusual in
the real world.

c) For executables (ET_EXEC objects), the desired
offset is:

st_addr - (mmap_start + sh_offset - mmap_offset)

Where st_addr is the symbol VMA, sh_offset is
the offset of the parent *section* (not
segment) from the start of the image,
mmap_start is the base VMA of the corresponding
mmap'd *segment* (or part of a segment) in
memory, and mmap_offset is the file offset at
which the mapping starts (as in mmap(2))

(As a minor note, the pgoff field in the perf
mmap event data (from which mmap_offset should
be deducible) is probably also incorrect at
present -- see snippet from util/event.c
below.)

d) perf assumes that the above is equivalent to:

st_addr - (sh_addr - sh_offset)

Where sh_addr is the VMA of the start of the
symbol's parent section.

This is true only given certain assumptions,
namely that each symbol's parent section is
mapped in such a way that mmap_offset is zero.
In practice, this is true for the main
executable segment of ET_EXEC images in typical
link configurations (i.e., using the default
GNU ld scripts).

e) The assumptions in (d) break down when using
separated debug images, because although
separated debug images contain faithful VMA
information, the image is stripped of loadable
sections' contents, resulting in radically
different sh_offset values from the "real"
image.

Because sh_offset bears no relationship to the
layout of the data mmap'd at runtime, this
produces bogus symbol adjustments, which result
in the wrong symbol offsets (a) being
calculated.
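To make this concrete, here is a minimal sketch (with made-up, illustrative addresses; the helper name is mine, not perf's) of the adjustment from (d) as perf applies it, and of how a separated debug image's unrelated sh_offset corrupts the result:

```c
#include <assert.h>
#include <stdint.h>

/* The adjustment from (d), as applied in util/symbol.c:
 * sym.st_value -= shdr.sh_addr - shdr.sh_offset. */
uint64_t adjust_symbol(uint64_t st_value, uint64_t sh_addr,
		       uint64_t sh_offset)
{
	return st_value - (sh_addr - sh_offset);
}

/* Illustrative values for a typical i386 ET_EXEC image:
 * .text at VMA 0x08048340, file offset 0x340, symbol at 0x08048500.
 * With the real image's sh_offset the result is the expected file
 * offset 0x500; with a bogus sh_offset from a debug image (here
 * 0x2d4, where .text is SHT_NOBITS and the offsets reflect the
 * stripped layout) the result is shifted and wrong (0x494). */
```

So whenever sh_offset comes from the debug image rather than the file that was actually mmap'd, every adjusted symbol is displaced by the difference between the two layouts.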


Solutions:

(e) needs to be solved in order for perf report etc. to produce
sensible output when using separated debug images.

I think that a comprehensive solution should also fix the
assumptions in (b) and (d) so that any valid ELF images will be
processed correctly.


Solving (e) doesn't appear trivial, since it requires section
layout information from the image that was mmap'd at _runtime_
to be correlated with the symbol information collected maybe
from other files at _report_ time. Currently, runtime and
report time are treated rather separately, so this isn't
straightforward to achieve.

A simple workaround in the meantime would be to assume that
.text is mmap'd with mmap_offset=0 for ET_EXEC images, and fix
the adjustment calculation appropriately. This is not a full
fix but is probably worth doing if (as I guess) it gets the
tools working correctly in the common case: see

[PATCH 2/2] perf symbols: work around incorrect ET_EXEC symbol adjustment

-------------------------------------------------------------------------------

My current list of patches in linux-2.6-tip/master:

commit 6da80ce8c43ddda153208cbb46b75290cf566fac
perf symbols: Improve debug image search when loading symbols
commit 8b1389ef93b36621c6acdeb623bd85aee3c405c9
perf tools: remove extra build-id check factored into dso__load
commit 21916c380d93ab59d6d07ee198fb31c8f1338e26
perf tools: Factor out buildid reading and make it implicit in dso__load
commit 88ca895dd4e0e64ebd942adb7925fa60ca5b2a98
perf tools: Remove unneeded code for tracking the cwd in perf sessions
commit 361d13462585474267a0c41e956f1a1c19a93f17
perf report: Don't abbreviate file paths relative to the cwd


Illustrative code snippets (from v2.6.35):

util/symbol.c:

968:static int dso__load_sym(struct dso *self, struct map *map, const char *name,
969: int fd, symbol_filter_t filter, int kmodule,
970: int want_symtab)
971:{
[...]
1055: elf_symtab__for_each_symbol(syms, nr_syms, idx, sym) {
1133: if (curr_dso->adjust_symbols) {
[...]
1134: pr_debug4("%s: adjusting symbol: st_value: %#Lx "
1135: "sh_addr: %#Lx sh_offset: %#Lx\n", __func__,
1136: (u64)sym.st_value, (u64)shdr.sh_addr,
1137: (u64)shdr.sh_offset);
1138: sym.st_value -= shdr.sh_addr - shdr.sh_offset;
1139: }

This is the probably incorrect symbol adjustment which currently
gets used for ET_EXEC images; see the explanation above.


util/event.c:

109:static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
110: event__handler_t process,
111: struct perf_session *session)
112:{
[...]
153: if (*pbf == 'x') { /* vm_exec */
[...]
166: /* pgoff is in bytes, not pages */
167: if (n >= 0)
168: ev.mmap.pgoff = vm_pgoff << getpagesize();
169: else
170: ev.mmap.pgoff = 0;

Unless I've misunderstood, getpagesize(2) returns the size in
bytes of a page; using that value as a shift count here is
wrong (the shift presumably wants PAGE_SHIFT, not PAGE_SIZE).

Also, the code appears to be trying to convert a page count
into an address for storage in the profile data. This would
mean that when interpreting the data, knowledge of the page
size is not needed. However, since I can't see any code which
actually uses the pgoff information from the profile data, I'm
not sure whether this was the intention.

In any case, the offset field from /proc/*/maps is already a
byte count, not a page offset, so it does not look necessary to
convert it at all.

See [PATCH 1/2] perf events: Fix mmap offset determination.
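A minimal sketch of the point above (a hypothetical parser for illustration, not perf's actual code): the fourth field of a /proc/*/maps line is already a byte offset and can be stored as-is:

```c
#include <stdio.h>

/* Parse the start/end/perms/offset fields of one /proc/PID/maps line.
 * The offset field is already in bytes -- the kernel applies PAGE_SHIFT
 * in show_map_vma() before printing -- so no further shifting is needed.
 * Returns 0 on success, -1 on a malformed line. */
int parse_maps_offset(const char *line, unsigned long long *offset)
{
	unsigned long long start, end;
	char perms[5];

	if (sscanf(line, "%llx-%llx %4s %llx",
		   &start, &end, perms, offset) != 4)
		return -1;
	return 0;
}
```

For a line such as "08049000-0804a000 r-xp 00001000 08:01 1234 /bin/example" this yields offset 0x1000 directly, with no page-size knowledge required.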

Dave Martin (2):
perf events: Fix mmap offset determination
perf symbols: work around incorrect ET_EXEC symbol adjustment

tools/perf/util/event.c | 8 +-------
tools/perf/util/symbol.c | 10 +++++++++-
2 files changed, 10 insertions(+), 8 deletions(-)


2010-08-03 11:48:59

by Dave Martin

Subject: [PATCH 1/2] perf events: Fix mmap offset determination

Fix buggy-looking code which unnecessarily adjusts the file offset
fields read from /proc/*/maps.

This may have gone unnoticed since the offset is usually 0 (and the
logic in util/symbol.c may work incorrectly for other offset values).

I make assumptions about the intended design here. The cover note
accompanying this patch contains a more detailed explanation.

Signed-off-by: Dave Martin <[email protected]>
---
tools/perf/util/event.c | 8 +-------
1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 6b0db55..db8a1d4 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -151,7 +151,6 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
continue;
pbf += n + 3;
if (*pbf == 'x') { /* vm_exec */
- u64 vm_pgoff;
char *execname = strchr(bf, '/');

/* Catch VDSO */
@@ -162,12 +161,7 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
continue;

pbf += 3;
- n = hex2u64(pbf, &vm_pgoff);
- /* pgoff is in bytes, not pages */
- if (n >= 0)
- ev.mmap.pgoff = vm_pgoff << getpagesize();
- else
- ev.mmap.pgoff = 0;
+ n = hex2u64(pbf, &ev.mmap.pgoff);

size = strlen(execname);
execname[size - 1] = '\0'; /* Remove \n */
--
1.7.0.4

2010-08-03 11:49:01

by Dave Martin

Subject: [PATCH 2/2] perf symbols: work around incorrect ET_EXEC symbol adjustment

Workaround to adjust .text symbols from ET_EXEC images correctly.

This is not a complete fix, but addresses the common case and does not
create additional assumptions beyond those which perf already makes.
ELF images with a weird link map may still be processed incorrectly
(as at present) -- that will require a separate fix.

The workaround allows correct symbol offsets to be generated for
ET_EXEC executables on systems with separated debug info (where
section sh_offset fields from the debug image may bear no relation to
the layout of the executable image).

The cover note accompanying this patch contains a more detailed
explanation.

Signed-off-by: Dave Martin <[email protected]>
---
tools/perf/util/symbol.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 3b8c005..3930398 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1135,7 +1135,15 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
"sh_addr: %#Lx sh_offset: %#Lx\n", __func__,
(u64)sym.st_value, (u64)shdr.sh_addr,
(u64)shdr.sh_offset);
- sym.st_value -= shdr.sh_addr - shdr.sh_offset;
+ /* Assumptions:
+ * a) shdr.sh_addr - shdr.sh_offset ==
+ * map->start - map->pgoff
+ * b) map->pgoff == 0
+ * These are true iff we are looking at a function
+ * symbol in the main executable segment _and_
+ * the main executable segment starts at the start of
+ * the ELF image (normally true). */
+ sym.st_value -= map->start;
}
/*
* We need to figure out if the object was created from C++ sources
--
1.7.0.4

2010-08-03 13:54:38

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 0/2] perf: symbol offset breakage with separated debug

On Tue, Aug 03, 2010 at 12:48:34PM +0100, Dave Martin wrote:
> Hi all,
>
> I've hit some problems in trying to get perf record/report to
> work correctly on systems with separated debug images (Ubuntu,
> in particular). I worked on some patches to ensure that
> separate debug images do actually get loaded, when present --
> these commits have now been merged in linux-2.6-tip/master.
> (See below for a list of the commits.)
>
> Now that these are in place, I'm observing a new problem which
> can mess up symbol locations -- though I think there might be a
> practical workaround, I can't see a trivial solution, so I'm
> keen to hear if anyone has any ideas.
>
>
> The problem:
>
> perf makes some incorrect assumptions, which mean that the
> symbol locations seen by perf report and friends can be
> incorrect.
>
> My analysis:
>
> a) perf represents symbols as offsets from the start of
> the mmap'd code region which contains each
> symbol.

The symbol library in perf uses separate map_ip and unmap_ip to cope
with those, covering things like prelinked binaries, the kernel with its
absolute addresses, kexec cases where we capture a reference relocation
symbol so that we can adjust later when using the unrelocated vmlinux,
etc.

> b) For shared libs (ET_DYN objects), the desired offset
> is (usually) equal to the VMA of each given

VMA here stands for Virtual Memory Address, not Virtual Memory Area
where a map is loaded, right? I guess so, but trying to get to common
ground.

> symbol in the image. Perf assumes that these
> offsets are always equal, and this normally
> works.
>
> Atypical link configurations could cause this

Which ones? We need to detect those and act accordingly, doing
adjustments at load time or by using a separate map_ip/unmap_ip pair of
functions in struct map.

I'll continue reading your patches; the first one probably fixes a bug,
but I wonder if you looked at the struct map->{map_ip,unmap_ip} routines
in tools/perf/util/map.h.

- Arnaldo

2010-08-03 14:31:35

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 1/2] perf events: Fix mmap offset determination

On Tue, Aug 03, 2010 at 12:48:35PM +0100, Dave Martin wrote:
> Fix buggy-looking code which unnecessarily adjusts the file offset
> fields read from /proc/*/maps.
>
> This may have gone unnoticed since the offset is usually 0 (and the
> logic in util/symbol.c may work incorrectly for other offset values).
>
> I make assumptions about the intended design here. The cover note
> accompanying this patch contains a more detailed explanation.
>
> Signed-off-by: Dave Martin <[email protected]>

Doing some investigation here...

4af8b35d (Anton Blanchard 2010-04-03 164) pbf += 3;
4af8b35d (Anton Blanchard 2010-04-03 165) n = hex2u64(pbf, &vm_pgoff);
4af8b35d (Anton Blanchard 2010-04-03 166) /* pgoff is in bytes, not pages */
4af8b35d (Anton Blanchard 2010-04-03 167) if (n >= 0)
4af8b35d (Anton Blanchard 2010-04-03 168) ev.mmap.pgoff = vm_pgoff << getpagesize();
4af8b35d (Anton Blanchard 2010-04-03 169) else
4af8b35d (Anton Blanchard 2010-04-03 170) ev.mmap.pgoff = 0;

commit 4af8b35db6634dd1e0d616de689582b6c93550af
Author: Anton Blanchard <[email protected]>
Date: Sat Apr 3 22:53:31 2010 +1100

perf symbols: Fill in pgoff in mmap synthesized events

When we synthesize mmap events we need to fill in the pgoff field.

I wasn't able to test this completely since I couldn't find an
executable region with a non 0 offset. We will see it when we start
doing data profiling.

------------------------

Yeah, not a reassuring comment; looking at fs/proc/task_mmu.c we see:

static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
{
struct mm_struct *mm = vma->vm_mm;
struct file *file = vma->vm_file;
int flags = vma->vm_flags;
unsigned long ino = 0;
unsigned long long pgoff = 0;
dev_t dev = 0;
int len;

if (file) {
struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
dev = inode->i_sb->s_dev;
ino = inode->i_ino;
pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
}

seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
vma->vm_start,
vma->vm_end,
flags & VM_READ ? 'r' : '-',
flags & VM_WRITE ? 'w' : '-',
flags & VM_EXEC ? 'x' : '-',
flags & VM_MAYSHARE ? 's' : 'p',
pgoff,
MAJOR(dev), MINOR(dev), ino, &len);

----------------------------------------------------------------------------

So yes, we're double shifting that, your patch is correct, applying it.

> ---
> tools/perf/util/event.c | 8 +-------
> 1 files changed, 1 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 6b0db55..db8a1d4 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -151,7 +151,6 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
> continue;
> pbf += n + 3;
> if (*pbf == 'x') { /* vm_exec */
> - u64 vm_pgoff;
> char *execname = strchr(bf, '/');
>
> /* Catch VDSO */
> @@ -162,12 +161,7 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
> continue;
>
> pbf += 3;
> - n = hex2u64(pbf, &vm_pgoff);
> - /* pgoff is in bytes, not pages */
> - if (n >= 0)
> - ev.mmap.pgoff = vm_pgoff << getpagesize();
> - else
> - ev.mmap.pgoff = 0;
> + n = hex2u64(pbf, &ev.mmap.pgoff);
>
> size = strlen(execname);
> execname[size - 1] = '\0'; /* Remove \n */
> --
> 1.7.0.4

2010-08-04 08:29:07

by Dave Martin

Subject: Re: [PATCH 0/2] perf: symbol offset breakage with separated debug

A follow-up on this:

On Tue, Aug 3, 2010 at 12:48 PM, Dave Martin <[email protected]> wrote:

[...]

>
> a) perf represents symbols as offsets from the start of
> the mmap'd code region which contains each
> symbol.

I've looked at the code some more, and it looks like I've
misunderstood here. It looks like symbol addresses are stored and
searched as _file offsets_ in relation to each mapping -- the
map__map_ip implementation bears this out, now that I understand how
it's used to translate from a run-time virtual address to a file offset:

static inline u64 map__map_ip(struct map *map, u64 ip)
{
return ip - map->start + map->pgoff;
}

Does this conclusion look wrong to anyone?
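For reference, a self-contained sketch of this translation and (as I understand it) its inverse, map__unmap_ip from tools/perf/util/map.h, using a stand-in struct map with only the fields involved:

```c
#include <stdint.h>

/* Stand-in for the two fields of perf's struct map that the
 * translation uses: the mapping's start VMA and its byte file offset. */
struct map { uint64_t start; uint64_t pgoff; };

/* Run-time virtual address -> file offset, as in map__map_ip(). */
uint64_t map__map_ip(struct map *map, uint64_t ip)
{
	return ip - map->start + map->pgoff;
}

/* File offset -> run-time virtual address: the inverse translation,
 * matching map__unmap_ip() in tools/perf/util/map.h. */
uint64_t map__unmap_ip(struct map *map, uint64_t ip)
{
	return ip + map->start - map->pgoff;
}
```

With an illustrative mapping of file offset 0x1000 at VMA 0x08049000, an instruction at 0x08049234 maps to file offset 0x1234, and unmapping that offset round-trips back to 0x08049234.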


Assuming I've understood correctly now, this means that the existing
code _is_ correct, except that separated debug images aren't processed
correctly.

My patch "work around incorrect ET_EXEC symbol
adjustment" is therefore a bodge, not a fix -- it will "work" in many
cases, for the reasons previously discussed, but it's probably not the
right solution.


Looks like I need to think about this a bit more -- basically we need
to have the ELF section or program headers from each mmap'd file
available in order to translate correctly between file offset and
link-time virtual address for each image. The information would need
to be available either when loading symbols or when doing symbol
lookups. As already discussed, the headers from separated debug
images are junk (at least the p_offset and sh_offset values are junk)
and will lead to wrong address calculations.

We could
a) Capture the relevant information during perf record, in
addition to capturing the build-ids. This might break the perf data
file format, but has the advantage (?) that perf report can display
correct image-relative virtual addresses even if the debug symbols
can't be loaded. map__map_ip could be converted to output
b) Try to read the loadable image _and_ the debug image during
perf report. Currently the code only loads one image per mapping.
Loading both and cross-referencing information between them would make
the loading process more complicated.

For (a) we know the program headers will be correct, since they can be
obtained from the exact file that was mapped during the profiling run.
For (b) the "real" file may have gone, and we search other locations
instead. So, we would need a way to detect which set of program
headers is the correct one when we search for images to load. I'm
not sure what the best approach is for this -- maybe some checks for
empty PT_LOAD segments (i.e., for which p_filesz == 0, or is much
smaller than p_memsz) would work...
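A rough sketch of that check (stand-in structs rather than libelf's GElf_Phdr, and the "much smaller" threshold is a guess) might look like:

```c
#include <stdint.h>

#define PT_LOAD 1	/* as in <elf.h> */

/* Stand-in for the GElf_Phdr fields the check needs. */
struct phdr { uint32_t p_type; uint64_t p_filesz; uint64_t p_memsz; };

/* Guess whether these program headers came from a separated debug
 * image: such images have PT_LOAD segments whose contents were
 * stripped, so p_filesz is zero, or much smaller than p_memsz.
 * (A real image can legitimately have p_filesz < p_memsz for .bss,
 * so the factor-of-two threshold here is only illustrative.) */
int phdrs_look_stripped(const struct phdr *phdrs, unsigned n)
{
	unsigned i;

	for (i = 0; i < n; i++)
		if (phdrs[i].p_type == PT_LOAD &&
		    phdrs[i].p_filesz < phdrs[i].p_memsz / 2)
			return 1;	/* contents stripped: not the
					 * image that was mmap'd */
	return 0;
}
```

A check along these lines would let the loader reject headers from a debug image and keep searching for a file whose layout matches what was mapped at runtime.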


Any views welcome on which approach would be best.

Cheers
---Dave

2010-08-05 08:01:19

by Dave Martin

Subject: [tip:perf/core] perf events: Fix mmap offset determination

Commit-ID: b5a6325464b700c4bdac8799c495970516eed41c
Gitweb: http://git.kernel.org/tip/b5a6325464b700c4bdac8799c495970516eed41c
Author: Dave Martin <[email protected]>
AuthorDate: Tue, 3 Aug 2010 12:48:35 +0100
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Wed, 4 Aug 2010 12:41:23 -0300

perf events: Fix mmap offset determination

Fix buggy-looking code which unnecessarily adjusts the file offset
fields read from /proc/*/maps.

This may have gone unnoticed since the offset is usually 0 (and the
logic in util/symbol.c may work incorrectly for other offset values).

Committer note:

This fixes a bug introduced in 4af8b35, there is no need to shift pgoff
twice, the show_map_vma routine in fs/proc/task_mmu.c already converts
it from the number of pages to the size in bytes, and that is what
appears in /proc/PID/maps.

Cc: Nicolas Pitre <[email protected]>
Cc: Will Deacon <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/event.c | 8 +-------
1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 6b0db55..db8a1d4 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -151,7 +151,6 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
continue;
pbf += n + 3;
if (*pbf == 'x') { /* vm_exec */
- u64 vm_pgoff;
char *execname = strchr(bf, '/');

/* Catch VDSO */
@@ -162,12 +161,7 @@ static int event__synthesize_mmap_events(pid_t pid, pid_t tgid,
continue;

pbf += 3;
- n = hex2u64(pbf, &vm_pgoff);
- /* pgoff is in bytes, not pages */
- if (n >= 0)
- ev.mmap.pgoff = vm_pgoff << getpagesize();
- else
- ev.mmap.pgoff = 0;
+ n = hex2u64(pbf, &ev.mmap.pgoff);

size = strlen(execname);
execname[size - 1] = '\0'; /* Remove \n */

2010-08-12 13:19:47

by Dave Martin

Subject: [PATCH v2] perf symbols: fix symbol offset breakage with separated debug

Applies to 2.6.35
(git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6.git
perf/core branch)

The new patch loads the ELF section headers from a separate
file if necessary, to avoid getting confused by the different
section file offsets seen in debug images. Invalid section
headers are detected by checking for the presence of non-
writable SHT_NOBITS sections, which don't make sense under
normal circumstances.

In particular, this allows symbols in ET_EXEC images to get
fixed up correctly in the presence of separated debug images.

The image search loop is also tidied up to fix a minor bug
which would perform the same image load attempt more than
once in some situations.

For non-separated images, the headers should get loaded from
the same image as the symbols, in the usual way.

Signed-off-by: Dave Martin <[email protected]>
---
tools/perf/util/symbol.c | 185 +++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 173 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 1a36773..21ae9df 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1,4 +1,5 @@
#define _GNU_SOURCE
+#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
@@ -978,8 +979,106 @@ static size_t elf_addr_to_index(Elf *elf, GElf_Addr addr)
return -1;
}

+/**
+ * Read all section headers, copying them into a separate array so they survive
+ * elf_end.
+ *
+ * @elf: the libelf instance to operate on.
+ * @ehdr: the elf header: this must already have been read with gelf_getehdr().
+ * @count: the number of headers read is assigned to *count on successful
+ * return. count must not be NULL.
+ *
+ * Returns a pointer to the allocated headers, which should be deallocated with
+ * free() when no longer needed.
+ */
+static GElf_Shdr *elf_get_all_shdrs(Elf *elf, GElf_Ehdr const *ehdr,
+ unsigned *count)
+{
+ GElf_Shdr *shdrs;
+ Elf_Scn *scn;
+ unsigned max_index = 0;
+ unsigned i;
+
+ shdrs = malloc(ehdr->e_shnum * sizeof *shdrs);
+ if (!shdrs)
+ return NULL;
+
+ for (i = 0; i < ehdr->e_shnum; i++)
+ shdrs[i].sh_type = SHT_NULL;
+
+ for (scn = NULL; (scn = elf_nextscn(elf, scn)); ) {
+ size_t j;
+
+ /*
+ * Just assuming we get section 0, 1, ... in sequence may lead
+ * to wrong section indices. Check the index explicitly:
+ */
+ j = elf_ndxscn(scn);
+ assert(j < ehdr->e_shnum);
+
+ if (j > max_index)
+ max_index = j;
+
+ if (!gelf_getshdr(scn, &shdrs[j]))
+ goto error;
+ }
+
+ *count = max_index + 1;
+ return shdrs;
+
+error:
+ free(shdrs);
+ return NULL;
+}
+
+/**
+ * Check that the section headers @shdrs reflect accurately the file data
+ * layout of the image that was loaded during perf record. This is generally
+ * not true for separated debug images generated with e.g.,
+ * objcopy --only-keep-debug.
+ *
+ * We identify invalid files by checking for non-empty sections which are
+ * declared as having no file data (SHT_NOBITS) but are not writable.
+ *
+ * @shdrs: the full set of section headers, as loaded by elf_get_all_shdrs().
+ * @count: the number of headers present in @shdrs.
+ *
+ * Returns 1 for valid headers, 0 otherwise.
+ */
+static int elf_check_shdrs_valid(GElf_Shdr const *shdrs, unsigned count)
+{
+ unsigned i;
+
+ for (i = 0; i < count; i++) {
+ if (shdrs[i].sh_type == SHT_NOBITS &&
+ !(shdrs[i].sh_flags & SHF_WRITE) &&
+ shdrs[i].sh_size != 0)
+ return 0;
+ }
+
+ return 1;
+}
+
+/*
+ * Notes:
+ *
+ * If saved_shdrs is non-NULL, the section headers will be read if found, and
+ * will be used for address fixups. saved_shdrs_count must also be non-NULL in
+ * this case. This may be needed for separated debug images, since the section
+ * headers and symbols may need to come from separate images in that case.
+ *
+ * Note: irrespective of whether this function returns successfully,
+ * *saved_shdrs may get initialised if saved_shdrs is non-NULL. It is the
+ * caller's responsibility to free() it when no longer needed.
+ *
+ * If want_symtab == 1, this function will only load symbols from .symtab
+ * sections. Otherwise (want_symtab == 0), .dynsym or .symtab symbols are
+ * loaded. This feature is used by dso__load() to search for the best image
+ * to load.
+ */
static int dso__load_sym(struct dso *self, struct map *map, const char *name,
int fd, symbol_filter_t filter, int kmodule,
+ GElf_Shdr **saved_shdrs, unsigned *saved_shdrs_count,
int want_symtab)
{
struct kmap *kmap = self->kernel ? map__kmap(map) : NULL;
@@ -998,6 +1097,9 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
int nr = 0;
size_t opdidx = 0;

+ if (saved_shdrs != NULL)
+ assert(saved_shdrs_count != NULL);
+
elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
if (elf == NULL) {
pr_debug("%s: cannot read %s ELF file.\n", __func__, name);
@@ -1021,6 +1123,34 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
goto out_elf_end;
}

+ /*
+ * Copy all section headers from the image if requested and if not
+ * already loaded.
+ */
+ if (saved_shdrs != NULL && *saved_shdrs == NULL) {
+ GElf_Shdr *shdrs;
+ unsigned count;
+
+ shdrs = elf_get_all_shdrs(elf, &ehdr, &count);
+ if (shdrs == NULL)
+ goto out_elf_end;
+
+ /*
+ * Only keep the headers if they reflect the actual run-time
+ * image's file layout:
+ */
+ if (elf_check_shdrs_valid(shdrs, count)) {
+ *saved_shdrs = shdrs;
+ *saved_shdrs_count = count;
+ } else
+ free(shdrs);
+ }
+
+ /* If no genuine ELF headers are available yet, give up: we can't
+ * adjust symbols correctly in that case: */
+ if (saved_shdrs != NULL && *saved_shdrs == NULL)
+ goto out_elf_end;
+
sec = elf_section_by_name(elf, &ehdr, &shdr, ".symtab", NULL);
if (sec == NULL) {
if (want_symtab)
@@ -1031,6 +1161,9 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
goto out_elf_end;
}

+ if (saved_shdrs != NULL && *saved_shdrs == NULL)
+ goto out_elf_end;
+
opdsec = elf_section_by_name(elf, &ehdr, &opdshdr, ".opd", &opdidx);
if (opdsec)
opddata = elf_rawdata(opdsec, NULL);
@@ -1153,12 +1286,25 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
goto new_symbol;
}

+ /*
+ * Currently, symbols for shared objects and PIE executables
+ * (i.e., ET_DYN) do not seem to get adjusted. This might need
+ * to change if file offset == virtual address is not actually
+ * guaranteed for these images. ELF doesn't provide this
+ * guarantee natively.
+ */
if (curr_dso->adjust_symbols) {
pr_debug4("%s: adjusting symbol: st_value: %#Lx "
"sh_addr: %#Lx sh_offset: %#Lx\n", __func__,
(u64)sym.st_value, (u64)shdr.sh_addr,
(u64)shdr.sh_offset);
- sym.st_value -= shdr.sh_addr - shdr.sh_offset;
+ if (saved_shdrs && *saved_shdrs &&
+ sym.st_shndx < *saved_shdrs_count)
+ sym.st_value -=
+ (*saved_shdrs)[sym.st_shndx].sh_addr -
+ (*saved_shdrs)[sym.st_shndx].sh_offset;
+ else
+ sym.st_value -= shdr.sh_addr - shdr.sh_offset;
}
/*
* We need to figure out if the object was created from C++ sources
@@ -1395,6 +1541,8 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
struct machine *machine;
const char *root_dir;
int want_symtab;
+ GElf_Shdr *saved_shdrs = NULL;
+ unsigned saved_shdrs_count;

dso__set_loaded(self, map->type);

@@ -1425,13 +1573,13 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
* On the first pass, only load images if they have a full symtab.
* Failing that, do a second pass where we accept .dynsym also
*/
- for (self->origin = DSO__ORIG_BUILD_ID_CACHE, want_symtab = 1;
- self->origin != DSO__ORIG_NOT_FOUND;
- self->origin++) {
+ self->origin = DSO__ORIG_BUILD_ID_CACHE;
+ want_symtab = 1;
+ while (1) {
switch (self->origin) {
case DSO__ORIG_BUILD_ID_CACHE:
if (dso__build_id_filename(self, name, size) == NULL)
- continue;
+ goto continue_next;
break;
case DSO__ORIG_FEDORA:
snprintf(name, size, "/usr/lib/debug%s.debug",
@@ -1445,7 +1593,7 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
char build_id_hex[BUILD_ID_SIZE * 2 + 1];

if (!self->has_build_id)
- continue;
+ goto continue_next;

build_id__sprintf(self->build_id,
sizeof(self->build_id),
@@ -1469,21 +1617,24 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
default:
/*
* If we wanted a full symtab but no image had one,
- * relax our requirements and repeat the search.
+ * relax our requirements and repeat the search,
+ * providing we saw some valid section headers:
*/
- if (want_symtab) {
+ if (want_symtab && saved_shdrs != NULL) {
want_symtab = 0;
self->origin = DSO__ORIG_BUILD_ID_CACHE;
- } else
continue;
+ } else
+ goto done;
}

/* Name is now the name of the next image to try */
fd = open(name, O_RDONLY);
if (fd < 0)
- continue;
+ goto continue_next;

ret = dso__load_sym(self, map, name, fd, filter, 0,
+ &saved_shdrs, &saved_shdrs_count,
want_symtab);
close(fd);

@@ -1492,7 +1643,7 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
* info!?!?
*/
if (!ret)
- continue;
+ goto continue_next;

if (ret > 0) {
int nr_plt = dso__synthesize_plt_symbols(self, map, filter);
@@ -1500,9 +1651,18 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
ret += nr_plt;
break;
}
+
+continue_next:
+ self->origin++;
+ continue;
}

+done:
free(name);
+
+ if (saved_shdrs)
+ free(saved_shdrs);
+
if (ret < 0 && strstr(self->name, " (deleted)") != NULL)
return 0;
return ret;
@@ -1768,7 +1928,8 @@ static int dso__load_vmlinux(struct dso *self, struct map *map,
return -1;

dso__set_loaded(self, map->type);
- err = dso__load_sym(self, map, vmlinux, fd, filter, 0, 0);
+ err = dso__load_sym(self, map, vmlinux, fd, filter, 0,
+ NULL, NULL, 0);
close(fd);

if (err > 0)
--
1.7.0.4

2010-08-13 09:27:48

by Dave Martin

Subject: [PATCH v3] perf symbols: fix symbol offset breakage with separated debug

Fixed a minor error in v2 where a check was erroneously done
twice.

Original commit message follows:

Applies to 2.6.35
(git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6.git
perf/core branch)

The new patch loads the ELF section headers from a separate
file if necessary, to avoid getting confused by the different
section file offsets seen in debug images. Invalid section
headers are detected by checking for the presence of non-
writable SHT_NOBITS sections, which don't make sense under
normal circumstances.

In particular, this allows symbols in ET_EXEC images to get
fixed up correctly in the presence of separated debug images.

The image search loop is also tidied up to fix a minor bug
which would perform the same image load attempt more than
once in some situations.

For non-separated images, the headers should get loaded from
the same image as the symbols, in the usual way.

Signed-off-by: Dave Martin <[email protected]>
---
tools/perf/util/symbol.c | 182 +++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 170 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 1a36773..8328b60 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1,4 +1,5 @@
#define _GNU_SOURCE
+#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
@@ -978,8 +979,106 @@ static size_t elf_addr_to_index(Elf *elf, GElf_Addr addr)
return -1;
}

+/**
+ * Read all section headers, copying them into a separate array so they survive
+ * elf_end.
+ *
+ * @elf: the libelf instance to operate on.
+ * @ehdr: the elf header: this must already have been read with gelf_getehdr().
+ * @count: the number of headers read is assigned to *count on successful
+ * return. count must not be NULL.
+ *
+ * Returns a pointer to the allocated headers, which should be deallocated with
+ * free() when no longer needed.
+ */
+static GElf_Shdr *elf_get_all_shdrs(Elf *elf, GElf_Ehdr const *ehdr,
+ unsigned *count)
+{
+ GElf_Shdr *shdrs;
+ Elf_Scn *scn;
+ unsigned max_index = 0;
+ unsigned i;
+
+ shdrs = malloc(ehdr->e_shnum * sizeof *shdrs);
+ if (!shdrs)
+ return NULL;
+
+ for (i = 0; i < ehdr->e_shnum; i++)
+ shdrs[i].sh_type = SHT_NULL;
+
+ for (scn = NULL; (scn = elf_nextscn(elf, scn)); ) {
+ size_t j;
+
+ /*
+ * Just assuming we get section 0, 1, ... in sequence may lead
+ * to wrong section indices. Check the index explicitly:
+ */
+ j = elf_ndxscn(scn);
+ assert(j < ehdr->e_shnum);
+
+ if (j > max_index)
+ max_index = j;
+
+ if (!gelf_getshdr(scn, &shdrs[j]))
+ goto error;
+ }
+
+ *count = max_index + 1;
+ return shdrs;
+
+error:
+ free(shdrs);
+ return NULL;
+}
+
+/**
+ * Check that the section headers @shdrs reflect accurately the file data
+ * layout of the image that was loaded during perf record. This is generally
+ * not true for separated debug images generated with e.g.,
+ * objcopy --only-keep-debug.
+ *
+ * We identify invalid files by checking for non-empty sections which are
+ * declared as having no file data (SHT_NOBITS) but are not writable.
+ *
+ * @shdrs: the full set of section headers, as loaded by elf_get_all_shdrs().
+ * @count: the number of headers present in @shdrs.
+ *
+ * Returns 1 for valid headers, 0 otherwise.
+ */
+static int elf_check_shdrs_valid(GElf_Shdr const *shdrs, unsigned count)
+{
+ unsigned i;
+
+ for (i = 0; i < count; i++) {
+ if (shdrs[i].sh_type == SHT_NOBITS &&
+ !(shdrs[i].sh_flags & SHF_WRITE) &&
+ shdrs[i].sh_size != 0)
+ return 0;
+ }
+
+ return 1;
+}
+
+/*
+ * Notes:
+ *
+ * If saved_shdrs is non-NULL, the section headers will be read if found, and
+ * will be used for address fixups. saved_shdrs_count must also be non-NULL in
+ * this case. This may be needed for separated debug images, since the section
+ * headers and symbols may need to come from separate images in that case.
+ *
+ * Note: irrespective of whether this function returns successfully,
+ * *saved_shdrs may get initialised if saved_shdrs is non-NULL. It is the
+ * caller's responsibility to free() it when no longer needed.
+ *
+ * If want_symtab == 1, this function will only load symbols from .symtab
+ * sections. Otherwise (want_symtab == 0), .dynsym or .symtab symbols are
+ * loaded. This feature is used by dso__load() to search for the best image
+ * to load.
+ */
static int dso__load_sym(struct dso *self, struct map *map, const char *name,
int fd, symbol_filter_t filter, int kmodule,
+ GElf_Shdr **saved_shdrs, unsigned *saved_shdrs_count,
int want_symtab)
{
struct kmap *kmap = self->kernel ? map__kmap(map) : NULL;
@@ -998,6 +1097,9 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
int nr = 0;
size_t opdidx = 0;

+ if (saved_shdrs != NULL)
+ assert(saved_shdrs_count != NULL);
+
elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
if (elf == NULL) {
pr_debug("%s: cannot read %s ELF file.\n", __func__, name);
@@ -1021,6 +1123,34 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
goto out_elf_end;
}

+ /*
+ * Copy all section headers from the image if requested and if not
+ * already loaded.
+ */
+ if (saved_shdrs != NULL && *saved_shdrs == NULL) {
+ GElf_Shdr *shdrs;
+ unsigned count;
+
+ shdrs = elf_get_all_shdrs(elf, &ehdr, &count);
+ if (shdrs == NULL)
+ goto out_elf_end;
+
+ /*
+ * Only keep the headers if they reflect the actual run-time
+ * image's file layout:
+ */
+ if (elf_check_shdrs_valid(shdrs, count)) {
+ *saved_shdrs = shdrs;
+ *saved_shdrs_count = count;
+ } else
+ free(shdrs);
+ }
+
+ /* If no genuine ELF headers are available yet, give up: we can't
+ * adjust symbols correctly in that case. */
+ if (saved_shdrs != NULL && *saved_shdrs == NULL)
+ goto out_elf_end;
+
sec = elf_section_by_name(elf, &ehdr, &shdr, ".symtab", NULL);
if (sec == NULL) {
if (want_symtab)
@@ -1153,12 +1283,25 @@ static int dso__load_sym(struct dso *self, struct map *map, const char *name,
goto new_symbol;
}

+ /*
+ * Currently, symbols for shared objects and PIE executables
+ * (i.e., ET_DYN) do not seem to get adjusted. This might need
+ * to change if file offset == virtual address is not actually
+ * guaranteed for these images. ELF doesn't provide this
+ * guarantee natively.
+ */
if (curr_dso->adjust_symbols) {
pr_debug4("%s: adjusting symbol: st_value: %#Lx "
"sh_addr: %#Lx sh_offset: %#Lx\n", __func__,
(u64)sym.st_value, (u64)shdr.sh_addr,
(u64)shdr.sh_offset);
- sym.st_value -= shdr.sh_addr - shdr.sh_offset;
+ if (saved_shdrs && *saved_shdrs &&
+ sym.st_shndx < *saved_shdrs_count)
+ sym.st_value -=
+ (*saved_shdrs)[sym.st_shndx].sh_addr -
+ (*saved_shdrs)[sym.st_shndx].sh_offset;
+ else
+ sym.st_value -= shdr.sh_addr - shdr.sh_offset;
}
/*
* We need to figure out if the object was created from C++ sources
@@ -1395,6 +1538,8 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
struct machine *machine;
const char *root_dir;
int want_symtab;
+ GElf_Shdr *saved_shdrs = NULL;
+ unsigned saved_shdrs_count;

dso__set_loaded(self, map->type);

@@ -1425,13 +1570,13 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
* On the first pass, only load images if they have a full symtab.
* Failing that, do a second pass where we accept .dynsym also
*/
- for (self->origin = DSO__ORIG_BUILD_ID_CACHE, want_symtab = 1;
- self->origin != DSO__ORIG_NOT_FOUND;
- self->origin++) {
+ self->origin = DSO__ORIG_BUILD_ID_CACHE;
+ want_symtab = 1;
+ while (1) {
switch (self->origin) {
case DSO__ORIG_BUILD_ID_CACHE:
if (dso__build_id_filename(self, name, size) == NULL)
- continue;
+ goto continue_next;
break;
case DSO__ORIG_FEDORA:
snprintf(name, size, "/usr/lib/debug%s.debug",
@@ -1445,7 +1590,7 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
char build_id_hex[BUILD_ID_SIZE * 2 + 1];

if (!self->has_build_id)
- continue;
+ goto continue_next;

build_id__sprintf(self->build_id,
sizeof(self->build_id),
@@ -1469,21 +1614,24 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
default:
/*
* If we wanted a full symtab but no image had one,
- * relax our requirements and repeat the search.
+ * relax our requirements and repeat the search,
+ * provided we saw some valid section headers:
*/
- if (want_symtab) {
+ if (want_symtab && saved_shdrs != NULL) {
want_symtab = 0;
self->origin = DSO__ORIG_BUILD_ID_CACHE;
- } else
continue;
+ } else
+ goto done;
}

/* Name is now the name of the next image to try */
fd = open(name, O_RDONLY);
if (fd < 0)
- continue;
+ goto continue_next;

ret = dso__load_sym(self, map, name, fd, filter, 0,
+ &saved_shdrs, &saved_shdrs_count,
want_symtab);
close(fd);

@@ -1492,7 +1640,7 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
* info!?!?
*/
if (!ret)
- continue;
+ goto continue_next;

if (ret > 0) {
int nr_plt = dso__synthesize_plt_symbols(self, map, filter);
@@ -1500,9 +1648,17 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
ret += nr_plt;
break;
}
+
+continue_next:
+ self->origin++;
+ continue;
}

+done:
free(name);
+
+ free(saved_shdrs);
+
if (ret < 0 && strstr(self->name, " (deleted)") != NULL)
return 0;
return ret;
@@ -1768,7 +1925,8 @@ static int dso__load_vmlinux(struct dso *self, struct map *map,
return -1;

dso__set_loaded(self, map->type);
- err = dso__load_sym(self, map, vmlinux, fd, filter, 0, 0);
+ err = dso__load_sym(self, map, vmlinux, fd, filter, 0,
+ NULL, NULL, 0);
close(fd);

if (err > 0)
--
1.7.0.4