Hello,
I found that perf tools don't work well with tasks in a chroot. The
filenames in MMAP record are from the root directory of the task so
it's different than what it sees from outside.
I've tested it with a simple program (myprog - statically built) which
just consumes cpu cycles in a loop for a while (default 1 sec, can by
overridden by a command-line argument).
# cd $HOME
# mkdir -p myroot/bin
# cp myprog myroot/bin
# perf record chroot myroot myprog
# perf report -D | grep MMAP | grep myprog
2084916774977911 0xa2e4 [0x70]: PERF_RECORD_MMAP2 3363818/3363818: \
[0x401000(0x80000) @ 0x1000 fe:01 4346398 2543719070]: r-xp /bin/myprog
So it's reported as /bin/myprog and then it's unable to symbolize the
samples. It seems hard to fix it for the above use case as the record
ended after the task exited. It cannot know the root directory of the
task anymore. But we can fix it for real-time use cases like perf top
or pipe-mode at least.
I tested it again with the following command.
# perf record -o- chroot myroot myprog | perf report -i-
...
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ................................
#
46.02% myprog myprog [.] 0x000000000000178a
36.71% myprog myprog [.] 0x0000000000001792
17.12% myprog myprog [.] 0x000000000000178e
0.05% myprog myprog [.] 0x0000000000001796
0.05% chroot ld-2.33.so [.] intel_check_word.constprop.0
The symbols are not resolved because it failed to load the symbol
table as it didn't find the file in the given path.
So I modified the code to try a new name prepended with the task's
root directory when it's not "/". With this change, I can see the
symbols of myprog. In fact, it depends on timing if perf report hits
the file before the task is gone. Increasing the loop time to 3 sec
helped it to get symbols reliably.
# perf record -o- chroot myroot myprog 3 | perf report -i-
...
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. .............................
#
99.83% myprog myprog [.] loop
0.04% chroot [kernel.kallsyms] [k] fxregs_fixup
0.04% chroot [kernel.kallsyms] [k] rsm_load_seg_32
0.03% chroot [kernel.kallsyms] [k] show_cpuinfo_cur_freq
0.01% myprog [kernel.kallsyms] [k] alarmtimer_fired
I also found that perf inject and perf annotate need the similar changes.
You can get it from 'perf/dso-chroot-v1' branch at
git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Thanks,
Namhyung
Namhyung Kim (3):
perf tools: Try chroot'ed filename when opening dso/symbol
perf inject: Try chroot directory when reading build-id
perf annotate: Try chroot filename for objdump
tools/perf/builtin-inject.c | 10 ++++++++++
tools/perf/util/annotate.c | 11 +++++++++++
tools/perf/util/dso.c | 15 +++++++++++++--
tools/perf/util/dsos.c | 13 +++++++++++++
tools/perf/util/symbol.c | 10 ++++++++++
tools/perf/util/util.c | 31 +++++++++++++++++++++++++++++++
tools/perf/util/util.h | 2 ++
7 files changed, 90 insertions(+), 2 deletions(-)
base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
--
2.35.0.rc2.247.g8bbb082509-goog
Likewise, it should use a proper name in case the task runs under
chroot. The child_process.err was needed to set to -1 to show error
messages properly in TUI.
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/annotate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 01900689dc00..e4c641b240df 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -33,6 +33,7 @@
#include "string2.h"
#include "util/event.h"
#include "arch/common.h"
+#include "namespaces.h"
#include <regex.h>
#include <pthread.h>
#include <linux/bitops.h>
@@ -1696,6 +1697,15 @@ static int dso__disassemble_filename(struct dso *dso, char *filename, size_t fil
* DSO is the same as when 'perf record' ran.
*/
__symbol__join_symfs(filename, filename_size, dso->long_name);
+
+ if (access(filename, R_OK) && errno == ENOENT && dso->nsinfo) {
+ char *new_name = filename_with_chroot(dso->nsinfo->pid,
+ filename);
+ if (new_name) {
+ strlcpy(filename, new_name, filename_size);
+ free(new_name);
+ }
+ }
}
free(build_id_path);
@@ -2036,6 +2046,7 @@ static int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
memset(&objdump_process, 0, sizeof(objdump_process));
objdump_process.argv = objdump_argv;
objdump_process.out = -1;
+ objdump_process.err = -1;
if (start_command(&objdump_process)) {
pr_err("Failure starting to run %s\n", command);
err = -1;
--
2.35.0.rc2.247.g8bbb082509-goog
When reading build-id from a DSO, it should consider if it's from a
chroot task. In that case, the path is different so it needs to
prepend the root directory to access the file correctly.
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-inject.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index fbf43a454cba..3be581abbdb6 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -25,6 +25,7 @@
#include "util/synthetic-events.h"
#include "util/thread.h"
#include "util/namespaces.h"
+#include "util/util.h"
#include <linux/err.h>
#include <subcmd/parse-options.h>
@@ -550,6 +551,15 @@ static int dso__read_build_id(struct dso *dso)
nsinfo__mountns_enter(dso->nsinfo, &nsc);
if (filename__read_build_id(dso->long_name, &dso->bid) > 0)
dso->has_build_id = true;
+ else if (dso->nsinfo) {
+ char *new_name;
+
+ new_name = filename_with_chroot(dso->nsinfo->pid,
+ dso->long_name);
+ if (new_name && filename__read_build_id(new_name, &dso->bid) > 0)
+ dso->has_build_id = true;
+ free(new_name);
+ }
nsinfo__mountns_exit(&nsc);
return dso->has_build_id ? 0 : -1;
--
2.35.0.rc2.247.g8bbb082509-goog
On Fri, Jan 28, 2022 at 12:39:50PM -0800, Namhyung Kim wrote:
> Likewise, it should use a proper name in case the task runs under
> chroot. The child_process.err was needed to set to -1 to show error
> messages properly in TUI.
>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/util/annotate.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 01900689dc00..e4c641b240df 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -33,6 +33,7 @@
> #include "string2.h"
> #include "util/event.h"
> #include "arch/common.h"
> +#include "namespaces.h"
> #include <regex.h>
> #include <pthread.h>
> #include <linux/bitops.h>
> @@ -1696,6 +1697,15 @@ static int dso__disassemble_filename(struct dso *dso, char *filename, size_t fil
> * DSO is the same as when 'perf record' ran.
> */
> __symbol__join_symfs(filename, filename_size, dso->long_name);
> +
> + if (access(filename, R_OK) && errno == ENOENT && dso->nsinfo) {
> + char *new_name = filename_with_chroot(dso->nsinfo->pid,
> + filename);
> + if (new_name) {
> + strlcpy(filename, new_name, filename_size);
> + free(new_name);
> + }
> + }
> }
>
> free(build_id_path);
> @@ -2036,6 +2046,7 @@ static int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
> memset(&objdump_process, 0, sizeof(objdump_process));
> objdump_process.argv = objdump_argv;
> objdump_process.out = -1;
> + objdump_process.err = -1;
is this unrelated fix?
otherwise the whole patchset looks good to me
I guess we'd need to add something like PERF_RECORD_CHROOT to
handle and instrument sys_chroot to handle this completely?
Acked-by: Jiri Olsa <[email protected]>
thanks,
jirka
> if (start_command(&objdump_process)) {
> pr_err("Failure starting to run %s\n", command);
> err = -1;
> --
> 2.35.0.rc2.247.g8bbb082509-goog
>
On 1/28/2022 12:39 PM, Namhyung Kim wrote:
> Hello,
>
> I found that perf tools don't work well with tasks in a chroot. The
> filenames in MMAP record are from the root directory of the task so
> it's different than what it sees from outside.
While that's a real problem, and for chroot it can be fixed, it's much
more complicated for the more complex container namespace case with
custom mounts, including loop back, etc.
It seems it really need some kind of agent to handle all cases. For
example the agent could understand container metadata formats and then
do the right thing.
-Andi
Em Tue, Feb 01, 2022 at 11:54:14AM -0800, Namhyung Kim escreveu:
> Hi Jiri,
>
> On Mon, Jan 31, 2022 at 11:23 AM Jiri Olsa <[email protected]> wrote:
> >
> > On Fri, Jan 28, 2022 at 12:39:50PM -0800, Namhyung Kim wrote:
> > > Likewise, it should use a proper name in case the task runs under
> > > chroot. The child_process.err was needed to set to -1 to show error
> > > messages properly in TUI.
> > >
> > > Signed-off-by: Namhyung Kim <[email protected]>
> > > ---
> > > tools/perf/util/annotate.c | 11 +++++++++++
> > > 1 file changed, 11 insertions(+)
> > >
> > > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> > > index 01900689dc00..e4c641b240df 100644
> > > --- a/tools/perf/util/annotate.c
> > > +++ b/tools/perf/util/annotate.c
> > > @@ -33,6 +33,7 @@
> > > #include "string2.h"
> > > #include "util/event.h"
> > > #include "arch/common.h"
> > > +#include "namespaces.h"
> > > #include <regex.h>
> > > #include <pthread.h>
> > > #include <linux/bitops.h>
> > > @@ -1696,6 +1697,15 @@ static int dso__disassemble_filename(struct dso *dso, char *filename, size_t fil
> > > * DSO is the same as when 'perf record' ran.
> > > */
> > > __symbol__join_symfs(filename, filename_size, dso->long_name);
> > > +
> > > + if (access(filename, R_OK) && errno == ENOENT && dso->nsinfo) {
> > > + char *new_name = filename_with_chroot(dso->nsinfo->pid,
> > > + filename);
> > > + if (new_name) {
> > > + strlcpy(filename, new_name, filename_size);
> > > + free(new_name);
> > > + }
> > > + }
> > > }
> > >
> > > free(build_id_path);
> > > @@ -2036,6 +2046,7 @@ static int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
> > > memset(&objdump_process, 0, sizeof(objdump_process));
> > > objdump_process.argv = objdump_argv;
> > > objdump_process.out = -1;
> > > + objdump_process.err = -1;
> >
> > is this unrelated fix?
>
> Right, it's unrelated. Maybe I can split it if Arnaldo wants.
Yes, please split.
> >
> > otherwise the whole patchset looks good to me
> >
> > I guess we'd need to add something like PERF_RECORD_CHROOT to
> > handle and instrument sys_chroot to handle this completely?
>
> Are you talking about an env variable? Maybe we can track
> chroot syscall. But it needs to be task-specific rather than a
> global setup.
>
> As it hasn't been an issue so far, I think it's ok to have it as
> a fallback right now.
>
> >
> > Acked-by: Jiri Olsa <[email protected]>
>
> Thanks for your review!
Indeed, thank you guys!
> Namhyung
>
>
> >
> > > if (start_command(&objdump_process)) {
> > > pr_err("Failure starting to run %s\n", command);
> > > err = -1;
> > > --
> > > 2.35.0.rc2.247.g8bbb082509-goog
> > >
> >
--
- Arnaldo
Hi Jiri,
On Mon, Jan 31, 2022 at 11:23 AM Jiri Olsa <[email protected]> wrote:
>
> On Fri, Jan 28, 2022 at 12:39:50PM -0800, Namhyung Kim wrote:
> > Likewise, it should use a proper name in case the task runs under
> > chroot. The child_process.err was needed to set to -1 to show error
> > messages properly in TUI.
> >
> > Signed-off-by: Namhyung Kim <[email protected]>
> > ---
> > tools/perf/util/annotate.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> > index 01900689dc00..e4c641b240df 100644
> > --- a/tools/perf/util/annotate.c
> > +++ b/tools/perf/util/annotate.c
> > @@ -33,6 +33,7 @@
> > #include "string2.h"
> > #include "util/event.h"
> > #include "arch/common.h"
> > +#include "namespaces.h"
> > #include <regex.h>
> > #include <pthread.h>
> > #include <linux/bitops.h>
> > @@ -1696,6 +1697,15 @@ static int dso__disassemble_filename(struct dso *dso, char *filename, size_t fil
> > * DSO is the same as when 'perf record' ran.
> > */
> > __symbol__join_symfs(filename, filename_size, dso->long_name);
> > +
> > + if (access(filename, R_OK) && errno == ENOENT && dso->nsinfo) {
> > + char *new_name = filename_with_chroot(dso->nsinfo->pid,
> > + filename);
> > + if (new_name) {
> > + strlcpy(filename, new_name, filename_size);
> > + free(new_name);
> > + }
> > + }
> > }
> >
> > free(build_id_path);
> > @@ -2036,6 +2046,7 @@ static int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
> > memset(&objdump_process, 0, sizeof(objdump_process));
> > objdump_process.argv = objdump_argv;
> > objdump_process.out = -1;
> > + objdump_process.err = -1;
>
> is this unrelated fix?
Right, it's unrelated. Maybe I can split it if Arnaldo wants.
>
> otherwise the whole patchset looks good to me
>
> I guess we'd need to add something like PERF_RECORD_CHROOT to
> handle and instrument sys_chroot to handle this completely?
Are you talking about an env variable? Maybe we can track
chroot syscall. But it needs to be task-specific rather than a
global setup.
As it hasn't been an issue so far, I think it's ok to have it as
a fallback right now.
>
> Acked-by: Jiri Olsa <[email protected]>
Thanks for your review!
Namhyung
>
> > if (start_command(&objdump_process)) {
> > pr_err("Failure starting to run %s\n", command);
> > err = -1;
> > --
> > 2.35.0.rc2.247.g8bbb082509-goog
> >
>
Em Tue, Feb 01, 2022 at 12:01:49PM -0800, Namhyung Kim escreveu:
> On Mon, Jan 31, 2022 at 5:16 PM Andi Kleen <[email protected]> wrote:
> > On 1/28/2022 12:39 PM, Namhyung Kim wrote:
> > > I found that perf tools don't work well with tasks in a chroot. The
> > > filenames in MMAP record are from the root directory of the task so
> > > it's different than what it sees from outside.
> > While that's a real problem, and for chroot it can be fixed, it's much
> > more complicated for the more complex container namespace case with
> > custom mounts, including loop back, etc.
> Probably. Note that perf tool can handle namespaces (to some extent)
> but missed chroot support. I have a bug report because of this issue.
> Let's fix the simple case first.
> > It seems it really need some kind of agent to handle all cases. For
> > example the agent could understand container metadata formats and then
> > do the right thing.
>
> Sounds like a good idea for a long term solution.
I think build ids are the way to go and would be really happy to have
arguments against it to be properly spelled out so we can harvest
documentation.
Its really difficult to associate a mmap to a backing storage, its race
prone, so having something that uniquely identifies the contents at the
time of mmap placement hugely desirable.
We seem to have that in most cases, and there is discussion about what
to do to the cases where parsing the ELF headers when that area is not
in memory, so there is hope that the buildid for an exec VMA will be
always available.
- Arnaldo
Hi Andi,
On Mon, Jan 31, 2022 at 5:16 PM Andi Kleen <[email protected]> wrote:
>
>
> On 1/28/2022 12:39 PM, Namhyung Kim wrote:
> > Hello,
> >
> > I found that perf tools don't work well with tasks in a chroot. The
> > filenames in MMAP record are from the root directory of the task so
> > it's different than what it sees from outside.
>
>
> While that's a real problem, and for chroot it can be fixed, it's much
> more complicated for the more complex container namespace case with
> custom mounts, including loop back, etc.
Probably. Note that perf tool can handle namespaces (to some extent)
but missed chroot support. I have a bug report because of this issue.
Let's fix the simple case first.
>
> It seems it really need some kind of agent to handle all cases. For
> example the agent could understand container metadata formats and then
> do the right thing.
Sounds like a good idea for a long term solution.
Thanks,
Namhyung