Hi,
I was recently discussing with Will Cohen how to get perf to
understand dynamic languages (java, python, ruby) better. Currently, perf
samples an address and stores it in a mmap region (from the kernel side);
the mmap region is then read (from the user side, asynchronously) and
stored in a file.
During 'perf report' those instruction addresses are looked up in the
dwarf table?? of the binary they were mapped to, to resolve their symbols.
This works great for statically compiled binaries (like C), where the
addresses stay the same during each run of the binary.
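
A minimal sketch of the lookup step described above, assuming the symbol
information has already been read out of the binary as (start, size, name)
tuples. The function names and the tuple layout are hypothetical and are
not perf's internal data structures:

```python
import bisect

def build_index(symbols):
    """Sort (start, size, name) tuples by start address."""
    table = sorted(symbols)
    starts = [entry[0] for entry in table]
    return table, starts

def resolve(table, starts, addr):
    """Return the name of the symbol covering addr, or None."""
    # Find the last symbol whose start address is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = table[i]
    return name if addr < start + size else None
```

This only works while the address-to-symbol mapping stays fixed for the
whole run, which is the property that JITed code breaks.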
However, for dynamic languages like java, python, ruby not only do those
addresses change on each run of the binary, those addresses can change
_during_ the execution of the binary. As a result the normal perf
collection method fails.
Oprofile has a mechanism to work around this, by creating a debug library
for java that records class information. This library is linked?? during
the initial execution of the java program and all its symbol info is
recorded in a temp file. During post-processing this temp file is read
back in and symbol info is obtained.
However, this approach is java specific and only works for programs that
are started with it (it cannot attach to running programs).
Thoughts have come up about using a SIGPROF from the kernel to signal the
userspace interpreters to dump information to a temp file that can be used
later during post-processing.
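
A rough sketch of the interpreter side of that idea, assuming the runtime
keeps its own table of JITed symbols. The table contents, the dump path
and the line format are all hypothetical; the thread does not specify any
of them:

```python
import os
import signal
import tempfile

# Hypothetical table a JIT would maintain: name -> (start address, size).
jit_symbols = {"Foo.bar": (0x7F0000001000, 0x80)}

# Hypothetical dump location, one file per process.
dump_path = os.path.join(tempfile.gettempdir(),
                         "jit-symbols-%d.txt" % os.getpid())

def dump_symbols(signum=None, frame=None):
    """Write the current symbol table for later post-processing."""
    tmp = dump_path + ".tmp"
    with open(tmp, "w") as f:
        for name, (start, size) in sorted(jit_symbols.items(),
                                          key=lambda kv: kv[1]):
            f.write("%x %x %s\n" % (start, size, name))
    # Rename atomically so a reader never sees a half-written file.
    os.replace(tmp, dump_path)

def install_handler():
    """Dump the table whenever the process receives SIGPROF."""
    signal.signal(signal.SIGPROF, dump_symbols)
```

After install_handler() has run, anything that delivers SIGPROF to the
process triggers a dump. Whether the kernel can be made to send that
signal at a useful moment is the open question in the proposal.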
Does anyone have any thoughts or experience on this?
Cheers,
Don
Hi Don,
I have been working on the JIT code support for a while now.
I have something working well for more than Java now. It reuses
some of the same principles as the OProfile support but extends
them to support more advanced JIT features such as address
recycling and code movements.
I intend to contribute that code for perf once it is finalized.
Note that it uses a module developed by Sonny Rao to
export the perf timestamp time source via a posix-clock.
This clock discussion has been going on for a while and
never reached a conclusion. So I decided to go with the
simple posix-clock module for the time being.
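
To illustrate why a shared time source matters here: once addresses are
recycled or code is moved, a sample can only be resolved by combining its
address with its timestamp. The following is a sketch under that
assumption and not the implementation mentioned above; the class and
method names are hypothetical:

```python
import bisect

class JitTimeline:
    """History of JIT code events, ordered by timestamp."""

    def __init__(self):
        self.events = []  # (timestamp, seq, start, size, name or None)
        self.seq = 0      # tie-breaker so tuples never compare on name

    def load(self, timestamp, start, size, name):
        bisect.insort(self.events, (timestamp, self.seq, start, size, name))
        self.seq += 1

    def move(self, timestamp, old_start, new_start, size, name):
        # Mark the old range as empty, then record the new location.
        self.load(timestamp, old_start, size, None)
        self.load(timestamp, new_start, size, name)

    def resolve(self, timestamp, addr):
        """Name of the newest event at or before timestamp covering addr."""
        i = bisect.bisect_right(self.events, (timestamp, float("inf")))
        for _, _, start, size, name in reversed(self.events[:i]):
            if start <= addr < start + size:
                return name
        return None
```

The lookup is only meaningful if the JIT's timestamps and the perf sample
timestamps come from the same clock, which is what the posix-clock module
is used for.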
Thanks.
On 4/22/14, 1:05 PM, Stephane Eranian wrote:
> I intend to contribute that code for perf once it is finalized.
> Note that it uses a module developed by Sonny Rao to
> export the perf timestamp time source via a posix-clock.
> This clock discussion has been going on for a while and
> never reached a conclusion. So I decided to go with the
> simple posix-clock module for the time being.
I don't recall Sonny creating one. If you are referring to the one I
mention here:
https://github.com/dsahern/linux/blob/perf-full-monty/README.ahern
It's from Pawel Moll.
David
On Tue, Apr 22, 2014 at 02:03:05PM -0400, Don Zickus wrote:
> Hi,
>
> I was discussing recently with Will Cohen about how to get perf to
> understand dynamic languages (java, python, ruby) better. Currently, perf
> samples and address, stores it in a mmap region (from the kernel side),
> the mmap region is read (from user side async) and stored in a file.
>
> During 'perf report' those instruction addresses are looked up in the
> dwarf table?? of the binary they were mapped to, to resolve their symbols.
>
> This works great for statically compiled binaries (like C), where the
> addresses stay the same during each run of the binary.
>
> However, for dynamic languages like java, python, ruby not only do those
> addresses change each run of the binary, those address can change
> _during_ the execution of the binary. As a result the normal perf
> collection method fails.
So we have one JIT supported; I forget the exact details, but it writes
its symbol table to /tmp/perf-* files. I think the JIT in question will
never overwrite symbols in debug mode.
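
For reference, a sketch of what writing such a file can look like. The
thread only says "/tmp/perf-* files"; the file name perf-<pid>.map and the
"hex start, hex size, name" line layout are taken from my reading of
perf's jit-interface documentation and should be checked against it. The
helper names are hypothetical:

```python
import os

def write_perf_map(symbols, pid=None, directory="/tmp"):
    """Write (start, size, name) tuples as '<hex start> <hex size> <name>'."""
    pid = os.getpid() if pid is None else pid
    path = os.path.join(directory, "perf-%d.map" % pid)
    with open(path, "w") as f:
        for start, size, name in sorted(symbols):
            f.write("%x %x %s\n" % (start, size, name))
    return path

def read_perf_map(path):
    """Parse a map file back into (start, size, name) tuples."""
    entries = []
    with open(path) as f:
        for line in f:
            # The name is the rest of the line and may contain spaces.
            start, size, name = line.rstrip("\n").split(" ", 2)
            entries.append((int(start, 16), int(size, 16), name))
    return entries
```

A file like this carries one mapping per address range and no timestamps,
so it cannot describe an address that is reused for a different symbol.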
One way to do this would be having two JIT areas, and copy the active
symbols into the 'new' one, and recycle the 'old' one.
Pekka used it for his JIT, so he might have some 'sample' code.
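
The two-area idea above could look roughly like this. It is only a
bookkeeping sketch with hypothetical names; a real JIT would also copy and
relocate the machine code itself:

```python
class TwoAreaCodeCache:
    """Two code areas; live symbols are copied before the old area is reused."""

    def __init__(self, base_a, base_b, size):
        self.bases = [base_a, base_b]
        self.size = size
        self.active = 0
        self.next_free = base_a
        self.symbols = {}  # name -> (start, length) in the active area

    def emit(self, name, length):
        """Reserve space for a symbol in the active area."""
        if self.next_free + length > self.bases[self.active] + self.size:
            raise MemoryError("active area full; call flip() first")
        start = self.next_free
        self.symbols[name] = (start, length)
        self.next_free += length
        return start

    def flip(self, live_names):
        """Copy live symbols to the other area; return the resulting moves."""
        self.active ^= 1
        self.next_free = self.bases[self.active]
        old, self.symbols = self.symbols, {}
        moves = []
        for name in live_names:
            old_start, length = old[name]
            new_start = self.emit(name, length)
            moves.append((name, old_start, new_start, length))
        return moves
```

The list returned by flip() is what would have to be reported to the
profiler, since every surviving symbol now lives at a new address.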
> Oprofile has a mechanism to work around this, by creating a debug library
> for java that records class information. This library is linked?? during
> the initial execution of the java program and all its symbol info is
> recorded in a temp file. During post-processing this temp file is read
> back in and symbol info is obtained.
>
> However, this approach is java specific and only works for programs that
> initially start with it (can not attach to running programs).
Right, we're in the same position.
> Thoughts have come up about using a SIGPROF from the kernel to signal the
> userspace interpreters to dump information to a temp file that can be used
> later during post-processing.
>
> Does anyone have any thoughts or experience on this?
I know Stephane worked with some JIT languages, I'll let him tell.
> Does anyone have any thoughts or experience on this?
perf has a JIT interface today, but it's extremely primitive
and only supports symbols. Clearly it could be done better.
Various JITs (e.g. Java) have special debug interfaces for this.
Various non perf profilers support it too. e.g. Vtune has a special
API for it:
https://software.intel.com/sites/default/files/jit_profiling_api_lin_0.pdf
Essentially you would need to write a JIT specific adapter
that translates to perf format. Or emulate the Vtune interface
and reuse existing Vtune adaptations.
perf record needs some kind of side band interface where the JIT adapter
can report to it:
- symbols
- the assembler code (so it can be shown)
- source lines
- report any changes when JITed code changes
Then perf record could put that information into the perf.data.
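
As a sketch of what one side band record might carry, covering the items
in the list above. The field set and the binary layout are assumptions
made for illustration and do not correspond to any existing perf format:

```python
import struct
from dataclasses import dataclass

# Hypothetical header: timestamp, start, size, then three blob lengths.
HEADER = struct.Struct("<QQQIII")

@dataclass
class JitRecord:
    timestamp: int
    start: int
    size: int
    name: str
    code: bytes = b""   # raw machine code, so it can be disassembled later
    source: str = ""    # source location, e.g. "Foo.java:42"

    def pack(self) -> bytes:
        name = self.name.encode("utf-8")
        source = self.source.encode("utf-8")
        header = HEADER.pack(self.timestamp, self.start, self.size,
                             len(name), len(self.code), len(source))
        return header + name + self.code + source

    @classmethod
    def unpack(cls, buf: bytes) -> "JitRecord":
        ts, start, size, n, c, s = HEADER.unpack_from(buf)
        off = HEADER.size
        name = buf[off:off + n].decode("utf-8")
        off += n
        code = bytes(buf[off:off + c])
        off += c
        source = buf[off:off + s].decode("utf-8")
        return cls(ts, start, size, name, code, source)
```

A change to JITed code would then be reported as a further record with a
later timestamp for the affected address range.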
In theory that information could be passed through the kernel,
but just using some user protocol (unix sockets or files) would be
likely enough. The current interface uses files in /tmp.
I would likely change that; it's not even clear that it's secure.
It's likely a substantial project.
It would even be useful for the kernel, as the kernel does JITing
itself these days.
-Andi
On Tue, Apr 22, 2014 at 09:05:11PM +0200, Stephane Eranian wrote:
> Hi Don,
>
> I have been working on the JIT code support for a while now.
> I have something working well for more than Java now. It reuses
> some of the same principles as the OProfile support but extend
> them to support more advanced JIT features such as address
> recycling and code movements.
>
> I intend to contribute that code for perf once it is finalized.
> Note that it uses a module developed by Sonny Rao to
> export the perf timestamp time source via a posix-clock.
> This clock discussion has been going on for a while and
> never reached a conclusion. So I decided to go with the
> simple posix-clock module for the time being.
Nice! I am in no rush for it, just didn't want to waste time
investigating it if someone else was already doing some work. Any
thoughts on a timeframe until it is finalized? A couple of months or so?
Cheers,
Don
On Tue, Apr 22, 2014 at 10:54 PM, Andi Kleen <[email protected]> wrote:
>> Does anyone have any thoughts or experience on this?
>
> perf has a JIT interface today, but it's extremely primitive
> and only supports symbols. Clearly it could be done better.
>
> Various JITs (e.g. Java) have special debug interfaces for this.
>
> Various non perf profilers support it too. e.g. Vtune has a special
> API for it:
> https://software.intel.com/sites/default/files/jit_profiling_api_lin_0.pdf
>
> Essentially you would need to write a JIT specific adapter
> that translates to perf format. Or emulate the Vtune interface
> and reuse existing Vtune adaptations.
>
> perf record needs some kind of side band interface where the JIT adapter
> can report to it:
> - symbols
> - the assembler code (so it can be shown)
> - source lines
> - report any changes when JITed code changes
>
Forgot to mention that my implementation does go all the way to jitted
code assembly view via perf annotate + source view.
So it does cover all aspects.
As for attaching to a running JIT, this again needs some cooperation
from the JIT environment.
> Then perf record could put that information into the perf.data.
>
> In theory that information could be passed through the kernel,
> but just using some user protocol (unix sockets or files) would be
> likely enough. The current interface uses files in /tmp.
> I would likely change that, it's not clear even if it's secure.
>
> It's likely a substantial project.
>
> It would be even useful for the kernel, as the kernel does JITing
> itself these days.
>
> -Andi
Hi Stephane,
On Tue, 22 Apr 2014 21:05:11 +0200, Stephane Eranian wrote:
> Hi Don,
>
> I have been working on the JIT code support for a while now.
> I have something working well for more than Java now. It reuses
> some of the same principles as the OProfile support but extend
> them to support more advanced JIT features such as address
> recycling and code movements.
>
> I intend to contribute that code for perf once it is finalized.
> Note that it uses a module developed by Sonny Rao to
> export the perf timestamp time source via a posix-clock.
> This clock discussion has been going on for a while and
> never reached a conclusion. So I decided to go with the
> simple posix-clock module for the time being.
I'm looking forward to seeing your patches soon!
Thanks,
Namhyung