On Wed, May 27, 2020 at 12:27 PM Steve MacLean
<[email protected]> wrote:
>
> >> ** Implemented solution
> >>
> >> This patch solves the issue by removing // anon mmap events for any
> >> process which has a valid jit-<pid>.dump file.
> >>
> >> It tracks on a per process basis to handle the case where some running
> >> apps support jit-<pid>.dump, but some only support perf-<pid>.map.
> >>
> >> It adds new assumptions:
> >> * // anon mmap events are only required for perf-<pid>.map support.
> >> * An app that uses jit-<pid>.dump no longer needs perf-<pid>.map
> >> support. It assumes that any perf-<pid>.map info is inferior.
> >>
> >> *** Details
> >>
> >> Use thread->priv to store whether a jitdump file has been processed
> >>
> >> During "perf inject --jit", discard "//anon*" mmap events for any pid
> >> which has successfully processed a jitdump file.
> >
> >
> > Thanks Steve this is an important fix! As //anon could be for malloc or other uses, should the stripping behavior be behind a flag?
> >
> > Ian
>
> I hadn't anticipated a need to preserve the //anon mmap events when profiling JIT generated code.
>
> As far as I know mmap events are captured by perf only for mapping code to symbols. File mappings are kept
> by the change. Only // anon mappings are stripped. (Only for processes which emitted jitdump files.)
> And these are stripped only during the `perf inject --jit` step. I believe the // anon mappings are
> generally only useful for mapping JIT code.
>
> I suppose if someone was trying to count mmap events it might be confusing, but `perf inject --jit` creates
> synthetic mmap file events which would also make this scenario confusing.
>
> I personally don't see a good reason to add a flag. I don't see a simple way to add one either. Not running `perf inject --jit`
> would preserve existing behavior w/o jitdump support. Without stripping the anon events, jitdump support is painfully
> broken....
Agreed that things are broken. In general only executable mappings are
held onto by perf, so it could be I'm over worrying about //anon
stripping breaking around memory allocations. We have some other use
cases for //anon at Google but they aren't impacted by jitdump. We
have also been trying to migrate jit caches to using memfd_create,
which has the same problem that this patch fixes for //anon. Fixing
memfd_create is a separate issue to //anon. I'll try to get a repro
for Java that demonstrates the problem and then add a Tested-by.
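To make the condition being discussed concrete, here is a minimal C sketch.
The helper names and the /memfd: match are assumptions for illustration; the
patch as posted only strips "//anon*" mappings for pids that produced a
jitdump file.

  #include <stdbool.h>
  #include <string.h>

  /* Hypothetical helper: does this mmap filename look like a JIT code cache? */
  static bool is_jit_cache_mapping(const char *filename)
  {
          return !strncmp(filename, "//anon", 6) ||
                 !strncmp(filename, "/memfd:", 7);  /* assumption, not in the patch */
  }

  /* Drop the mmap event only for pids that already produced a jitdump file. */
  static bool should_drop_mmap(const char *filename, bool pid_has_jitdump)
  {
          return pid_has_jitdump && is_jit_cache_mapping(filename);
  }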
Thanks,
Ian
On Wed, May 27, 2020 at 1:59 PM Ian Rogers <[email protected]> wrote:
>
> On Wed, May 27, 2020 at 12:27 PM Steve MacLean
> <[email protected]> wrote:
> >
> > >> ** Implemented solution
> > >>
> > >> This patch solves the issue by removing // anon mmap events for any
> > >> process which has a valid jit-<pid>.dump file.
> > >>
> > >> It tracks on a per process basis to handle the case where some running
> > >> apps support jit-<pid>.dump, but some only support perf-<pid>.map.
> > >>
> > >> It adds new assumptions:
> > >> * // anon mmap events are only required for perf-<pid>.map support.
> > >> * An app that uses jit-<pid>.dump no longer needs perf-<pid>.map
> > >> support. It assumes that any perf-<pid>.map info is inferior.
> > >>
> > >> *** Details
> > >>
> > >> Use thread->priv to store whether a jitdump file has been processed
> > >>
> > >> During "perf inject --jit", discard "//anon*" mmap events for any pid
> > >> which has successfully processed a jitdump file.
> > >
> > >
> > > Thanks Steve this is an important fix! As //anon could be for malloc or other uses, should the stripping behavior be behind a flag?
> > >
> > > Ian
> >
> > I hadn't anticipated a need to preserve the //anon mmap events when profiling JIT generated code.
> >
> > As far as I know mmap events are captured by perf only for mapping code to symbols. File mappings are kept
> > by the change. Only // anon mappings are stripped. (Only for processes which emitted jitdump files.)
> > And these are stripped only during the `perf inject --jit` step. I believe the // anon mappings are
> > generally only useful for mapping JIT code.
> >
> > I suppose if someone was trying to count mmap events it might be confusing, but `perf inject --jit` creates
> > synthetic mmap file events which would also make this scenario confusing.
> >
> > I personally don't see a good reason to add a flag. I don't see a simple way to add one either. Not running `perf inject --jit`
> > would preserve existing behavior w/o jitdump support. Without stripping the anon events, jitdump support is painfully
> > broken....
>
> Agreed that things are broken. In general only executable mappings are
> held onto by perf, so it could be I'm over worrying about //anon
> stripping breaking around memory allocations. We have some other use
> cases for //anon at Google but they aren't impacted by jitdump. We
> have also been trying to migrate jit caches to using memfd_create,
> which has the same problem that this patch fixes for //anon. Fixing
> memfd_create is a separate issue to //anon. I'll try to get a repro
> for Java that demonstrates the problem and then add a Tested-by.
>
> Thanks,
> Ian
So on tip/perf/core with:
1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
I've been trying variants of:
Before:
/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
dacapo-9.12-bach.jar jython
/tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
/tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
578
/tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
6
After:
/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
dacapo-9.12-bach.jar jython
/tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
/tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
589
/tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
60
So maybe the jit cache isn't behaving the way that the patch cures,
the uptick in unknowns appears consistent in my testing though. I
expect user error, is there an obvious explanation I'm missing?
Thanks,
Ian
> I've been trying variants of:
>
> Before:
>/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar dacapo-9.12-bach.jar jython /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 578
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 6
>
> After:
> /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar dacapo-9.12-bach.jar jython /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 589
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 60
>
> So maybe the jit cache isn't behaving the way that the patch cures, the uptick in unknowns appears consistent in my testing though. I expect user error, is there an obvious explanation I'm missing?
That is certainly disturbing. I don't see an obvious user error, but I haven't played with Java jitdump.
A couple of ideas come to mind.
+ When I was drafting .NET jitdump support, I looked briefly at the Java agent. My impression was that the Java agent support wasn't thread-safe. The jitdump file format requires that certain records are collocated to simplify the reader. I wasn't sure how that was being accomplished by the writer if Java's JIT was multithreaded (like CoreCLR's JIT is); see the locking sketch after this list.
+ The way I implemented jitdump on CoreCLR was to hook into the existing system to write perf-map files. So I am pretty confident there is a 1:1 correspondence of perf-map records with jitdump records. Is it possible that Java chose to emit only interesting jitdump records? Perhaps skipping thunks/stubs/trampolines?
+ There are still some `unknown` records in CoreCLR, but I don't believe there is an increase. I am pretty sure some of our stubs are not being generated. They were present in our before case too.
+ Your methodology would be more repeatable if you record once and analyze multiple times: record, report, inject/report, inject/report.
+ I was focusing on eliminating duplicate symbols for the same address. So having the module name in the report allowed me to see if the symbol was from jitdump or from perf-map. In the before case I could see duplicate symbols for the same address. This was how the problem was first observed.
+ I would be more interested in looking at the diff of the reports. Possibly sorted by address. Probably with zero context.
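On the thread-safety point above: a self-contained sketch of how a
multithreaded JIT's jitdump writer could keep collocated records together.
The write_* helpers are placeholders, not the perf jvmti agent's actual
functions; the point is only that the record pair is emitted under one lock
so another compiler thread cannot interleave its own records between them.

  #include <pthread.h>
  #include <stdio.h>

  static pthread_mutex_t jitdump_lock = PTHREAD_MUTEX_INITIALIZER;

  /* Placeholder writers: each appends one jitdump record to fp. */
  void write_debug_info_record(FILE *fp, const void *dbg);
  void write_code_load_record(FILE *fp, const void *code);

  void emit_method(FILE *fp, const void *dbg, const void *code)
  {
          pthread_mutex_lock(&jitdump_lock);
          write_debug_info_record(fp, dbg);   /* must come directly before ... */
          write_code_load_record(fp, code);   /* ... the code load it describes */
          pthread_mutex_unlock(&jitdump_lock);
  }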
If I were to guess, I would think Java chose to optimize jitdump and only record interesting code.
So if we need to support that scenario, the approach of this patch wouldn't work.
We would need to use a different approach. Ideas...
+ Keep anon mappings from overwriting jitdump mappings. Introduce a weak add-map method that would only fill empty/anon regions.
+ Move the anon mapping timestamps to the beginning of time, so they are processed first.
+ Fix the kernel perf map events
+ Keep the anon mappings in a separate fallback map.
I kind of like the weak add-map approach.
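A rough sketch of the weak add-map idea, with all types and helpers made up
for illustration (perf's real map code is different): a weak insert only
lands where no jitdump-derived mapping already covers the range, so an anon
mapping can no longer shadow jitdump symbols.

  #include <stdbool.h>
  #include <stddef.h>

  struct jit_map {
          unsigned long start, end;  /* [start, end) of the mapping */
          bool from_jitdump;         /* synthesized from jit-<pid>.dump? */
  };

  struct map_table {
          struct jit_map maps[1024];
          size_t nr;
  };

  static bool overlaps_jitdump(const struct map_table *t,
                               unsigned long start, unsigned long end)
  {
          for (size_t i = 0; i < t->nr; i++)
                  if (t->maps[i].from_jitdump &&
                      t->maps[i].start < end && start < t->maps[i].end)
                          return true;
          return false;
  }

  /* Weak insert: an anon mapping is added only if it does not overlap an
   * existing jitdump mapping. */
  static void map_table__insert_weak(struct map_table *t, struct jit_map m)
  {
          if (t->nr < 1024 && !overlaps_jitdump(t, m.start, m.end))
                  t->maps[t->nr++] = m;
  }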
>/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
>-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
>-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
>dacapo-9.12-bach.jar jython
It's also possible the `InitialCodeCacheSize=20M` argument is masking the problem. Something like 4K would make the problem much easier to see.
I see the problem every time .NET grows the cache across a 64K page boundary. 20M may mean you are allocating huge pages or something which might mask the problem. We may be allocating code pages 64K at a time.
On 05/28/20 17:32 PM, Ian Rogers wrote:
>
> So on tip/perf/core with:
> 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
> 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
>
> I've been trying variants of:
>
> Before:
> /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> dacapo-9.12-bach.jar jython
> /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 578
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 6
>
> After:
> /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> dacapo-9.12-bach.jar jython
> /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 589
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 60
>
> So maybe the jit cache isn't behaving the way that the patch cures,
> the uptick in unknowns appears consistent in my testing though. I
> expect user error, is there an obvious explanation I'm missing?
>
Hi Ian,
I tried this as well with latest perf/core. The difference is that
unresolved addresses currently look like:
0.00% java [JIT] tid 221782 [.] 0x0000ffff451499a4
0.00% java [JIT] tid 221782 [.] 0x0000ffff4514f3e8
0.00% java [JIT] tid 221782 [.] 0x0000ffff45149394
But after Steve's patch this becomes:
0.00% java [unknown] [.] 0x0000ffff58557d14
0.00% java [unknown] [.] 0x0000ffff785c03b4
0.00% java [unknown] [.] 0x0000ffff58386520
I couldn't see any events that were symbolised before but are no longer
symbolised after this patch.
I think most of these unknown events are caused by the asynchronous
nature of the JVMTI event handling. After an nmethod is compiled the
JVMTI event is posted to the Service Thread (*). So there can be a delay
between the time the compiled code starts executing and the time the
plugin receives the compiled code load event.
Here's an edited down example:
java 215881 750014.947873: 1289634 cycles:u: ffff7856ad10 [unknown] ([unknown])
Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: [0xffff785694c0(0x640) @ 0x40 fd:01 121010091 1]:
java 215881 750014.948665: 1295994 cycles:u: ffff7856ad10 org.python.core.__builtin__.range(org.python
The plugin receives the event ~100us after the first sample inside that
method. Ideally we would use the timestamp when the method was actually
compiled, but I can't see any way to extract this information.
However I also saw a few recurring [unknown] addresses that never have a
corresponding code load event. I'm not sure where these come from.
(*) http://hg.openjdk.java.net/jdk/jdk/file/50fe8727ed79/src/hotspot/share/code/nmethod.cpp#l1591
--
Nick
On Sun, May 31, 2020 at 11:17 PM Nick Gasson <[email protected]> wrote:
>
> On 05/28/20 17:32 PM, Ian Rogers wrote:
> >
> > So on tip/perf/core with:
> > 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
> > 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
> >
> > I've been trying variants of:
> >
> > Before:
> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> > dacapo-9.12-bach.jar jython
> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> > 578
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> > 6
> >
> > After:
> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> > dacapo-9.12-bach.jar jython
> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> > 589
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> > 60
> >
> > So maybe the jit cache isn't behaving the way that the patch cures,
> > the uptick in unknowns appears consistent in my testing though. I
> > expect user error, is there an obvious explanation I'm missing?
> >
>
> Hi Ian,
>
> I tried this as well with latest perf/core. The difference is that
> unresolved addresses currently look like:
>
> 0.00% java [JIT] tid 221782 [.] 0x0000ffff451499a4
> 0.00% java [JIT] tid 221782 [.] 0x0000ffff4514f3e8
> 0.00% java [JIT] tid 221782 [.] 0x0000ffff45149394
>
> But after Steve's patch this becomes:
>
> 0.00% java [unknown] [.] 0x0000ffff58557d14
> 0.00% java [unknown] [.] 0x0000ffff785c03b4
> 0.00% java [unknown] [.] 0x0000ffff58386520
>
> I couldn't see any events that were symbolised before but are no longer
> symbolised after this patch.
I see this, thanks for digging into the explanation! Were you able to
get a test case where the unknowns went down? For example, by forcing
the code cache size to be small? This is the result I'd expect to see.
> I think most of these unknown events are caused by the asynchronous
> nature of the JVMTI event handling. After an nmethod is compiled the
> JVMTI event is posted to the Service Thread (*). So there can be a delay
> between the time the compiled code starts executing and the time the
> plugin receives the compiled code load event.
>
> Here's an edited down example:
>
> java 215881 750014.947873: 1289634 cycles:u: ffff7856ad10 [unknown] ([unknown])
> Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: [0xffff785694c0(0x640) @ 0x40 fd:01 121010091 1]:
> java 215881 750014.948665: 1295994 cycles:u: ffff7856ad10 org.python.core.__builtin__.range(org.python
>
> The plugin receives the event ~100us after the first sample inside that
> method. Ideally we would use the timestamp when the method was actually
> compiled, but I can't see any way to extract this information.
Hmm.. this is a bit weird as the compile_info at one point was a
resource allocation and so would be cleared out when the resource mark
was passed on the compiler's return. Not something you'd want to do
asynchronously. Presumably this has changed to improve performance,
but doing the jvmti on a separate thread defeats jitdump - if we see
samples in code ahead of the code being loaded. Perhaps a profiler
like async-profiler
(https://github.com/jvm-profiling-tools/async-profiler) has a
workaround for the lag.
> However I also saw a few recurring [unknown] addresses that never have a
> corresponding code load event. I'm not sure where these come from.
C2 is pretty bad for having assembly stub routine helpers (aka stub
routines) that aren't "documented". C1 is less like this, but it is
increasingly hard to force C1 as the compiler. If you are seeing these
things as leaf routines then it's likely some stub routine cleanup
could fix the issue.
Thanks,
Ian
> (*) http://hg.openjdk.java.net/jdk/jdk/file/50fe8727ed79/src/hotspot/share/code/nmethod.cpp#l1591
>
> --
> Nick
>
On 06/01/20 16:53 PM, Ian Rogers wrote:
> On Sun, May 31, 2020 at 11:17 PM Nick Gasson <[email protected]> wrote:
>>
>> On 05/28/20 17:32 PM, Ian Rogers wrote:
>> >
>> > So on tip/perf/core with:
>> > 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
>> > 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
>> >
>> > I've been trying variants of:
>> >
>> > Before:
>> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
>> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
>> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
>> > dacapo-9.12-bach.jar jython
>> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
>> > 578
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
>> > 6
>> >
>> > After:
>> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
>> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
>> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
>> > dacapo-9.12-bach.jar jython
>> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
>> > 589
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
>> > 60
>> >
>> > So maybe the jit cache isn't behaving the way that the patch cures,
>> > the uptick in unknowns appears consistent in my testing though. I
>> > expect user error, is there an obvious explanation I'm missing?
>> >
>>
>> Hi Ian,
>>
>> I tried this as well with latest perf/core. The difference is that
>> unresolved addresses currently look like:
>>
>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff451499a4
>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff4514f3e8
>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff45149394
>>
>> But after Steve's patch this becomes:
>>
>> 0.00% java [unknown] [.] 0x0000ffff58557d14
>> 0.00% java [unknown] [.] 0x0000ffff785c03b4
>> 0.00% java [unknown] [.] 0x0000ffff58386520
>>
>> I couldn't see any events that were symbolised before but are no longer
>> symbolised after this patch.
>
> I see this, thanks for digging into the explanation! Were you able to
> get a test case where the unknowns went down? For example, by forcing
> the code cache size to be small? This is the result I'd expect to see.
I tried the same Dacapo benchmark as you with different values of
InitialCodeCacheSize and grepped for -e '\[unknown\]' -e '\[JIT\]'.
          Base  Patched
  100M     338      373
   50M     333      315
   25M     323      368
   15M    1238      309
   10M    2600      333
    1M    6035      337
This looks fairly convincing to me: the cliff at 15M is where the code
cache starts needing to be enlarged.
>
>> I think most of these unknown events are caused by the asynchronous
>> nature of the JVMTI event handling. After an nmethod is compiled the
>> JVMTI event is posted to the Service Thread (*). So there can be a delay
>> between the time the compiled code starts executing and the time the
>> plugin receives the compiled code load event.
>>
>> Here's an edited down example:
>>
>> java 215881 750014.947873: 1289634 cycles:u: ffff7856ad10 [unknown] ([unknown])
>> Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: [0xffff785694c0(0x640) @ 0x40 fd:01 121010091 1]:
>> java 215881 750014.948665: 1295994 cycles:u: ffff7856ad10 org.python.core.__builtin__.range(org.python
>>
>> The plugin receives the event ~100us after the first sample inside that
>> method. Ideally we would use the timestamp when the method was actually
>> compiled, but I can't see any way to extract this information.
>
> Hmm.. this is a bit weird as the compile_info at one point was a
> resource allocation and so would be cleared out when the resource mark
> was passed on the compiler's return. Not something you'd want to do
> asynchronously. Presumably this has changed to improve performance,
> but doing the jvmti on a separate thread defeats jitdump - if we see
> samples in code ahead of the code being loaded. Perhaps a profiler
> like async-profiler
> (https://github.com/jvm-profiling-tools/async-profiler) has a
> workaround for the lag.
>
I had a brief look at the async-profiler JVMTI code but I can't see
anything it's doing differently. Their profiler.sh attaches to a running
JVM - I suspect this problem will be less obvious once the JIT has
warmed up.
The compile_info structure is still resource-allocated in
JvmtiExport::post_compiled_method_load(), but that function is called
from the service thread, not the compiler thread. The compiler thread
just pushes an nmethod* onto a queue and the inlining data is recovered
from the nmethod object.
It seems we could extend JVMTI here to pass the code installation
timestamp as another CMLR record type.
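For illustration only, a sketch of what such a record might look like. The
kind value and field names are assumptions; JVMTI only defines the dummy and
inline-info CMLR kinds today.

  #include <jvmti.h>
  #include <jvmticmlr.h>

  /* Hypothetical new CMLR kind carrying the time the code was installed, so
   * the jitdump plugin could timestamp its code load record with the moment
   * the code became executable rather than the moment the event arrived. */
  #define JVMTI_CMLR_INSTALL_TIME 3  /* not a real kind in jvmticmlr.h */

  typedef struct _jvmtiCompiledMethodLoadInstallTime {
          jvmtiCompiledMethodLoadRecordHeader header;  /* kind = JVMTI_CMLR_INSTALL_TIME */
          jlong install_time_ns;  /* e.g. CLOCK_MONOTONIC at nmethod installation */
  } jvmtiCompiledMethodLoadInstallTime;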
--
Nick
>>> Hi Ian,
>>>
>>>> I tried this as well with latest perf/core. The difference is that
>>> unresolved addresses currently look like:
>>>
>>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff451499a4
>>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff4514f3e8
>>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff45149394
>>>
>>> But after Steve's patch this becomes:
>>>
>>> 0.00% java [unknown] [.] 0x0000ffff58557d14
>>> 0.00% java [unknown] [.] 0x0000ffff785c03b4
>>> 0.00% java [unknown] [.] 0x0000ffff58386520
>>>
>>> I couldn't see any events that were symbolised before but are no
>>> longer symbolised after this patch.
>>
>> I see this, thanks for digging into the explanation! Were you able to
>> get a test case where the unknowns went down? For example, by forcing
>> the code cache size to be small? This is the result I'd expect to see.
>
>I tried the same Dacapo benchmark as you with different values of InitialCodeCacheSize and grepped for -e '\[unknown\]' -e '\[JIT\]'.
>
> Base Patched
> 100M 338 373
> 50M 333 315
> 25M 323 368
> 15M 1238 309
> 10M 2600 333
> 1M 6035 337
>
>This looks fairly convincing to me: the cliff at 15M is where the code cache starts needing to be enlarged.
>
Removing the anonymous mappings causes a small regression. Specifically,
the reporting of the module name goes from "[JIT] tid <tid>" to "[unknown]".
This occurs when the JIT fails to report memory in the jitdump before that
memory is used.
However, there is also confirmation that Java does see the reported issue
when using a small code cache. The current patch resolves the issue in
this case.
I see two options:
+ Accept the regression. Since the regression only occurs when there is a
jitdump reporting synchronization error, this may be a reasonable option.
+ Design a more complicated patch. Either
+ Only strip parts of // anon mmap events overlapping existing
jitted-<pid>-<code_index>.so mmap events.
+ Only strip parts of // anon mmap events overlapping prior
// anon mmap events
Any opinions?
On Fri, Jun 12, 2020 at 12:00 PM Steve MacLean
<[email protected]> wrote:
>
> >>> Hi Ian,
> >>>
> >>>> I tried this as well with latest perf/core. The difference is that
> >>> unresolved addresses currently look like:
> >>>
> >>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff451499a4
> >>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff4514f3e8
> >>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff45149394
> >>>
> >>> But after Steve's patch this becomes:
> >>>
> >>> 0.00% java [unknown] [.] 0x0000ffff58557d14
> >>> 0.00% java [unknown] [.] 0x0000ffff785c03b4
> >>> 0.00% java [unknown] [.] 0x0000ffff58386520
> >>>
> >>> I couldn't see any events that were symbolised before but are no
> >>> longer symbolised after this patch.
> >>
> >> I see this, thanks for digging into the explanation! Were you able to
> >> get a test case where the unknowns went down? For example, by forcing
> >> the code cache size to be small? This is the result I'd expect to see.
> >
> >I tried the same Dacapo benchmark as you with different values of InitialCodeCacheSize and grepped for -e '\[unknown\]' -e '\[JIT\]'.
> >
> > Base Patched
> > 100M 338 373
> > 50M 333 315
> > 25M 323 368
> > 15M 1238 309
> > 10M 2600 333
> > 1M 6035 337
> >
> >This looks fairly convincing to me: the cliff at 15M is where the code cache starts needing to be enlarged.
> >
>
> Removing the anonymous mappings causes a small regression. Specifically,
> the reporting of the module name goes from "[JIT] tid <tid>" to "[unknown]".
> This occurs when the JIT fails to report memory in the jitdump before that
> memory is used.
>
> However, there is also confirmation that Java does see the reported issue
> when using a small code cache. The current patch resolves the issue in
> this case.
>
> I see two options:
>
> + Accept the regression. Since the regression only occurs when there is a
> jitdump reporting synchronization error, this may be a reasonable option.
>
> + Design a more complicated patch. Either
> + Only strip parts of // anon mmap events overlapping existing
> jitted-<pid>-<code_index>.so mmap events.
> + Only strip parts of // anon mmap events overlapping prior
> // anon mmap events
>
> Any opinions?
Hi Steve,
I think we should merge this change. I wanted to get a test case
together showing the benefit. Based on Nick Gasson's feedback I was
trying with command lines like:
/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data
/usr/lib/jvm/java-14-openjdk-amd64/bin/java
-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
-XX:InitialCodeCacheSize=10M -jar dacapo-9.12-bach.jar jython
I wanted to be able to demonstrate a non-zero effect in getting
samples. I spent limited time on the test and didn't get a result that
demonstrated a benefit, though things weren't worse either. I think the
lack of benefit is down to another bug, where HotSpot's JVMTI code runs
in parallel with the new code becoming available in the JIT cache. We get
samples before the jitdump thinks the code is there, and so they are
symbolized as unknown. We should get HotSpot to provide the earliest
timestamp to avoid this problem.
I think a good way forward would be to merge this code, as the other issues
can be fixed as a follow-up. Arnaldo, does that work for you? If so,
Acked-by: Ian Rogers <[email protected]>.
Thanks,
Ian