From: Steve MacLean <[email protected]>
** perf-<pid>.map and jit-<pid>.dump designs:
When a JIT generates code to be executed, it must allocate memory and
mark it executable using an mmap call.
*** perf-<pid>.map design
The perf-<pid>.map assumes that any sample recorded in an anonymous
memory page is JIT code. It then tries to resolve the symbol name by
looking at the process' perf-<pid>.map.
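As a rough illustration of what a JIT writes for this mechanism, the sketch below formats one perf-<pid>.map entry. It assumes the conventional layout of one `START SIZE NAME` line per function, with the start address and size in hexadecimal. The helper name `format_perf_map_entry` is hypothetical and not part of perf.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one perf-<pid>.map line for a jitted function.
 * Assumed layout: "<start hex> <size hex> <symbol name>\n".
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_perf_map_entry(char *buf, size_t len, uint64_t start,
                                 uint64_t size, const char *name)
{
    int n;

    assert(buf && name);

    n = snprintf(buf, len, "%" PRIx64 " %" PRIx64 " %s\n",
                 start, size, name);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A JIT would append one such line per compiled method; perf only consults the file for samples that land in an anonymous executable mapping.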
*** jit-<pid>.dump design
The jit-<pid>.dump mechanism takes a different approach. It requires a JIT
to write a `<path>/jit-<pid>.dump` file. This file must also be mmapped
so that `perf inject --jit` can find the file. The JIT must also add
JIT_CODE_LOAD records for any functions it generates. The records are
timestamped using a clock which can be correlated to the perf record
clock.
After perf record, the `perf inject --jit` pass parses the recording
looking for a `<path>/jit-<pid>.dump` file. When it finds the file, it
parses it and for each JIT_CODE_LOAD record:
* creates an ELF file `<path>/jitted-<pid>-<code_index>.so`
* injects a new mmap record mapping the new ELF file into the process.
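The two steps above can be modelled roughly as follows. This is a simplified sketch, not perf's implementation: `struct code_load`, `struct synth_mmap` and `make_synth_mmap` are hypothetical stand-ins, and the assumption is that the injected mmap record carries the timestamp, address and size of the JIT_CODE_LOAD record it was derived from.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Simplified stand-ins; these are not perf's actual structures. */
struct code_load {
    uint64_t timestamp;   /* when the JIT emitted the function */
    uint64_t code_addr;   /* address of the generated code */
    uint64_t code_size;   /* size of the generated code in bytes */
    uint64_t code_index;  /* unique index of this function */
};

struct synth_mmap {
    uint64_t timestamp;
    uint64_t start;
    uint64_t len;
    char filename[256];
};

/*
 * Build the synthetic mmap record for one code load record.
 * Returns 0 on success, -1 if the file name does not fit.
 */
static int make_synth_mmap(const struct code_load *cl, const char *dir,
                           pid_t pid, struct synth_mmap *out)
{
    int n;

    assert(cl && dir && out);

    /* Name of the per-function ELF image: <dir>/jitted-<pid>-<code_index>.so */
    n = snprintf(out->filename, sizeof(out->filename),
                 "%s/jitted-%d-%" PRIu64 ".so", dir, (int)pid,
                 cl->code_index);
    if (n < 0 || (size_t)n >= sizeof(out->filename))
        return -1;

    /* The mapping covers exactly the jitted function. */
    out->timestamp = cl->timestamp;
    out->start = cl->code_addr;
    out->len = cl->code_size;
    return 0;
}
```

Each jitted function ends up with its own small mapping, which is why these records overlap the anonymous mapping that holds the code.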
*** Coexistence design
The kernel and perf support both of these mechanisms. We need to make
sure perf works on an app supporting either or both of these mechanisms.
Both designs rely on mmap records to determine how to resolve an ip
address.
The mmap records of both techniques by definition overlap. When the JIT
compiles a method, it must:
* allocate memory (mmap)
* add execution privilege (mprotect or mmap; either will
generate an mmap event from the kernel to perf)
* compile code into memory
* add a function record to perf-<pid>.map and/or jit-<pid>.dump
Because the jit-<pid>.dump mechanism supports greater capabilities, perf
prefers the symbols from jit-<pid>.dump. It implements this based on
timestamp ordering of events. There is an implicit ASSUMPTION that the
JIT_CODE_LOAD record timestamp will be after the // anon mmap event that
was generated when the memory was allocated or made executable.
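The timestamp-ordering rule can be pictured with a small model: among all mapping events that cover an address and precede the sample, the most recent one is used. This is only a sketch of the idea, with a hypothetical `struct map_event` and `resolve_ip`; perf's real map handling is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mapping event: [start, end) named `name`. */
struct map_event {
    uint64_t timestamp;
    uint64_t start;
    uint64_t end;
    const char *name;
};

/*
 * Return the covering event with the latest timestamp that is not
 * newer than the sample, or NULL if no event covers the address.
 */
static const struct map_event *resolve_ip(const struct map_event *ev,
                                          size_t n, uint64_t ip,
                                          uint64_t sample_time)
{
    const struct map_event *best = NULL;
    size_t i;

    assert(ev || n == 0);

    for (i = 0; i < n; i++) {
        if (ev[i].timestamp > sample_time)
            continue;               /* mapping did not exist yet */
        if (ip < ev[i].start || ip >= ev[i].end)
            continue;               /* address not covered */
        if (!best || ev[i].timestamp >= best->timestamp)
            best = &ev[i];          /* most recent mapping wins */
    }
    return best;
}
```

In this model the jitted mapping wins only as long as no later event covers the same address, which is exactly what the ASSUMPTION relies on.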
*** Problems with the ASSUMPTION
The ASSUMPTION made in the Coexistence design section above is violated
in the following scenario.
*** Scenario
While a JIT is jitting code it will eventually need to commit more
pages and change these pages to executable permissions. Typically the
JIT will want these collocated to minimize branch displacements.
The kernel will coalesce these anonymous mappings with identical
permissions before sending an MMAP event for the new pages. The address
range of the new mmap will not be just the most recently mmapped pages.
It will include the entire coalesced mmap region.
See mm/mmap.c
unsigned long mmap_region(struct file *file, unsigned long addr,
                unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
                struct list_head *uf)
{
        ...
        /*
         * Can we just expand an old mapping?
         */
        ...
        perf_event_mmap(vma);
        ...
}
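The effect of that expansion on the reported range can be sketched with a toy model. This is not kernel code: `struct region` and `map_and_report` are hypothetical, and the model only handles a new mapping placed directly after an existing one with identical protection bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one anonymous mapping: [start, end) with prot bits. */
struct region {
    uint64_t start;
    uint64_t end;
    unsigned int prot;
};

/*
 * Map `len` bytes at `addr` and return the range that would be reported
 * in the mmap event. If the new pages directly follow `existing` and have
 * the same protection, the old mapping is expanded and the whole merged
 * region is reported rather than just the new pages.
 */
static struct region map_and_report(struct region *existing, uint64_t addr,
                                    uint64_t len, unsigned int prot)
{
    struct region reported = { addr, addr + len, prot };

    assert(existing);

    if (existing->end == addr && existing->prot == prot) {
        existing->end = addr + len;   /* expand the old mapping */
        reported = *existing;         /* event covers the merged region */
    }
    return reported;
}
```

With an existing region of 0x1000-0x3000 and a new page at 0x3000, the reported range in this model is 0x1000-0x4000, so it also covers code that was jitted earlier.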
*** Symptoms
The coalesced // anon mmap event will be timestamped after the
JIT_CODE_LOAD records. This means it will be used as the most recent
mapping for that entire address range. For the remaining samples, perf
will look to the inferior perf-<pid>.map for symbols.
If both mechanisms are supported, the symbol will appear twice with
different module names. This causes weird behavior in reporting.
If only jit-<pid>.dump is supported, the symbol will no longer be resolved.
** Implemented solution
This patch solves the issue by removing // anon mmap events for any
process which has a valid jit-<pid>.dump file.
It tracks this on a per-process basis to handle the case where some running
apps support jit-<pid>.dump, but some only support perf-<pid>.map.
It adds new assumptions:
* // anon mmap events are only required for perf-<pid>.map support.
* An app that uses jit-<pid>.dump no longer needs
perf-<pid>.map support. It assumes that any perf-<pid>.map info is
inferior.
*** Details
Use thread->priv to store whether a jitdump file has been processed.
During "perf inject --jit", discard "//anon*" mmap events for any pid which
has successfully processed a jitdump file.
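The filtering rule can be shown in isolation with a small standalone model. The patch itself stores the flag in thread->priv; the fixed-size `struct jit_pid_set` and the helper names below are hypothetical stand-ins used only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define MAX_TRACKED_PIDS 64

/* Hypothetical stand-in for the per-process flag kept in thread->priv. */
struct jit_pid_set {
    pid_t pids[MAX_TRACKED_PIDS];
    size_t count;
};

static bool jit_pid_set_has(const struct jit_pid_set *set, pid_t pid)
{
    size_t i;

    for (i = 0; i < set->count; i++)
        if (set->pids[i] == pid)
            return true;
    return false;
}

/* Remember that a jitdump file was processed for pid; false if the set is full. */
static bool jit_pid_set_add(struct jit_pid_set *set, pid_t pid)
{
    assert(set);

    if (jit_pid_set_has(set, pid))
        return true;
    if (set->count == MAX_TRACKED_PIDS)
        return false;
    set->pids[set->count++] = pid;
    return true;
}

/* Strip only "//anon*" mmaps, and only for pids that produced a jitdump. */
static bool should_strip_mmap(const struct jit_pid_set *set, pid_t pid,
                              const char *filename)
{
    return jit_pid_set_has(set, pid) &&
           strncmp(filename, "//anon", 6) == 0;
}
```

File-backed mappings and processes without a jitdump file are left untouched, which keeps perf-<pid>.map resolution working for apps that only support that mechanism.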
** Committer testing:
// jitdump case
perf record <app with jitdump>
perf inject --jit --input perf.data --output perfjit.data
// verify mmap "//anon" events present initially
perf script --input perf.data --show-mmap-events | grep '//anon'
// verify mmap "//anon" events removed
perf script --input perfjit.data --show-mmap-events | grep '//anon'
// no jitdump case
perf record <app without jitdump>
perf inject --jit --input perf.data --output perfjit.data
// verify mmap "//anon" events present initially
perf script --input perf.data --show-mmap-events | grep '//anon'
// verify mmap "//anon" events not removed
perf script --input perfjit.data --show-mmap-events | grep '//anon'
** Repro:
This issue was discovered while testing the initial CoreCLR jitdump
implementation. https://github.com/dotnet/coreclr/pull/26897.
** Alternate solutions considered
These were also briefly considered:
* Change kernel to not coalesce mmap regions.
* Change kernel reporting of coalesced mmap regions to perf. Only
include newly mapped memory.
* Only strip parts of // anon mmap events overlapping existing
jitted-<pid>-<code_index>.so mmap events.
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: [email protected]
Signed-off-by: Steve MacLean <[email protected]>
---
tools/perf/builtin-inject.c | 4 ++--
tools/perf/util/jitdump.c | 31 ++++++++++++++++++++++++++++++-
2 files changed, 32 insertions(+), 3 deletions(-)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 372ecb3..0f38862 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -263,7 +263,7 @@ static int perf_event__jit_repipe_mmap(struct perf_tool *tool,
* if jit marker, then inject jit mmaps and generate ELF images
*/
ret = jit_process(inject->session, &inject->output, machine,
- event->mmap.filename, sample->pid, &n);
+ event->mmap.filename, event->mmap.pid, &n);
if (ret < 0)
return ret;
if (ret) {
@@ -301,7 +301,7 @@ static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
* if jit marker, then inject jit mmaps and generate ELF images
*/
ret = jit_process(inject->session, &inject->output, machine,
- event->mmap2.filename, sample->pid, &n);
+ event->mmap2.filename, event->mmap2.pid, &n);
if (ret < 0)
return ret;
if (ret) {
diff --git a/tools/perf/util/jitdump.c b/tools/perf/util/jitdump.c
index e3ccb0c..d18596e 100644
--- a/tools/perf/util/jitdump.c
+++ b/tools/perf/util/jitdump.c
@@ -26,6 +26,7 @@
#include "jit.h"
#include "jitdump.h"
#include "genelf.h"
+#include "thread.h"
#include <linux/ctype.h>
#include <linux/zalloc.h>
@@ -749,6 +750,28 @@ static int jit_repipe_debug_info(struct jit_buf_desc *jd, union jr_entry *jr)
return 0;
}
+static void jit_add_pid(struct machine *machine, pid_t pid)
+{
+ struct thread *thread = machine__findnew_thread(machine, pid, pid);
+
+ if (!thread) {
+ pr_err("%s: thread %d not found or created\n", __func__, pid);
+ return;
+ }
+
+ thread->priv = (void *)1;
+}
+
+static bool jit_has_pid(struct machine *machine, pid_t pid)
+{
+ struct thread *thread = machine__find_thread(machine, pid, pid);
+
+ if (!thread)
+ return 0;
+
+ return (bool)thread->priv;
+}
+
int
jit_process(struct perf_session *session,
struct perf_data *output,
@@ -764,8 +787,13 @@ static int jit_repipe_debug_info(struct jit_buf_desc *jd, union jr_entry *jr)
/*
* first, detect marker mmap (i.e., the jitdump mmap)
*/
- if (jit_detect(filename, pid))
+ if (jit_detect(filename, pid)) {
+ // Strip //anon* mmaps if we processed a jitdump for this pid
+ if (jit_has_pid(machine, pid) && (strncmp(filename, "//anon", 6) == 0))
+ return 1;
+
return 0;
+ }
memset(&jd, 0, sizeof(jd));
@@ -784,6 +812,7 @@ static int jit_repipe_debug_info(struct jit_buf_desc *jd, union jr_entry *jr)
ret = jit_inject(&jd, filename);
if (!ret) {
+ jit_add_pid(machine, pid);
*nbytes = jd.bytes_written;
ret = 1;
}
--
1.8.3.1
On Tue, May 26, 2020 at 6:52 PM Steve MacLean
<[email protected]> wrote:
> ** Implemented solution
>
> This patch solves the issue by removing // anon mmap events for any
> process which has a valid jit-<pid>.dump file.
>
> It tracks on a per process basis to handle the case where some running
> apps support jit-<pid>.dump, but some only support perf-<pid>.map.
>
> It adds new assumptions:
> * // anon mmap events are only required for perf-<pid>.map support.
> * An app that uses jit-<pid>.dump, no longer needs
> perf-<pid>.map support. It assumes that any perf-<pid>.map info is
> inferior.
Thanks Steve this is an important fix! As //anon could be for malloc
or other uses, should the stripping behavior be behind a flag?
Ian
>> ** Implemented solution
>>
>> This patch solves the issue by removing // anon mmap events for any
>> process which has a valid jit-<pid>.dump file.
>>
>> It tracks on a per process basis to handle the case where some running
>> apps support jit-<pid>.dump, but some only support perf-<pid>.map.
>>
>> It adds new assumptions:
>> * // anon mmap events are only required for perf-<pid>.map support.
>> * An app that uses jit-<pid>.dump, no longer needs perf-<pid>.map
>> support. It assumes that any perf-<pid>.map info is inferior.
>>
>> *** Details
>>
>> Use thread->priv to store whether a jitdump file has been processed
>>
>> During "perf inject --jit", discard "//anon*" mmap events for any pid
>> which has successfully processed a jitdump file.
>
>
> Thanks Steve this is an important fix! As //anon could be for malloc or other uses, should the stripping behavior be behind a flag?
>
> Ian
I hadn't anticipated a need to preserve the //anon mmap events when profiling JIT generated code.
As far as I know, mmap events are captured by perf only for mapping code to symbols. File mappings are kept
by the change. Only // anon mappings are stripped (and only for processes which emitted jitdump files).
And these are stripped only during the `perf inject --jit` step. I believe the // anon mappings are
generally only useful for mapping JIT code.
I suppose if someone was trying to count mmap events it might be confusing, but `perf inject --jit` creates
synthetic mmap file events which would also make this scenario confusing.
I personally don't see a good reason to add a flag. I also don't see a simple way to add one. Not running `perf inject --jit`
would preserve existing behavior w/o jitdump support. Without stripping the anon events, jitdump support is painfully
broken...