2017-11-02 18:19:11

by Stephane Eranian

Subject: [PATCH v2 0/5] perf: add support for capturing skid IP

This patch series adds a new sample record type called
PERF_SAMPLE_SKID_IP. The goal is to record the
unmodified interrupted instruction pointer (IP) as
seen by the kernel and reflected in the machine state.

On some architectures, it is possible to avoid the IP skid using
hardware support. For instance, on Intel x86, the use of PEBS helps
eliminate the skid on Haswell and later processors. On older Intel
processors, software (i.e., the kernel) may succeed in eliminating
the skid.

Without this patch, on Haswell processors, if you set:
- attr.precise = 0, then you get the skid IP
- attr.precise = 1, then you get the skid PEBS ip (off-by-1)
- attr.precise = 2, then you get the skidless PEBS ip

The IP is captured when the event has PERF_SAMPLE_IP set in sample_type.
However, there are certain measurements where you need to have BOTH
the skidless IP and the skid IP. For instance, when studying branches,
the skid IP usually points to the target of the branch while the skidless
IP points to the branch instruction itself. Today, it is not possible to retrieve
both at the same time. This patch makes this possible by specifying
PERF_SAMPLE_IP|PERF_SAMPLE_SKID_IP.
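
As a rough illustration of the request described above, the sketch below
fills in the two relevant fields. It is not taken from the series: the
SKETCH_ constants and struct sketch_attr are local stand-ins so that the
example builds without the patched <linux/perf_event.h>. The bit values
mirror the ones in this series, and precise_ip is the field of
struct perf_event_attr that the text abbreviates as attr.precise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the sample_type bits; values mirror this series. */
#define SKETCH_PERF_SAMPLE_IP      (1U << 0)
#define SKETCH_PERF_SAMPLE_SKID_IP (1U << 20)

/* Hypothetical, trimmed-down stand-in for struct perf_event_attr. */
struct sketch_attr {
	uint64_t sample_type;    /* which fields each sample carries */
	unsigned int precise_ip; /* 0 = skid allowed, 2 = skidless requested */
};

/* Ask for the skidless (PEBS) IP and the skid IP in the same sample. */
static void sketch_request_both_ips(struct sketch_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->precise_ip = 2;
	attr->sample_type = SKETCH_PERF_SAMPLE_IP | SKETCH_PERF_SAMPLE_SKID_IP;
}
```

With a patched kernel and headers, the same two assignments would
presumably be made on a real struct perf_event_attr before calling
perf_event_open().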

As an example, consider the following code snippet:
 37.51  42c2ed  je   42c2f3
        42c2ef  add  $0x1,%rdx
        42c2f3  sub  $0x1,%rax

When using PEBS (precise=2) and sampling on BR_INST_RETIRED.CONDITIONAL,
the IP always points to 0x42c2ed. With precise=1, the IP would point to
0x42c2f3. It is interesting to collect both IPs in a single run to determine
how often the conditional branch is taken vs. not taken.
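
One possible way to post-process such samples is sketched below. It is
only an illustration and assumes that the skid IP lands exactly on the
next instruction executed after the branch; a larger skid is reported as
unknown. All names (sketch_branch, sketch_classify, ...) are hypothetical
and not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_outcome { SKETCH_NOT_TAKEN, SKETCH_TAKEN, SKETCH_UNKNOWN };

/* Addresses describing one conditional branch in the profiled code. */
struct sketch_branch {
	uint64_t branch_ip;   /* address of the branch instruction */
	uint64_t target_ip;   /* address executed next if taken */
	uint64_t fallthru_ip; /* address executed next if not taken */
};

/*
 * Classify one sample from its skidless IP and its skid IP.
 * Assumption: the skid is exactly one instruction, so the skid IP
 * is either the branch target or the fall-through address.
 */
static enum sketch_outcome
sketch_classify(const struct sketch_branch *br, uint64_t ip, uint64_t skid_ip)
{
	assert(br != NULL);
	if (ip != br->branch_ip)
		return SKETCH_UNKNOWN; /* sample belongs to another instruction */
	if (skid_ip == br->target_ip)
		return SKETCH_TAKEN;
	if (skid_ip == br->fallthru_ip)
		return SKETCH_NOT_TAKEN;
	return SKETCH_UNKNOWN;         /* skid went further than assumed */
}
```

For the snippet above, branch_ip would be 0x42c2ed, target_ip 0x42c2f3
and fallthru_ip 0x42c2ef.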

Understanding the skid is also interesting for other precise events.

In V2, we rebased to 10d94ff4d558 (v4.14-rc7).

Stephane Eranian (5):
perf/core: add PERF_RECORD_SAMPLE_SKID_IP record type
perf/x86: add PERF_SAMPLE_SKID_IP support for X86 PEBS
perf/tools: add support for PERF_SAMPLE_SKID_IP
perf/record: add support for sampling skid ip
perf/script: add support for skid ip

arch/x86/events/intel/ds.c | 7 +++++++
include/linux/perf_event.h | 2 ++
include/uapi/linux/perf_event.h | 4 +++-
kernel/events/core.c | 14 ++++++++++++++
tools/include/uapi/linux/perf_event.h | 4 +++-
tools/perf/Documentation/perf-record.txt | 8 ++++++++
tools/perf/builtin-record.c | 2 ++
tools/perf/builtin-script.c | 6 ++++++
tools/perf/perf.h | 1 +
tools/perf/util/event.h | 1 +
tools/perf/util/evsel.c | 10 ++++++++++
tools/perf/util/session.c | 3 +++
12 files changed, 60 insertions(+), 2 deletions(-)

--
2.7.4




2017-11-02 18:17:59

by Stephane Eranian

Subject: [PATCH v2 1/5] perf/core: add PERF_RECORD_SAMPLE_SKID_IP record type

From: Stephane Eranian <[email protected]>

This patch adds a new sample record type. The goal
is to record the interrupted instruction pointer (IP)
as seen by the kernel and reflected in the machine state (pt_regs).

On some architectures, it is possible to avoid the IP skid using
hardware support. For instance, on Intel x86, the use of PEBS helps
eliminate the skid on Haswell and later processors.

Without this patch, on Haswell processors, if you set:
- attr.precise = 0, then you get the skid IP
- attr.precise > 0, then you get the PEBS ip corrected for skid

The IP is normally captured when the event has PERF_SAMPLE_IP set.
However, there are certain measurements where you need to have BOTH
the corrected IP and the skid IP. For instance, when studying branches,
the skid IP usually points to the target of the branch while the corrected
IP points to the branch instruction itself. Today, it is not possible to retrieve
both at the same time. This patch makes this possible by specifying
PERF_SAMPLE_IP|PERF_SAMPLE_SKID_IP.

Signed-off-by: Stephane Eranian <[email protected]>
---
include/linux/perf_event.h | 2 ++
include/uapi/linux/perf_event.h | 4 +++-
kernel/events/core.c | 14 ++++++++++++++
3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 874b71a70058..772530501025 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -917,6 +917,7 @@ struct perf_sample_data {
u64 stack_user_size;

u64 phys_addr;
+ u64 skid_ip;
} ____cacheline_aligned;

/* default value for data source */
@@ -937,6 +938,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->weight = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
+ data->skid_ip = 0; /* mark as uninitialized */
}

extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 140ae638cfd6..93e1970e9421 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -140,8 +140,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_TRANSACTION = 1U << 17,
PERF_SAMPLE_REGS_INTR = 1U << 18,
PERF_SAMPLE_PHYS_ADDR = 1U << 19,
+ PERF_SAMPLE_SKID_IP = 1U << 20,

- PERF_SAMPLE_MAX = 1U << 20, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 21, /* non-ABI */
};

/*
@@ -816,6 +817,7 @@ enum perf_event_type {
* { u64 abi; # enum perf_sample_regs_abi
* u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
* { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
+ * { u64 skid_ip; } && PERF_SAMPLE_SKID_IP
* };
*/
PERF_RECORD_SAMPLE = 9,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0649a84204e6..40f2839c8b94 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1565,6 +1565,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
if (sample_type & PERF_SAMPLE_PHYS_ADDR)
size += sizeof(data->phys_addr);

+ if (sample_type & PERF_SAMPLE_SKID_IP)
+ size += sizeof(data->skid_ip);
+
event->header_size = size;
}

@@ -5934,6 +5937,9 @@ void perf_output_sample(struct perf_output_handle *handle,
if (sample_type & PERF_SAMPLE_PHYS_ADDR)
perf_output_put(handle, data->phys_addr);

+ if (sample_type & PERF_SAMPLE_SKID_IP)
+ perf_output_put(handle, data->skid_ip);
+
if (!event->attr.watermark) {
int wakeup_events = event->attr.wakeup_events;

@@ -5999,6 +6005,14 @@ void perf_prepare_sample(struct perf_event_header *header,
if (sample_type & PERF_SAMPLE_IP)
data->ip = perf_instruction_pointer(regs);

+ /*
+ * if skid_ip has not been set by arch specific code, then
+ * we initialize it to IP as interrupt-based sampling has
+ * skid
+ */
+ if (!data->skid_ip && sample_type & PERF_SAMPLE_SKID_IP)
+ data->skid_ip = perf_instruction_pointer(regs);
+
if (sample_type & PERF_SAMPLE_CALLCHAIN) {
int size = 1;

--
2.7.4
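
For readers who want to experiment with the fallback added to
perf_prepare_sample() in the last hunk, it can be approximated in user
space as below. This is a sketch, not kernel code: sketch_sample_data and
sketch_prepare() are hypothetical stand-ins, and the interrupted IP is
passed in directly instead of being read from pt_regs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the sample_type bit; value mirrors this patch. */
#define SKETCH_PERF_SAMPLE_SKID_IP (1U << 20)

/* Hypothetical subset of struct perf_sample_data. */
struct sketch_sample_data {
	uint64_t ip;      /* possibly corrected by arch code (e.g. PEBS) */
	uint64_t skid_ip; /* 0 means "not set by arch code" */
};

/*
 * If arch-specific code left skid_ip at 0 and the event asked for it,
 * fall back to the interrupted IP, which carries the skid.
 */
static void sketch_prepare(struct sketch_sample_data *data,
			   uint64_t sample_type, uint64_t interrupted_ip)
{
	assert(data != NULL);
	if (!data->skid_ip && (sample_type & SKETCH_PERF_SAMPLE_SKID_IP))
		data->skid_ip = interrupted_ip;
}
```

Because 0 doubles as the "not set" marker, a skid IP that is genuinely 0
cannot be told apart from an unset one in this scheme.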



2017-11-02 18:17:02

by Stephane Eranian

Subject: [PATCH v2 3/5] perf/tools: add support for PERF_SAMPLE_SKID_IP

This patch adds the support code to handle the PERF_SAMPLE_SKID_IP
record type.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 4 +++-
tools/perf/perf.h | 1 +
tools/perf/util/event.h | 1 +
tools/perf/util/evsel.c | 10 ++++++++++
tools/perf/util/session.c | 3 +++
5 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 140ae638cfd6..fd0f5111e433 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -140,8 +140,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_TRANSACTION = 1U << 17,
PERF_SAMPLE_REGS_INTR = 1U << 18,
PERF_SAMPLE_PHYS_ADDR = 1U << 19,
+ PERF_SAMPLE_SKID_IP = 1U << 20,

- PERF_SAMPLE_MAX = 1U << 20, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 21, /* non-ABI */
};

/*
@@ -816,6 +817,7 @@ enum perf_event_type {
* { u64 abi; # enum perf_sample_regs_abi
* u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
* { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
+ * { u64 skid_ip; } && PERF_SAMPLE_SKID_IP
* };
*/
PERF_RECORD_SAMPLE = 9,
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index fbb0a9cd0ac6..b7637d6b34ec 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -59,6 +59,7 @@ struct record_opts {
bool tail_synthesize;
bool overwrite;
bool ignore_missing_thread;
+ bool skid_ip;
unsigned int freq;
unsigned int mmap_pages;
unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index d6cbb0a0d919..e4165e824fdf 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -201,6 +201,7 @@ struct perf_sample {
u32 raw_size;
u64 data_src;
u64 phys_addr;
+ u64 skid_ip;
u32 flags;
u16 insn_len;
u8 cpumode;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f894893c203d..08549bd92c05 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -980,6 +980,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
if (opts->sample_weight)
perf_evsel__set_sample_bit(evsel, WEIGHT);

+ if (opts->skid_ip)
+ perf_evsel__set_sample_bit(evsel, SKID_IP);
+
attr->task = track;
attr->mmap = track;
attr->mmap2 = track && !perf_missing_features.mmap2;
@@ -1478,6 +1481,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
bit_name(WEIGHT), bit_name(PHYS_ADDR),
+ bit_name(SKID_IP),
{ .name = NULL, }
};
#undef bit_name
@@ -2225,6 +2229,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
array++;
}

+ data->skid_ip = 0;
+ if (type & PERF_SAMPLE_SKID_IP) {
+ data->skid_ip = *array;
+ array++;
+ }
+
return 0;
}

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index b3fd62f7e4c9..9390cedee6f4 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1130,6 +1130,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,

if (sample_type & PERF_SAMPLE_READ)
sample_read__printf(sample, evsel->attr.read_format);
+
+ if (sample_type & PERF_SAMPLE_SKID_IP)
+ printf("... skid_ip: %" PRIu64 "\n", sample->skid_ip);
}

static void dump_read(struct perf_evsel *evsel, union perf_event *event)
--
2.7.4
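
The record layout comment in this patch places skid_ip after phys_addr at
the end of the sample body. A minimal user-space sketch of walking that
tail is shown below. The names are hypothetical, only the last two
optional fields are handled, and the bounds check is an addition of this
sketch rather than something the evsel.c hunk does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the sample_type bits; values mirror this patch. */
#define SKETCH_PERF_SAMPLE_PHYS_ADDR (1U << 19)
#define SKETCH_PERF_SAMPLE_SKID_IP   (1U << 20)

/* Hypothetical subset of struct perf_sample. */
struct sketch_tail {
	uint64_t phys_addr;
	uint64_t skid_ip;
};

/*
 * Parse the trailing u64 words of a sample body in layout order.
 * Returns the number of words consumed, or -1 if the buffer is short.
 */
static int sketch_parse_tail(const uint64_t *array, size_t nr_words,
			     uint64_t type, struct sketch_tail *out)
{
	size_t i = 0;

	assert(out != NULL);
	out->phys_addr = 0;
	out->skid_ip = 0;

	if (type & SKETCH_PERF_SAMPLE_PHYS_ADDR) {
		if (i >= nr_words)
			return -1;
		out->phys_addr = array[i++];
	}
	if (type & SKETCH_PERF_SAMPLE_SKID_IP) {
		if (i >= nr_words)
			return -1;
		out->skid_ip = array[i++];
	}
	return (int)i;
}
```

Fields whose bit is not set in type are left at 0, matching the
data->skid_ip = 0 default in the hunk above.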

