2017-11-08 07:58:17

by Stephane Eranian

Subject: [PATCH v3 0/5] perf: add support for capturing skid IP

This patch series adds a new sample record type called
PERF_SAMPLE_SKID_IP. The goal is to record the
unmodified interrupted instruction pointer (IP) as
seen by the kernel and reflected in the machine state.

On some architectures, it is possible to avoid the IP skid using
hardware support. For instance, on Intel x86, the use of PEBS helps
eliminate the skid on Haswell and later processors. On older Intel
processors, software (i.e., the kernel) may succeed in eliminating
the skid.

Without this patch, on Haswell processors, if you set:
- attr.precise = 0, then you get the skid IP
- attr.precise = 1, then you get the skid PEBS ip (off-by-1)
- attr.precise = 2, then you get the skidless PEBS ip

The IP is captured when the event has PERF_SAMPLE_IP set in sample_type.
However, there are certain measurements where you need to have BOTH
the skidless IP and the skid IP. For instance, when studying branches,
the skid IP usually points to the target of the branch while the skidless
IP points to the branch instruction itself. Today, it is not possible to retrieve
both at the same time. This patch makes this possible by specifying
PERF_SAMPLE_IP|PERF_SAMPLE_SKID_IP.
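
As a rough illustration of the paragraph above, the sketch below builds the
combined sample_type mask and counts the IP payload it implies. It is only a
user-space model: the SKETCH_* names and both helpers are hypothetical, and
the SKID_IP bit value is an assumption taken from the enum proposed in patch
1/5 (it is not a mainline constant).

```c
#include <stdint.h>

/* Bit values mirror the enum in this series; SKID_IP is an assumption
 * based on patch 1/5 and is not defined by mainline headers. */
#define SKETCH_PERF_SAMPLE_IP      (1U << 0)
#define SKETCH_PERF_SAMPLE_SKID_IP (1U << 20)

/* Hypothetical helper: build the sample_type mask for one event. */
static uint64_t sketch_sample_type(int want_skid_ip)
{
	uint64_t sample_type = SKETCH_PERF_SAMPLE_IP;

	if (want_skid_ip)
		sample_type |= SKETCH_PERF_SAMPLE_SKID_IP;

	return sample_type;
}

/* Hypothetical helper: each selected IP bit adds one u64 to the sample. */
static unsigned int sketch_ip_payload_bytes(uint64_t sample_type)
{
	unsigned int size = 0;

	if (sample_type & SKETCH_PERF_SAMPLE_IP)
		size += sizeof(uint64_t);
	if (sample_type & SKETCH_PERF_SAMPLE_SKID_IP)
		size += sizeof(uint64_t);

	return size;
}
```

With both bits set, every sample carries two 8-byte IP fields instead of one.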

As an example, consider the following code snippet:
37.51 42c2ed je 42c2f3
42c2ef add $0x1,%rdx
42c2f3 sub $0x1,%rax

When using PEBS (precise=2) and sampling on BR_INST_RETIRED.CONDITIONAL,
the IP always points to 0x42c2ed. With precise=1, the IP would point to
0x42c2f3. It is interesting to collect both IPs in a single run to determine
how often the conditional branch is taken vs. non-taken.
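
A post-processing tool could use the two IPs roughly as sketched below. This
is a minimal sketch, assuming the skid IP is the next dynamic instruction
after the branch; the enum, the function name and the insn_len parameter are
hypothetical and not part of this series. A taken branch whose target equals
the fall-through address would be misclassified.

```c
#include <stdint.h>

enum sketch_branch_outcome {
	SKETCH_BRANCH_NOT_TAKEN,
	SKETCH_BRANCH_TAKEN,
};

/*
 * ip:       skidless IP, i.e. the address of the branch itself
 * skid_ip:  skid IP, assumed to be the next dynamic instruction
 * insn_len: length in bytes of the branch instruction (from a decoder)
 *
 * If execution continued at the fall-through address, the branch was
 * not taken; any other address is treated as the branch target.
 */
static enum sketch_branch_outcome
sketch_classify_branch(uint64_t ip, uint64_t skid_ip, unsigned int insn_len)
{
	if (skid_ip == ip + insn_len)
		return SKETCH_BRANCH_NOT_TAKEN;

	return SKETCH_BRANCH_TAKEN;
}
```

For the snippet above, the je at 0x42c2ed is 2 bytes long, so a skid IP of
0x42c2ef means not taken and a skid IP of 0x42c2f3 means taken.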

Understanding the skid is also interesting for other precise events.

In V2, we rebased to 10d94ff4d558 (v4.14-rc7).

In V3, the code is rebased to 4.14-rc8 and LKML comments have been integrated.
The new way to specify skid ip is per event:
$ perf record -e cpu/event=0xc5,skid-ip=1/ ....

Stephane Eranian (5):
perf/core: add PERF_RECORD_SAMPLE_SKID_IP record type
perf/x86: add PERF_SAMPLE_SKID_IP support for X86 PEBS
perf/tools: add support for PERF_SAMPLE_SKID_IP
perf/record: add documentation for sampling skid ip
perf/script: add support for skid ip

arch/x86/events/intel/ds.c | 7 +++++++
include/linux/perf_event.h | 2 ++
include/uapi/linux/perf_event.h | 4 +++-
kernel/events/core.c | 14 ++++++++++++++
tools/include/uapi/linux/perf_event.h | 4 +++-
tools/perf/Documentation/perf-record.txt | 8 ++++++++
tools/perf/Documentation/perf-script.txt | 2 +-
tools/perf/builtin-script.c | 10 ++++++++--
tools/perf/util/event.h | 1 +
tools/perf/util/evsel.c | 11 +++++++++++
tools/perf/util/evsel.h | 2 ++
tools/perf/util/parse-events.c | 7 +++++++
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 1 +
tools/perf/util/session.c | 3 +++
15 files changed, 72 insertions(+), 5 deletions(-)

--
2.7.4



2017-11-08 08:00:24

by Stephane Eranian

Subject: [PATCH v3 1/5] perf/core: add PERF_RECORD_SAMPLE_SKID_IP record type

This patch adds a new sample record type. The goal
is to record the interrupted instruction pointer (IP)
as seen by the kernel and reflected in the machine state (pt_regs).

On some architectures, it is possible to avoid the IP skid using
hardware support. For instance, on Intel x86, the use of PEBS helps
eliminate the skid on Haswell and later processors.

Without this patch, on Haswell processors, if you set:
- attr.precise = 0, then you get the skid IP
- attr.precise > 0, then you get the PEBS ip corrected for skid

The IP is normally captured when the event has PERF_SAMPLE_IP set.
However, there are certain measurements where you need to have BOTH
the corrected IP and the skid IP. For instance, when studying branches,
the skid IP usually points to the target of the branch while the corrected
IP points to the branch instruction itself. Today, it is not possible to retrieve
both at the same time. This patch makes this possible by specifying
PERF_SAMPLE_IP|PERF_SAMPLE_SKID_IP.

Signed-off-by: Stephane Eranian <[email protected]>
---
include/linux/perf_event.h | 2 ++
include/uapi/linux/perf_event.h | 4 +++-
kernel/events/core.c | 14 ++++++++++++++
3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 874b71a70058..772530501025 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -917,6 +917,7 @@ struct perf_sample_data {
u64 stack_user_size;

u64 phys_addr;
+ u64 skid_ip;
} ____cacheline_aligned;

/* default value for data source */
@@ -937,6 +938,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->weight = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
+ data->skid_ip = 0; /* mark as uninitialized */
}

extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 362493a2f950..48a65a90fcab 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -141,8 +141,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_TRANSACTION = 1U << 17,
PERF_SAMPLE_REGS_INTR = 1U << 18,
PERF_SAMPLE_PHYS_ADDR = 1U << 19,
+ PERF_SAMPLE_SKID_IP = 1U << 20,

- PERF_SAMPLE_MAX = 1U << 20, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 21, /* non-ABI */
};

/*
@@ -817,6 +818,7 @@ enum perf_event_type {
* { u64 abi; # enum perf_sample_regs_abi
* u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
* { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
+ * { u64 skid_ip; } && PERF_SAMPLE_SKID_IP
* };
*/
PERF_RECORD_SAMPLE = 9,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0649a84204e6..40f2839c8b94 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1565,6 +1565,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
if (sample_type & PERF_SAMPLE_PHYS_ADDR)
size += sizeof(data->phys_addr);

+ if (sample_type & PERF_SAMPLE_SKID_IP)
+ size += sizeof(data->skid_ip);
+
event->header_size = size;
}

@@ -5934,6 +5937,9 @@ void perf_output_sample(struct perf_output_handle *handle,
if (sample_type & PERF_SAMPLE_PHYS_ADDR)
perf_output_put(handle, data->phys_addr);

+ if (sample_type & PERF_SAMPLE_SKID_IP)
+ perf_output_put(handle, data->skid_ip);
+
if (!event->attr.watermark) {
int wakeup_events = event->attr.wakeup_events;

@@ -5999,6 +6005,14 @@ void perf_prepare_sample(struct perf_event_header *header,
if (sample_type & PERF_SAMPLE_IP)
data->ip = perf_instruction_pointer(regs);

+ /*
+ * if skid_ip has not been set by arch specific code, then
+ * we initialize it to IP as interrupt-based sampling has
+ * skid
+ */
+ if (!data->skid_ip && sample_type & PERF_SAMPLE_SKID_IP)
+ data->skid_ip = perf_instruction_pointer(regs);
+
if (sample_type & PERF_SAMPLE_CALLCHAIN) {
int size = 1;

--
2.7.4
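
The fallback added to perf_prepare_sample() above can be modelled in user
space as follows. This is only a sketch: the struct, macro and function names
are hypothetical stand-ins for perf_sample_data and the kernel code, and
regs_ip stands for the value of perf_instruction_pointer(regs). It shows that
0 acts as the "not set by arch code" marker, so an arch-provided value is
never overwritten.

```c
#include <stdint.h>

/* Assumption: bit value proposed by this patch, not a mainline constant. */
#define SKETCH_PERF_SAMPLE_SKID_IP (1U << 20)

/* Hypothetical, trimmed-down stand-in for struct perf_sample_data. */
struct sketch_sample_data {
	uint64_t ip;
	uint64_t skid_ip;	/* 0 means "not set by arch specific code" */
};

/*
 * Model of the fallback: if the arch code left skid_ip at 0 and the
 * event asked for it, use the interrupted IP taken from pt_regs.
 */
static void sketch_prepare_skid_ip(struct sketch_sample_data *data,
				   uint64_t sample_type, uint64_t regs_ip)
{
	if (!data->skid_ip && (sample_type & SKETCH_PERF_SAMPLE_SKID_IP))
		data->skid_ip = regs_ip;
}
```

One consequence of this design is that a genuine skid IP of 0 cannot be
distinguished from "unset" and would be replaced by the pt_regs value.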



2017-11-08 07:58:48

by Stephane Eranian

Subject: [PATCH v3 3/5] perf/tools: add support for PERF_SAMPLE_SKID_IP

This patch adds the support code to handle the PERF_SAMPLE_SKID_IP
record type. This is done as an event term and as such can be enabled
per event: cpu/event=xxx,skid-ip=1/. This is a boolean term which is
false by default.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 4 +++-
tools/perf/util/event.h | 1 +
tools/perf/util/evsel.c | 11 +++++++++++
tools/perf/util/evsel.h | 2 ++
tools/perf/util/parse-events.c | 7 +++++++
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 1 +
tools/perf/util/session.c | 3 +++
8 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 362493a2f950..79655228dd9b 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -141,8 +141,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_TRANSACTION = 1U << 17,
PERF_SAMPLE_REGS_INTR = 1U << 18,
PERF_SAMPLE_PHYS_ADDR = 1U << 19,
+ PERF_SAMPLE_SKID_IP = 1U << 20,

- PERF_SAMPLE_MAX = 1U << 20, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 21, /* non-ABI */
};

/*
@@ -817,6 +818,7 @@ enum perf_event_type {
* { u64 abi; # enum perf_sample_regs_abi
* u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
* { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
+ * { u64 skid_ip; } && PERF_SAMPLE_SKID_IP
* };
*/
PERF_RECORD_SAMPLE = 9,
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 1ae95efbfb95..41622a7ed649 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -202,6 +202,7 @@ struct perf_sample {
u32 raw_size;
u64 data_src;
u64 phys_addr;
+ u64 skid_ip;
u32 flags;
u16 insn_len;
u8 cpumode;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f894893c203d..679954ed2201 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -775,6 +775,10 @@ static void apply_config_terms(struct perf_evsel *evsel,
case PERF_EVSEL__CONFIG_TERM_OVERWRITE:
attr->write_backward = term->val.overwrite ? 1 : 0;
break;
+ case PERF_EVSEL__CONFIG_TERM_SKID_IP:
+ if (term->val.skid_ip)
+ perf_evsel__set_sample_bit(evsel, SKID_IP);
+ break;
default:
break;
}
@@ -1478,6 +1482,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
bit_name(WEIGHT), bit_name(PHYS_ADDR),
+ bit_name(SKID_IP),
{ .name = NULL, }
};
#undef bit_name
@@ -2225,6 +2230,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
array++;
}

+ data->skid_ip = 0;
+ if (type & PERF_SAMPLE_SKID_IP) {
+ data->skid_ip = *array;
+ array++;
+ }
+
return 0;
}

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 9277df96ffda..8555095f0d48 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -49,6 +49,7 @@ enum {
PERF_EVSEL__CONFIG_TERM_OVERWRITE,
PERF_EVSEL__CONFIG_TERM_DRV_CFG,
PERF_EVSEL__CONFIG_TERM_BRANCH,
+ PERF_EVSEL__CONFIG_TERM_SKID_IP,
PERF_EVSEL__CONFIG_TERM_MAX,
};

@@ -66,6 +67,7 @@ struct perf_evsel_config_term {
bool inherit;
bool overwrite;
char *branch;
+ bool skid_ip;
} val;
};

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index a7fcd95961ef..1a1d9fc509bd 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -918,6 +918,7 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
[PARSE_EVENTS__TERM_TYPE_OVERWRITE] = "overwrite",
[PARSE_EVENTS__TERM_TYPE_NOOVERWRITE] = "no-overwrite",
[PARSE_EVENTS__TERM_TYPE_DRV_CFG] = "driver-config",
+ [PARSE_EVENTS__TERM_TYPE_SKID_IP] = "skid-ip",
};

static bool config_term_shrinked;
@@ -1026,6 +1027,9 @@ do { \
case PARSE_EVENTS__TERM_TYPE_MAX_STACK:
CHECK_TYPE_VAL(NUM);
break;
+ case PARSE_EVENTS__TERM_TYPE_SKID_IP:
+ CHECK_TYPE_VAL(NUM);
+ break;
default:
err->str = strdup("unknown term");
err->idx = term->err_term;
@@ -1159,6 +1163,9 @@ do { \
case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
ADD_CONFIG_TERM(DRV_CFG, drv_cfg, term->val.str);
break;
+ case PARSE_EVENTS__TERM_TYPE_SKID_IP:
+ ADD_CONFIG_TERM(SKID_IP, skid_ip, term->val.num ? 1 : 0);
+ break;
default:
break;
}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index be337c266697..d331f078a389 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -74,6 +74,7 @@ enum {
PARSE_EVENTS__TERM_TYPE_NOOVERWRITE,
PARSE_EVENTS__TERM_TYPE_OVERWRITE,
PARSE_EVENTS__TERM_TYPE_DRV_CFG,
+ PARSE_EVENTS__TERM_TYPE_SKID_IP,
__PARSE_EVENTS__TERM_TYPE_NR,
};

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 241396cd059d..383443f4fe0d 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -257,6 +257,7 @@ inherit { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_INHERIT); }
no-inherit { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); }
overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); }
no-overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); }
+skid-ip { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SKID_IP); }
, { return ','; }
"/" { BEGIN(INITIAL); return '/'; }
{name_minus} { return str(yyscanner, PE_NAME); }
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 5c412310f266..223165055d41 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1131,6 +1131,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,

if (sample_type & PERF_SAMPLE_READ)
sample_read__printf(sample, evsel->attr.read_format);
+
+ if (sample_type & PERF_SAMPLE_SKID_IP)
+ printf("... skid_ip: %" PRIu64 "\n", sample->skid_ip);
}

static void dump_read(struct perf_evsel *evsel, union perf_event *event)
--
2.7.4
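
Since skid_ip is emitted after phys_addr, a consumer reads it as the last u64
of the sample body. The sketch below models the tail of the parsing done in
perf_evsel__parse_sample(); the struct, macros and function are hypothetical,
the SKID_IP bit value is the one proposed in this series, and the bounds
checks are an addition of the sketch rather than something taken from the
patch.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PERF_SAMPLE_PHYS_ADDR (1U << 19)
/* Assumption: bit value proposed by this series, not a mainline constant. */
#define SKETCH_PERF_SAMPLE_SKID_IP   (1U << 20)

/* Hypothetical holder for the last two optional sample fields. */
struct sketch_sample_tail {
	uint64_t phys_addr;
	uint64_t skid_ip;
};

/*
 * Parse the trailing u64 fields in the order the kernel emits them:
 * phys_addr first, then skid_ip. Returns the number of u64 words
 * consumed, or -1 if the buffer is too short.
 */
static int sketch_parse_tail(const uint64_t *array, size_t nr_words,
			     uint64_t type, struct sketch_sample_tail *out)
{
	size_t i = 0;

	out->phys_addr = 0;
	out->skid_ip = 0;

	if (type & SKETCH_PERF_SAMPLE_PHYS_ADDR) {
		if (i >= nr_words)
			return -1;
		out->phys_addr = array[i++];
	}

	if (type & SKETCH_PERF_SAMPLE_SKID_IP) {
		if (i >= nr_words)
			return -1;
		out->skid_ip = array[i++];
	}

	return (int)i;
}
```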



2017-11-08 08:00:07

by Stephane Eranian

Subject: [PATCH v3 2/5] perf/x86: add PERF_SAMPLE_SKID_IP support for X86 PEBS

This patch adds support for SKID_IP for Intel x86 processors
when PEBS mode is enabled.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/events/intel/ds.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 3674a4b6f8bd..52866a470b0d 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1190,6 +1190,13 @@ static void setup_pebs_sample_data(struct perf_event *event,
x86_pmu.intel_cap.pebs_format >= 1)
data->addr = pebs->dla;

+ /*
+ * unmodified, skid IP which is guaranteed to be the next
+ * dynamic instruction
+ */
+ if (sample_type & PERF_SAMPLE_SKID_IP)
+ data->skid_ip = pebs->ip;
+
if (x86_pmu.intel_cap.pebs_format >= 2) {
/* Only set the TSX weight when no memory weight. */
if ((sample_type & PERF_SAMPLE_WEIGHT) && !fll)
--
2.7.4



2017-11-08 07:58:33

by Stephane Eranian

Subject: [PATCH v3 4/5] perf/record: add documentation for sampling skid ip

This patch adds documentation to describe how to use the skid
ip support with perf record. The sample type can be provided
per event as follows: pmu_instance/...,skid-ip=1/

For instance on Intel X86:

$ perf record -e cpu/event=0xc5,skid-ip=1/pp

does record the precise address of retired branches and their target.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 5a626ef666c2..3b156fa03c99 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -57,6 +57,14 @@ OPTIONS
FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
"no" for disable callgraph.
- 'stack-size': user stack size for dwarf mode
+ - 'skid-ip' : boolean, captures the unmodified interrupted instruction pointer
+ (IP) in each sample. Usually with event-based sampling, the IP
+ has skid and rarely points to the instruction which caused the
+ event to overflow. On some architectures, the hardware can eliminate
+ the skid and perf_events returns it as the IP when precise sampling is
+ enabled. But for certain measurements, it may be useful to have both
+ the corrected and skid ip. This option enables capturing the skid ip in
+ addition to the corrected ip. Default is: false

See the linkperf:perf-list[1] man page for more parameters.

--
2.7.4

