2015-12-02 18:28:40

by Stephane Eranian

Subject: [PATCH v1 0/2] perf/x86: fixes and improvement for Intel Atom PEBS support

This short series fixes the total breakage of Intel Atom PEBS support in recent
kernels. The problems were introduced by the changes to the PEBS logic for
handling a deeper buffer.

The first patch fixes PEBS and LBR problems, including NULL pointer dereferences,
wrong pointer arithmetic, and a wrong PEBS record layout assumption.

The second patch adds an alias for cycles:pp to Intel Atom, given that perf
record/top use cycles:pp by default nowadays.

Stephane Eranian (2):
perf/x86: fix PEBS and LBR issues on Intel Atom
perf/x86: add cycles:pp alias for Intel Atom

arch/x86/kernel/cpu/perf_event_intel.c | 30 ++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/perf_event_intel_ds.c | 11 ++++++++++-
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 11 +++++++----
3 files changed, 47 insertions(+), 5 deletions(-)

--
2.5.0


2015-12-02 18:28:45

by Stephane Eranian

Subject: [PATCH v1 1/2] perf/x86: fix PEBS and LBR issues on Intel Atom

This patch fixes a number of problems in the PEBS
and LBR support of Intel Atom. Those bugs were introduced
by the recent changes to the PEBS code to handle multiple
entries.

The kernel was assuming that if the CPU supports the 64-bit LBR
format, then it has an LBR_SELECT MSR. Atom uses the 64-bit LBR
format but does not have LBR_SELECT. That was causing NULL pointer
dereferences in a couple of places.
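
The fix below simply treats cpuc->lbr_sel as optional and checks it before
each use. As a minimal standalone sketch of the pattern (illustrative names
and types, not the kernel's):

	typedef unsigned long long u64;

	struct lbr_state {
		u64 *lbr_sel;	/* NULL when the CPU has no LBR_SELECT MSR */
	};

	/* Safe: returns 0 instead of dereferencing a NULL lbr_sel, as
	 * happens on Atom (64-bit LBR format, but no LBR_SELECT). */
	static u64 lbr_select_bits(const struct lbr_state *s, u64 mask)
	{
		if (!s->lbr_sel)
			return 0;
		return *s->lbr_sel & mask;
	}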

The kernel had a pointer arithmetic error in intel_pmu_drain_pebs_core()
when calculating the number of records present in the PEBS buffer: at and
top are typed record pointers, so their difference already yields a record
count, and dividing it again by the record size truncates the count to zero.
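
A small userspace illustration of the C rule involved (stand-in record
type, not the kernel's):

	#include <stdio.h>

	struct rec { unsigned long long field[18]; };	/* stand-in for a PEBS record */

	int main(void)
	{
		struct rec buf[4];
		struct rec *at = buf, *top = buf + 4;

		/* difference of typed pointers is already a record count */
		long n_ok  = top - at;				/* 4 */
		/* dividing again by the record size truncates the count */
		long n_bug = (top - at) / (long)sizeof(*at);	/* 0 */

		printf("n_ok=%ld n_bug=%ld\n", n_ok, n_bug);
		return 0;
	}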

get_next_pebs_record_by_bit() was called on PEBS fmt0, which does
not use the pebs_record_nhm layout.
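
For reference, the two layouts differ as follows (fields abridged from the
kernel's struct definitions; only the nhm format carries the status bitfield
that get_next_pebs_record_by_bit() searches):

	typedef unsigned long long u64;

	struct pebs_record_core {	/* fmt0: Core2 and Atom */
		u64 flags, ip;
		u64 ax, bx, cx, dx;
		/* ... remaining general-purpose registers ... */
	};

	struct pebs_record_nhm {	/* fmt1 and later */
		u64 flags, ip;
		u64 ax, bx, cx, dx;
		/* ... remaining general-purpose registers ... */
		u64 status;		/* per-counter bits, absent on fmt0 */
		u64 dla, dse, lat;
	};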

This patch fixes all those problems and has PEBS and LBR working again.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_ds.c | 9 ++++++++-
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 11 +++++++----
2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 5db1c77..dae5f93 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -1101,6 +1101,13 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
void *at;
u64 pebs_status;

+ /*
+ * fmt0 does not have a status bitfield (does not use
+ * pebs_record_nhm format)
+ */
+ if (x86_pmu.intel_cap.pebs_format < 1)
+ return base;
+
if (base == NULL)
return NULL;

@@ -1186,7 +1193,7 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
if (!event->attr.precise_ip)
return;

- n = (top - at) / x86_pmu.pebs_record_size;
+ n = top - at;
if (n <= 0)
return;

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index e2fad0c..1390148 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -161,7 +161,7 @@ static void __intel_pmu_lbr_enable(bool pmi)
*/
if (cpuc->lbr_sel)
lbr_select = cpuc->lbr_sel->config & x86_pmu.lbr_sel_mask;
- if (!pmi)
+ if (!pmi && cpuc->lbr_sel)
wrmsrl(MSR_LBR_SELECT, lbr_select);

rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
@@ -430,7 +430,7 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
*/
static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
{
- bool need_info = !(cpuc->lbr_sel->config & LBR_NO_INFO);
+ bool need_info = false;
unsigned long mask = x86_pmu.lbr_nr - 1;
int lbr_format = x86_pmu.intel_cap.lbr_format;
u64 tos = intel_pmu_lbr_tos();
@@ -438,8 +438,11 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
int out = 0;
int num = x86_pmu.lbr_nr;

- if (cpuc->lbr_sel->config & LBR_CALL_STACK)
- num = tos;
+ if (cpuc->lbr_sel) {
+ need_info = !(cpuc->lbr_sel->config & LBR_NO_INFO);
+ if (cpuc->lbr_sel->config & LBR_CALL_STACK)
+ num = tos;
+ }

for (i = 0; i < num; i++) {
unsigned long lbr_idx = (tos - i) & mask;
--
2.5.0

2015-12-02 18:29:00

by Stephane Eranian

Subject: [PATCH v1 2/2] perf/x86: add cycles:pp alias for Intel Atom

This patch updates the PEBS support for Intel Atom to provide
an alias for the cycles:pp event used by perf record/top by default
nowadays.

On Atom, only INST_RETIRED.ANY supports PEBS, so we use this event
instead, with a large cmask, to count cycles.
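
The resulting encoding can be checked by hand (bit offsets per the
architectural PERFEVTSEL register layout; in the kernel the X86_CONFIG()
helper packs these fields):

	#include <stdio.h>

	int main(void)
	{
		unsigned long long config;

		config  = 0xc0ULL;		/* bits 0-7, event: INST_RETIRED.ANY */
		config |= 0x00ULL << 8;		/* bits 8-15, umask */
		config |= 1ULL << 23;		/* inv: invert the cmask comparison */
		config |= 16ULL << 24;		/* cmask: above the max retire width */

		/* prints 0x108000c0, the encoding allowed by the Atom
		 * PEBS event constraints */
		printf("%#llx\n", config);
		return 0;
	}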

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel.c | 30 ++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/perf_event_intel_ds.c | 2 ++
2 files changed, 32 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 61f2577..7ff1e30 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2475,6 +2475,35 @@ static void intel_pebs_aliases_snb(struct perf_event *event)
}
}

+static void intel_pebs_aliases_atom(struct perf_event *event)
+{
+ if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
+ /*
+ * Use an alternative encoding for CPU_CLK_UNHALTED.THREAD_P
+ * (0x003c) so that we can use it with PEBS.
+ *
+ * The regular CPU_CLK_UNHALTED.THREAD_P event (0x003c) isn't
+ * PEBS capable. However we can use UOPS_RETIRED.ALL
+ * (0x01c2), which is a PEBS capable event, to get the same
+ * count.
+ *
+ * INST_RETIRED.ANY counts the number of cycles that retires
+ * CNTMASK instructions. By setting CNTMASK to a value (16)
+ * larger than the maximum number of instructions that can be
+ * retired per cycle (4) and then inverting the condition, we
+ * count all cycles that retire 16 or less instructions, which
+ * is every cycle.
+ *
+ * Thereby we gain a PEBS capable cycle counter.
+ */
+ u64 alt_config = X86_CONFIG(.event=0xc0, .umask=0x00, .inv=1, .cmask=16);
+
+ alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
+ event->hw.config = alt_config;
+ }
+}
+
+
static unsigned long intel_pmu_free_running_flags(struct perf_event *event)
{
unsigned long flags = x86_pmu.free_running_flags;
@@ -3332,6 +3361,7 @@ __init int intel_pmu_init(void)

x86_pmu.event_constraints = intel_gen_event_constraints;
x86_pmu.pebs_constraints = intel_atom_pebs_event_constraints;
+ x86_pmu.pebs_aliases = intel_pebs_aliases_atom;
pr_cont("Atom events, ");
break;

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index dae5f93..1b748ee 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -620,6 +620,8 @@ struct event_constraint intel_atom_pebs_event_constraints[] = {
INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0x1), /* MEM_LOAD_RETIRED.* */
/* INST_RETIRED.ANY_P, inv=1, cmask=16 (cycles:p). */
INTEL_FLAGS_EVENT_CONSTRAINT(0x108000c0, 0x01),
+ /* Allow all events as PEBS with no flags */
+ INTEL_ALL_EVENT_CONSTRAINT(0, 0x1),
EVENT_CONSTRAINT_END
};

--
2.5.0

2015-12-02 19:37:33

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH v1 2/2] perf/x86: add cycles:pp alias for Intel Atom



> -----Original Message-----
> From: Stephane Eranian [mailto:[email protected]]
> Sent: Wednesday, December 02, 2015 1:28 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; Liang, Kan
> Subject: [PATCH v1 2/2] perf/x86: add cycles:pp alias for Intel Atom
>
> This patch updates the PEBS support for Intel Atom to provide an alias for
> the cycles:pp event used by perf record/top by default nowadays.
>
> On Atom only INST_RETIRED:ANY supports PEBS, so we use this event
> instead with a large cmask to count cycles.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_intel.c | 30 ++++++++++++++++++++++++++++++
> arch/x86/kernel/cpu/perf_event_intel_ds.c | 2 ++
> 2 files changed, 32 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 61f2577..7ff1e30 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -2475,6 +2475,35 @@ static void intel_pebs_aliases_snb(struct perf_event *event)
> }
> }
>
> +static void intel_pebs_aliases_atom(struct perf_event *event)
> +{
> + if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
> + /*
> + * Use an alternative encoding for CPU_CLK_UNHALTED.THREAD_P
> + * (0x003c) so that we can use it with PEBS.
> + *
> + * The regular CPU_CLK_UNHALTED.THREAD_P event (0x003c) isn't
> + * PEBS capable. However we can use UOPS_RETIRED.ALL
> + * (0x01c2), which is a PEBS capable event, to get the same

The comment isn't consistent with the code.

> + * count.
> + *
> + * INST_RETIRED.ANY counts the number of cycles that retires
> + * CNTMASK instructions. By setting CNTMASK to a value (16)
> + * larger than the maximum number of instructions that can be
> + * retired per cycle (4) and then inverting the condition, we
> + * count all cycles that retire 16 or less instructions, which
> + * is every cycle.
> + *
> + * Thereby we gain a PEBS capable cycle counter.
> + */
> + u64 alt_config = X86_CONFIG(.event=0xc0, .umask=0x00, .inv=1, .cmask=16);
> +
> + alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
> + event->hw.config = alt_config;
> + }
> +}
> +
> +
> static unsigned long intel_pmu_free_running_flags(struct perf_event *event)
> {
> unsigned long flags = x86_pmu.free_running_flags;
> @@ -3332,6 +3361,7 @@ __init int intel_pmu_init(void)
>
> x86_pmu.event_constraints = intel_gen_event_constraints;
> x86_pmu.pebs_constraints = intel_atom_pebs_event_constraints;
> + x86_pmu.pebs_aliases = intel_pebs_aliases_atom;

intel_pebs_aliases_atom looks the same as intel_pebs_aliases_core2.
Why not reuse the existing intel_pebs_aliases_core2?

Thanks,
Kan

2015-12-02 22:28:30

by Stephane Eranian

Subject: Re: [PATCH v1 2/2] perf/x86: add cycles:pp alias for Intel Atom

On Wed, Dec 2, 2015 at 11:37 AM, Liang, Kan <[email protected]> wrote:
>
> [...]
>
>> + x86_pmu.pebs_aliases = intel_pebs_aliases_atom;
>
> intel_pebs_aliases_atom looks the same as intel_pebs_aliases_core2.
> Why not reuse the existing intel_pebs_aliases_core2?
>
Sure, will do this in V2.