2021-08-17 22:13:09

by Kim Phillips

Subject: [PATCH 0/8] perf/amd: Fixes, uncore as a module, new IBS header

Hello Linux kernel events users and maintainers,

These are some miscellaneous updates for kernel-side perf for AMD:

- Miscellaneous IBS, power, uncore fixes
- Allow the uncore driver to be built as a module
- Add a new IBS header for use in the driver and the perf tool.

A patch series for the tool is being submitted separately
from this series.

Thanks,

Kim

Kim Phillips (8):
arch/x86/events/Kconfig | 10 +++
arch/x86/events/amd/Makefile | 6 +-
arch/x86/events/amd/ibs.c | 32 ++++----
arch/x86/events/amd/power.c | 1 +
arch/x86/events/amd/uncore.c | 40 ++++++++--
arch/x86/include/asm/amd-ibs.h | 132 +++++++++++++++++++++++++++++++
arch/x86/include/asm/processor.h | 2 +
arch/x86/kernel/cpu/amd.c | 2 +-
arch/x86/kernel/cpu/common.c | 6 ++
9 files changed, 205 insertions(+), 26 deletions(-)
create mode 100644 arch/x86/include/asm/amd-ibs.h

Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Murray <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

--
2.31.1


2021-08-17 22:13:58

by Kim Phillips

Subject: [PATCH 2/8] perf/x86/amd/ibs: Add workaround for erratum #1197

Erratum #1197 "IBS (Instruction Based Sampling) Register State May be
Incorrect After Restore From CC6" is published in a document available
at the link tag below:

"Revision Guide for AMD Family 19h Models 00h-0Fh Processors"
56683 Rev. 1.04 July 2021

Implement the erratum's suggested workaround and ignore IBS samples
if MSRC001_1031 == 0.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Murray <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/x86/events/amd/ibs.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 8c25fbd5142e..222c890527a2 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -90,6 +90,7 @@ struct perf_ibs {
unsigned long offset_mask[1];
int offset_max;
unsigned int fetch_count_reset_broken : 1;
+ unsigned int fetch_ignore_if_zero_rip : 1;
struct cpu_perf_ibs __percpu *pcpu;

struct attribute **format_attrs;
@@ -673,6 +674,10 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
if (check_rip && (ibs_data.regs[2] & IBS_RIP_INVALID)) {
regs.flags &= ~PERF_EFLAGS_EXACT;
} else {
+ /* Workaround for erratum #1197 */
+ if (perf_ibs->fetch_ignore_if_zero_rip && !(ibs_data.regs[1]))
+ goto out;
+
set_linear_ip(&regs, ibs_data.regs[1]);
regs.flags |= PERF_EFLAGS_EXACT;
}
@@ -770,6 +775,9 @@ static __init void perf_event_ibs_init(void)
if (boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
perf_ibs_fetch.fetch_count_reset_broken = 1;

+ if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
+ perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
+
perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");

if (ibs_caps & IBS_CAPS_OPCNT) {
--
2.31.1
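
The two checks this patch adds can be read in isolation as a small user-space sketch. It is illustrative only, not the driver code: `struct cpu_id`, `needs_zero_rip_workaround()` and `should_drop_fetch_sample()` are hypothetical names, and the sketch assumes the value compared against zero is the IBS fetch linear address (MSRC001_1031, `ibs_data.regs[1]` in the handler).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the boot_cpu_data fields the patch reads. */
struct cpu_id {
	unsigned int family;
	unsigned int model;
};

/* Mirrors the init-time check: Family 19h, Models 00h-0Fh only. */
static bool needs_zero_rip_workaround(const struct cpu_id *id)
{
	return id->family == 0x19 && id->model < 0x10;
}

/*
 * Mirrors the handler check: when the workaround is enabled, a fetch
 * sample whose linear address reads back as 0 is ignored.
 */
static bool should_drop_fetch_sample(bool workaround_enabled,
				     uint64_t fetch_lin_addr)
{
	return workaround_enabled && fetch_lin_addr == 0;
}
```

Keeping the family/model test separate from the per-sample test matches the patch's split between perf_event_ibs_init() and perf_ibs_handle_irq().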

2021-08-17 22:14:16

by Kim Phillips

Subject: [PATCH 3/8] perf/x86/amd/power: Assign pmu.module

Assign pmu.module so the driver can't be unloaded whilst in use.

Signed-off-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Murray <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/x86/events/amd/power.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 16a2369c586e..37d5b380516e 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -213,6 +213,7 @@ static struct pmu pmu_class = {
.stop = pmu_event_stop,
.read = pmu_event_read,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+ .module = THIS_MODULE,
};

static int power_cpu_exit(unsigned int cpu)
--
2.31.1

2021-08-17 22:14:48

by Kim Phillips

Subject: [PATCH 5/8] perf/amd/uncore: Use linux/ include paths instead of asm/

Found by checkpatch.

Signed-off-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Murray <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/x86/events/amd/uncore.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 05bdb4cba300..7fb50ad171e9 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -12,11 +12,11 @@
#include <linux/init.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
+#include <linux/cpufeature.h>
+#include <linux/smp.h>

-#include <asm/cpufeature.h>
#include <asm/perf_event.h>
#include <asm/msr.h>
-#include <asm/smp.h>

#define NUM_COUNTERS_NB 4
#define NUM_COUNTERS_L2 4
--
2.31.1

2021-08-17 22:17:09

by Kim Phillips

Subject: [PATCH 4/8] perf/amd/uncore: Use free_percpu's built-in check for null

free_percpu() already checks for a NULL pointer, so drop the
redundant checks before calling it.

Signed-off-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Murray <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/x86/events/amd/uncore.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 582c0ffb5e98..05bdb4cba300 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -659,11 +659,9 @@ static int __init amd_uncore_init(void)
fail_llc:
if (boot_cpu_has(X86_FEATURE_PERFCTR_NB))
perf_pmu_unregister(&amd_nb_pmu);
- if (amd_uncore_llc)
- free_percpu(amd_uncore_llc);
+ free_percpu(amd_uncore_llc);
fail_nb:
- if (amd_uncore_nb)
- free_percpu(amd_uncore_nb);
+ free_percpu(amd_uncore_nb);

return ret;
}
--
2.31.1

2021-08-17 22:17:29

by Kim Phillips

Subject: [PATCH 6/8] x86/cpu: Add helper function get_llc_id

Factor out a helper function rather than export cpu_llc_id directly.
The AMD uncore driver reads the LLC ID, so an exported accessor is
needed in order to build the driver as a module.

Signed-off-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Murray <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/x86/events/amd/uncore.c | 2 +-
arch/x86/include/asm/processor.h | 2 ++
arch/x86/kernel/cpu/amd.c | 2 +-
arch/x86/kernel/cpu/common.c | 6 ++++++
4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 7fb50ad171e9..a01f9f1016d9 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -452,7 +452,7 @@ static int amd_uncore_cpu_starting(unsigned int cpu)

if (amd_uncore_llc) {
uncore = *per_cpu_ptr(amd_uncore_llc, cpu);
- uncore->id = per_cpu(cpu_llc_id, cpu);
+ uncore->id = get_llc_id(cpu);

uncore = amd_uncore_find_online_sibling(uncore, amd_uncore_llc);
*per_cpu_ptr(amd_uncore_llc, cpu) = uncore;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1e0d13c9fda6..9ad2acaaae9b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -797,6 +797,8 @@ extern int set_tsc_mode(unsigned int val);

DECLARE_PER_CPU(u64, msr_misc_features_shadow);

+extern u16 get_llc_id(unsigned int cpu);
+
#ifdef CONFIG_CPU_SUP_AMD
extern u32 amd_get_nodes_per_socket(void);
extern u32 amd_get_highest_perf(void);
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b7c003013d41..2131af9f2fa2 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -438,7 +438,7 @@ static void srat_detect_node(struct cpuinfo_x86 *c)

node = numa_cpu_node(cpu);
if (node == NUMA_NO_NODE)
- node = per_cpu(cpu_llc_id, cpu);
+ node = get_llc_id(cpu);

/*
* On multi-fabric platform (e.g. Numascale NumaChip) a
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 64b805bd6a54..0f8885949e8c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -79,6 +79,12 @@ EXPORT_SYMBOL(smp_num_siblings);
/* Last level cache ID of each logical CPU */
DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID;

+u16 get_llc_id(unsigned int cpu)
+{
+ return per_cpu(cpu_llc_id, cpu);
+}
+EXPORT_SYMBOL_GPL(get_llc_id);
+
/* correctly size the local cpu masks */
void __init setup_cpu_local_masks(void)
{
--
2.31.1

2021-08-17 22:17:36

by Kim Phillips

Subject: [PATCH 7/8] perf/amd/uncore: Allow the driver to be built as a module

Add support to build the AMD uncore driver as a module.
This is in order to facilitate development.

Signed-off-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Murray <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/x86/events/Kconfig | 10 ++++++++++
arch/x86/events/amd/Makefile | 6 +++---
arch/x86/events/amd/uncore.c | 28 +++++++++++++++++++++++++++-
3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/Kconfig b/arch/x86/events/Kconfig
index 39d9ded9e25a..d6cdfe631674 100644
--- a/arch/x86/events/Kconfig
+++ b/arch/x86/events/Kconfig
@@ -34,4 +34,14 @@ config PERF_EVENTS_AMD_POWER
(CPUID Fn8000_0007_EDX[12]) interface to calculate the
average power consumption on Family 15h processors.

+config PERF_EVENTS_AMD_UNCORE
+ tristate "AMD Uncore performance events"
+ depends on PERF_EVENTS && CPU_SUP_AMD
+ default y
+ help
+ Include support for AMD uncore performance events for use with
+ e.g., perf stat -e amd_l3/.../,amd_df/.../.
+
+ To compile this driver as a module, choose M here: the
+ module will be called 'amd-uncore'.
endmenu
diff --git a/arch/x86/events/amd/Makefile b/arch/x86/events/amd/Makefile
index fe8795a67385..ec45a12deb8b 100644
--- a/arch/x86/events/amd/Makefile
+++ b/arch/x86/events/amd/Makefile
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_CPU_SUP_AMD) += core.o uncore.o
+obj-$(CONFIG_CPU_SUP_AMD) += core.o ibs.o
obj-$(CONFIG_PERF_EVENTS_AMD_POWER) += power.o
-obj-$(CONFIG_X86_LOCAL_APIC) += ibs.o
+obj-$(CONFIG_PERF_EVENTS_AMD_UNCORE) += amd-uncore.o
+amd-uncore-objs := uncore.o
ifdef CONFIG_AMD_IOMMU
obj-$(CONFIG_CPU_SUP_AMD) += iommu.o
endif
-
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index a01f9f1016d9..0d04414b97d2 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -347,6 +347,7 @@ static struct pmu amd_nb_pmu = {
.stop = amd_uncore_stop,
.read = amd_uncore_read,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .module = THIS_MODULE,
};

static struct pmu amd_llc_pmu = {
@@ -360,6 +361,7 @@ static struct pmu amd_llc_pmu = {
.stop = amd_uncore_stop,
.read = amd_uncore_read,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .module = THIS_MODULE,
};

static struct amd_uncore *amd_uncore_alloc(unsigned int cpu)
@@ -665,4 +667,28 @@ static int __init amd_uncore_init(void)

return ret;
}
-device_initcall(amd_uncore_init);
+
+static void __exit amd_uncore_exit(void)
+{
+ cpuhp_remove_state(CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE);
+ cpuhp_remove_state(CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING);
+ cpuhp_remove_state(CPUHP_PERF_X86_AMD_UNCORE_PREP);
+
+ if (boot_cpu_has(X86_FEATURE_PERFCTR_LLC)) {
+ perf_pmu_unregister(&amd_llc_pmu);
+ free_percpu(amd_uncore_llc);
+ amd_uncore_llc = NULL;
+ }
+
+ if (boot_cpu_has(X86_FEATURE_PERFCTR_NB)) {
+ perf_pmu_unregister(&amd_nb_pmu);
+ free_percpu(amd_uncore_nb);
+ amd_uncore_nb = NULL;
+ }
+}
+
+module_init(amd_uncore_init);
+module_exit(amd_uncore_exit);
+
+MODULE_DESCRIPTION("AMD Uncore Driver");
+MODULE_LICENSE("GPL v2");
--
2.31.1

2021-08-17 22:18:07

by Kim Phillips

Subject: [PATCH 8/8] perf/x86/amd/ibs: Add bitfield definitions in new header

Add arch/x86/include/asm/amd-ibs.h with bitfield definitions for
IBS MSRs, and demonstrate usage within the driver.

Also move struct perf_ibs_data where it can be shared with
the perf tool that will soon be using it.

No functional changes.

Signed-off-by: Kim Phillips <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrew Murray <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/x86/events/amd/ibs.c | 23 +++---
arch/x86/include/asm/amd-ibs.h | 132 +++++++++++++++++++++++++++++++++
2 files changed, 141 insertions(+), 14 deletions(-)
create mode 100644 arch/x86/include/asm/amd-ibs.h

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 222c890527a2..4fc85cdaa27a 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -26,6 +26,7 @@ static u32 ibs_caps;
#include <linux/hardirq.h>

#include <asm/nmi.h>
+#include <asm/amd-ibs.h>

#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
@@ -100,15 +101,6 @@ struct perf_ibs {
u64 (*get_count)(u64 config);
};

-struct perf_ibs_data {
- u32 size;
- union {
- u32 data[0]; /* data buffer starts here */
- u32 caps;
- };
- u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX];
-};
-
static int
perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_period)
{
@@ -329,11 +321,14 @@ static int perf_ibs_set_period(struct perf_ibs *perf_ibs,

static u64 get_ibs_fetch_count(u64 config)
{
- return (config & IBS_FETCH_CNT) >> 12;
+ union ibs_fetch_ctl fetch_ctl = (union ibs_fetch_ctl)config;
+
+ return fetch_ctl.fetch_cnt << 4;
}

static u64 get_ibs_op_count(u64 config)
{
+ union ibs_op_ctl op_ctl = (union ibs_op_ctl)config;
u64 count = 0;

/*
@@ -341,12 +336,12 @@ static u64 get_ibs_op_count(u64 config)
* and the lower 7 bits of CurCnt are randomized.
* Otherwise CurCnt has the full 27-bit current counter value.
*/
- if (config & IBS_OP_VAL) {
- count = (config & IBS_OP_MAX_CNT) << 4;
+ if (op_ctl.op_val) {
+ count = op_ctl.opmaxcnt << 4;
if (ibs_caps & IBS_CAPS_OPCNTEXT)
- count += config & IBS_OP_MAX_CNT_EXT_MASK;
+ count += op_ctl.opmaxcnt_ext << 20;
} else if (ibs_caps & IBS_CAPS_RDWROPCNT) {
- count = (config & IBS_OP_CUR_CNT) >> 32;
+ count = op_ctl.opcurcnt;
}

return count;
diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
new file mode 100644
index 000000000000..46e1df45efc0
--- /dev/null
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * From PPR Vol 1 for AMD Family 19h Model 01h B1
+ * 55898 Rev 0.35 - Feb 5, 2021
+ */
+
+#include <asm/msr-index.h>
+
+/*
+ * IBS Hardware MSRs
+ */
+
+/* MSR 0xc0011030: IBS Fetch Control */
+union ibs_fetch_ctl {
+ __u64 val;
+ struct {
+ __u64 fetch_maxcnt:16,/* 0-15: instruction fetch max. count */
+ fetch_cnt:16, /* 16-31: instruction fetch count */
+ fetch_lat:16, /* 32-47: instruction fetch latency */
+ fetch_en:1, /* 48: instruction fetch enable */
+ fetch_val:1, /* 49: instruction fetch valid */
+ fetch_comp:1, /* 50: instruction fetch complete */
+ ic_miss:1, /* 51: i-cache miss */
+ phy_addr_valid:1,/* 52: physical address valid */
+ l1tlb_pgsz:2, /* 53-54: i-cache L1TLB page size
+ * (needs IbsPhyAddrValid) */
+ l1tlb_miss:1, /* 55: i-cache fetch missed in L1TLB */
+ l2tlb_miss:1, /* 56: i-cache fetch missed in L2TLB */
+ rand_en:1, /* 57: random tagging enable */
+ fetch_l2_miss:1,/* 58: L2 miss for sampled fetch
+ * (needs IbsFetchComp) */
+ reserved:5; /* 59-63: reserved */
+ };
+};
+
+/* MSR 0xc0011033: IBS Execution Control */
+union ibs_op_ctl {
+ __u64 val;
+ struct {
+ __u64 opmaxcnt:16, /* 0-15: periodic op max. count */
+ reserved0:1, /* 16: reserved */
+ op_en:1, /* 17: op sampling enable */
+ op_val:1, /* 18: op sample valid */
+ cnt_ctl:1, /* 19: periodic op counter control */
+ opmaxcnt_ext:7, /* 20-26: upper 7 bits of periodic op maximum count */
+ reserved1:5, /* 27-31: reserved */
+ opcurcnt:27, /* 32-58: periodic op counter current count */
+ reserved2:5; /* 59-63: reserved */
+ };
+};
+
+/* MSR 0xc0011035: IBS Op Data 1 */
+union ibs_op_data {
+ __u64 val;
+ struct {
+ __u64 comp_to_ret_ctr:16, /* 0-15: op completion to retire count */
+ tag_to_ret_ctr:16, /* 16-31: op tag to retire count */
+ reserved1:2, /* 32-33: reserved */
+ op_return:1, /* 34: return op */
+ op_brn_taken:1, /* 35: taken branch op */
+ op_brn_misp:1, /* 36: mispredicted branch op */
+ op_brn_ret:1, /* 37: branch op retired */
+ op_rip_invalid:1, /* 38: RIP is invalid */
+ op_brn_fuse:1, /* 39: fused branch op */
+ op_microcode:1, /* 40: microcode op */
+ reserved2:23; /* 41-63: reserved */
+ };
+};
+
+/* MSR 0xc0011036: IBS Op Data 2 */
+union ibs_op_data2 {
+ __u64 val;
+ struct {
+ __u64 data_src:3, /* 0-2: data source */
+ reserved0:1, /* 3: reserved */
+ rmt_node:1, /* 4: destination node */
+ cache_hit_st:1, /* 5: cache hit state */
+ reserved1:58; /* 6-63: reserved */
+ };
+};
+
+/* MSR 0xc0011037: IBS Op Data 3 */
+union ibs_op_data3 {
+ __u64 val;
+ struct {
+ __u64 ld_op:1, /* 0: load op */
+ st_op:1, /* 1: store op */
+ dc_l1tlb_miss:1, /* 2: data cache L1TLB miss */
+ dc_l2tlb_miss:1, /* 3: data cache L2TLB miss */
+ dc_l1tlb_hit_2m:1, /* 4: data cache L1TLB hit in 2M page */
+ dc_l1tlb_hit_1g:1, /* 5: data cache L1TLB hit in 1G page */
+ dc_l2tlb_hit_2m:1, /* 6: data cache L2TLB hit in 2M page */
+ dc_miss:1, /* 7: data cache miss */
+ dc_mis_acc:1, /* 8: misaligned access */
+ reserved:4, /* 9-12: reserved */
+ dc_wc_mem_acc:1, /* 13: write combining memory access */
+ dc_uc_mem_acc:1, /* 14: uncacheable memory access */
+ dc_locked_op:1, /* 15: locked operation */
+ dc_miss_no_mab_alloc:1, /* 16: DC miss with no MAB allocated */
+ dc_lin_addr_valid:1, /* 17: data cache linear address valid */
+ dc_phy_addr_valid:1, /* 18: data cache physical address valid */
+ dc_l2_tlb_hit_1g:1, /* 19: data cache L2TLB hit in 1G page */
+ l2_miss:1, /* 20: L2 cache miss */
+ sw_pf:1, /* 21: software prefetch */
+ op_mem_width:4, /* 22-25: load/store size in bytes */
+ op_dc_miss_open_mem_reqs:6, /* 26-31: outstanding mem reqs on DC fill */
+ dc_miss_lat:16, /* 32-47: data cache miss latency */
+ tlb_refill_lat:16; /* 48-63: L1 TLB refill latency */
+ };
+};
+
+/* MSR 0xc001103c: IBS Fetch Control Extended */
+union ic_ibs_extd_ctl {
+ __u64 val;
+ struct {
+ __u64 itlb_refill_lat:16, /* 0-15: ITLB Refill latency for sampled fetch */
+ reserved:48; /* 16-63: reserved */
+ };
+};
+
+/*
+ * IBS driver related
+ */
+
+struct perf_ibs_data {
+ u32 size;
+ union {
+ u32 data[0]; /* data buffer starts here */
+ u32 caps;
+ };
+ u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX];
+};
--
2.31.1
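
The "no functional changes" claim for get_ibs_fetch_count() can be checked with a small user-space sketch that compares the old mask-and-shift form against the new bitfield form. This is an illustration under assumptions, not the kernel header: the union is abbreviated (the flag bits are folded into one hypothetical `flags_and_reserved` field), the mask value is restated locally, and the bitfield layout relies on the compiler allocating bits from the least significant end, as GCC and Clang do on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Mask used by the pre-patch code, restated here for the sketch. */
#define IBS_FETCH_CNT 0xFFFF0000ULL

/* Abbreviated copy of the union; only the count fields are spelled out. */
union ibs_fetch_ctl {
	uint64_t val;
	struct {
		uint64_t fetch_maxcnt:16,	/* 0-15 */
			 fetch_cnt:16,		/* 16-31 */
			 fetch_lat:16,		/* 32-47 */
			 flags_and_reserved:16;	/* 48-63 (hypothetical) */
	};
};

/* The bitfields must cover exactly the 64-bit MSR value. */
static_assert(sizeof(union ibs_fetch_ctl) == sizeof(uint64_t),
	      "bitfields must overlay a single 64-bit value");

/* Pre-patch form: isolate bits 16-31, leaving the count scaled by 16. */
static uint64_t fetch_count_mask(uint64_t config)
{
	return (config & IBS_FETCH_CNT) >> 12;
}

/* Post-patch form: read the field, then scale by 16. */
static uint64_t fetch_count_bitfield(uint64_t config)
{
	union ibs_fetch_ctl ctl = { .val = config };

	return (uint64_t)ctl.fetch_cnt << 4;
}
```

Both functions return the count in units of fetches, since the hardware field holds the count divided by 16.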

2021-08-19 22:31:24

by Namhyung Kim

Subject: Re: [PATCH 2/8] perf/x86/amd/ibs: Add workaround for erratum #1197

Hello,

On Tue, Aug 17, 2021 at 3:11 PM Kim Phillips <[email protected]> wrote:
>
> Erratum #1197 "IBS (Instruction Based Sampling) Register State May be
> Incorrect After Restore From CC6" is published in a document available
> at the link tag below:
>
> "Revision Guide for AMD Family 19h Models 00h-0Fh Processors"
> 56683 Rev. 1.04 July 2021
>
> Implement the erratum's suggested workaround and ignore IBS samples
> if MSRC001_1031 == 0.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Kim Phillips <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Andrew Murray <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ian Rogers <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Joao Martins <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Michael Petlan <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Robert Richter <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> arch/x86/events/amd/ibs.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 8c25fbd5142e..222c890527a2 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -90,6 +90,7 @@ struct perf_ibs {
> unsigned long offset_mask[1];
> int offset_max;
> unsigned int fetch_count_reset_broken : 1;
> + unsigned int fetch_ignore_if_zero_rip : 1;
> struct cpu_perf_ibs __percpu *pcpu;
>
> struct attribute **format_attrs;
> @@ -673,6 +674,10 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> if (check_rip && (ibs_data.regs[2] & IBS_RIP_INVALID)) {
> regs.flags &= ~PERF_EFLAGS_EXACT;
> } else {
> + /* Workaround for erratum #1197 */
> + if (perf_ibs->fetch_ignore_if_zero_rip && !(ibs_data.regs[1]))
> + goto out;

Can we just use the iregs.ip instead of dropping the sample?
Users might care about the accurate number of samples.

Thanks,
Namhyung


> +
> set_linear_ip(&regs, ibs_data.regs[1]);
> regs.flags |= PERF_EFLAGS_EXACT;
> }
> @@ -770,6 +775,9 @@ static __init void perf_event_ibs_init(void)
> if (boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
> perf_ibs_fetch.fetch_count_reset_broken = 1;
>
> + if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
> + perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
> +
> perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
>
> if (ibs_caps & IBS_CAPS_OPCNT) {
> --
> 2.31.1
>
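
One possible reading of the suggestion above, sketched in user space: instead of discarding a sample whose IBS address reads zero, keep it and attribute it to the interrupted instruction pointer, marked as not exact. The names `struct resolved_ip` and `resolve_sample_ip()` are hypothetical, and whether this is acceptable under the erratum is a question for the thread, not something this sketch settles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the IP to report and whether it is exact. */
struct resolved_ip {
	uint64_t ip;
	bool exact;
};

/*
 * If the IBS-reported address is zero, fall back to the IP of the
 * interrupted context and clear the "exact" flag, so the sample is
 * still counted. Otherwise report the IBS address as exact.
 */
static struct resolved_ip resolve_sample_ip(uint64_t ibs_rip,
					    uint64_t interrupted_ip)
{
	if (ibs_rip == 0)
		return (struct resolved_ip){ .ip = interrupted_ip, .exact = false };

	return (struct resolved_ip){ .ip = ibs_rip, .exact = true };
}
```

The trade-off is that the sample count stays accurate while the fallback address carries the usual interrupt skid.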

2021-08-19 23:00:56

by Namhyung Kim

Subject: Re: [PATCH 8/8] perf/x86/amd/ibs: Add bitfield definitions in new header

On Tue, Aug 17, 2021 at 3:12 PM Kim Phillips <[email protected]> wrote:
>
> Add arch/x86/include/asm/amd-ibs.h with bitfield definitions for
> IBS MSRs, and demonstrate usage within the driver.
>
> Also move struct perf_ibs_data where it can be shared with
> the perf tool that will soon be using it.
>
> No functional changes.
>
> Signed-off-by: Kim Phillips <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Andrew Murray <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ian Rogers <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Joao Martins <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Michael Petlan <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Robert Richter <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
[SNIP]
> diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
> new file mode 100644
> index 000000000000..46e1df45efc0
> --- /dev/null
> +++ b/arch/x86/include/asm/amd-ibs.h
> @@ -0,0 +1,132 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * From PPR Vol 1 for AMD Family 19h Model 01h B1
> + * 55898 Rev 0.35 - Feb 5, 2021
> + */
> +
> +#include <asm/msr-index.h>
> +
> +/*
> + * IBS Hardware MSRs
> + */
> +
> +/* MSR 0xc0011030: IBS Fetch Control */
> +union ibs_fetch_ctl {
> + __u64 val;
> + struct {
> + __u64 fetch_maxcnt:16,/* 0-15: instruction fetch max. count */
> + fetch_cnt:16, /* 16-31: instruction fetch count */
> + fetch_lat:16, /* 32-47: instruction fetch latency */
> + fetch_en:1, /* 48: instruction fetch enable */
> + fetch_val:1, /* 49: instruction fetch valid */
> + fetch_comp:1, /* 50: instruction fetch complete */
> + ic_miss:1, /* 51: i-cache miss */
> + phy_addr_valid:1,/* 52: physical address valid */
> + l1tlb_pgsz:2, /* 53-54: i-cache L1TLB page size
> + * (needs IbsPhyAddrValid) */

What about adding an enum for the page size?

> + l1tlb_miss:1, /* 55: i-cache fetch missed in L1TLB */
> + l2tlb_miss:1, /* 56: i-cache fetch missed in L2TLB */
> + rand_en:1, /* 57: random tagging enable */
> + fetch_l2_miss:1,/* 58: L2 miss for sampled fetch
> + * (needs IbsFetchComp) */
> + reserved:5; /* 59-63: reserved */
> + };
> +};
> +
> +/* MSR 0xc0011033: IBS Execution Control */
> +union ibs_op_ctl {
> + __u64 val;
> + struct {
> + __u64 opmaxcnt:16, /* 0-15: periodic op max. count */
> + reserved0:1, /* 16: reserved */
> + op_en:1, /* 17: op sampling enable */
> + op_val:1, /* 18: op sample valid */
> + cnt_ctl:1, /* 19: periodic op counter control */
> + opmaxcnt_ext:7, /* 20-26: upper 7 bits of periodic op maximum count */
> + reserved1:5, /* 27-31: reserved */
> + opcurcnt:27, /* 32-58: periodic op counter current count */
> + reserved2:5; /* 59-63: reserved */
> + };
> +};
> +
> +/* MSR 0xc0011035: IBS Op Data 1 */
> +union ibs_op_data {
> + __u64 val;
> + struct {
> + __u64 comp_to_ret_ctr:16, /* 0-15: op completion to retire count */
> + tag_to_ret_ctr:16, /* 16-31: op tag to retire count */
> + reserved1:2, /* 32-33: reserved */
> + op_return:1, /* 34: return op */
> + op_brn_taken:1, /* 35: taken branch op */
> + op_brn_misp:1, /* 36: mispredicted branch op */
> + op_brn_ret:1, /* 37: branch op retired */
> + op_rip_invalid:1, /* 38: RIP is invalid */
> + op_brn_fuse:1, /* 39: fused branch op */
> + op_microcode:1, /* 40: microcode op */
> + reserved2:23; /* 41-63: reserved */
> + };
> +};
> +
> +/* MSR 0xc0011036: IBS Op Data 2 */
> +union ibs_op_data2 {
> + __u64 val;
> + struct {
> + __u64 data_src:3, /* 0-2: data source */

and for data source too.

Thanks,
Namhyung


> + reserved0:1, /* 3: reserved */
> + rmt_node:1, /* 4: destination node */
> + cache_hit_st:1, /* 5: cache hit state */
> + reserved1:58; /* 6-63: reserved */
> + };
> +};
> +
> +/* MSR 0xc0011037: IBS Op Data 3 */
> +union ibs_op_data3 {
> + __u64 val;
> + struct {
> + __u64 ld_op:1, /* 0: load op */
> + st_op:1, /* 1: store op */
> + dc_l1tlb_miss:1, /* 2: data cache L1TLB miss */
> + dc_l2tlb_miss:1, /* 3: data cache L2TLB miss */
> + dc_l1tlb_hit_2m:1, /* 4: data cache L1TLB hit in 2M page */
> + dc_l1tlb_hit_1g:1, /* 5: data cache L1TLB hit in 1G page */
> + dc_l2tlb_hit_2m:1, /* 6: data cache L2TLB hit in 2M page */
> + dc_miss:1, /* 7: data cache miss */
> + dc_mis_acc:1, /* 8: misaligned access */
> + reserved:4, /* 9-12: reserved */
> + dc_wc_mem_acc:1, /* 13: write combining memory access */
> + dc_uc_mem_acc:1, /* 14: uncacheable memory access */
> + dc_locked_op:1, /* 15: locked operation */
> + dc_miss_no_mab_alloc:1, /* 16: DC miss with no MAB allocated */
> + dc_lin_addr_valid:1, /* 17: data cache linear address valid */
> + dc_phy_addr_valid:1, /* 18: data cache physical address valid */
> + dc_l2_tlb_hit_1g:1, /* 19: data cache L2TLB hit in 1G page */
> + l2_miss:1, /* 20: L2 cache miss */
> + sw_pf:1, /* 21: software prefetch */
> + op_mem_width:4, /* 22-25: load/store size in bytes */
> + op_dc_miss_open_mem_reqs:6, /* 26-31: outstanding mem reqs on DC fill */
> + dc_miss_lat:16, /* 32-47: data cache miss latency */
> + tlb_refill_lat:16; /* 48-63: L1 TLB refill latency */
> + };
> +};
> +
> +/* MSR 0xc001103c: IBS Fetch Control Extended */
> +union ic_ibs_extd_ctl {
> + __u64 val;
> + struct {
> + __u64 itlb_refill_lat:16, /* 0-15: ITLB Refill latency for sampled fetch */
> + reserved:48; /* 16-63: reserved */
> + };
> +};
> +
> +/*
> + * IBS driver related
> + */
> +
> +struct perf_ibs_data {
> + u32 size;
> + union {
> + u32 data[0]; /* data buffer starts here */
> + u32 caps;
> + };
> + u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX];
> +};
> --
> 2.31.1
>

Subject: [tip: perf/core] perf/amd/uncore: Use linux/ include paths instead of asm/

The following commit has been merged into the perf/core branch of tip:

Commit-ID: e1706b30e939d3e62c2a2b9415ed9c6313bbb8e9
Gitweb: https://git.kernel.org/tip/e1706b30e939d3e62c2a2b9415ed9c6313bbb8e9
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:45 -05:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 20 Aug 2021 12:33:13 +02:00

perf/amd/uncore: Use linux/ include paths instead of asm/

Found by checkpatch.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/uncore.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 05bdb4c..7fb50ad 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -12,11 +12,11 @@
#include <linux/init.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
+#include <linux/cpufeature.h>
+#include <linux/smp.h>

-#include <asm/cpufeature.h>
#include <asm/perf_event.h>
#include <asm/msr.h>
-#include <asm/smp.h>

#define NUM_COUNTERS_NB 4
#define NUM_COUNTERS_L2 4

Subject: [tip: perf/core] x86/cpu: Add helper function get_llc_id

The following commit has been merged into the perf/core branch of tip:

Commit-ID: f644500512b6b3838091ddc2cfe61a4110e7778e
Gitweb: https://git.kernel.org/tip/f644500512b6b3838091ddc2cfe61a4110e7778e
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:46 -05:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 20 Aug 2021 12:33:13 +02:00

x86/cpu: Add helper function get_llc_id

Factor out a helper function rather than export cpu_llc_id, which is
needed in order to be able to build the AMD uncore driver as a module.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/uncore.c | 2 +-
arch/x86/include/asm/processor.h | 2 ++
arch/x86/kernel/cpu/amd.c | 2 +-
arch/x86/kernel/cpu/common.c | 6 ++++++
4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 7fb50ad..a01f9f1 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -452,7 +452,7 @@ static int amd_uncore_cpu_starting(unsigned int cpu)

if (amd_uncore_llc) {
uncore = *per_cpu_ptr(amd_uncore_llc, cpu);
- uncore->id = per_cpu(cpu_llc_id, cpu);
+ uncore->id = get_llc_id(cpu);

uncore = amd_uncore_find_online_sibling(uncore, amd_uncore_llc);
*per_cpu_ptr(amd_uncore_llc, cpu) = uncore;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 154321d..fa2f7ee 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -785,6 +785,8 @@ extern int set_tsc_mode(unsigned int val);

DECLARE_PER_CPU(u64, msr_misc_features_shadow);

+extern u16 get_llc_id(unsigned int cpu);
+
#ifdef CONFIG_CPU_SUP_AMD
extern u32 amd_get_nodes_per_socket(void);
#else
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 2d11384..1d83024 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -438,7 +438,7 @@ static void srat_detect_node(struct cpuinfo_x86 *c)

node = numa_cpu_node(cpu);
if (node == NUMA_NO_NODE)
- node = per_cpu(cpu_llc_id, cpu);
+ node = get_llc_id(cpu);

/*
* On multi-fabric platform (e.g. Numascale NumaChip) a
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a1b756c..684d4e1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -78,6 +78,12 @@ EXPORT_SYMBOL(smp_num_siblings);
/* Last level cache ID of each logical CPU */
DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID;

+u16 get_llc_id(unsigned int cpu)
+{
+ return per_cpu(cpu_llc_id, cpu);
+}
+EXPORT_SYMBOL_GPL(get_llc_id);
+
/* correctly size the local cpu masks */
void __init setup_cpu_local_masks(void)
{
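
Note on the helper above: the idea of exposing an accessor instead of the variable itself can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code: the array stands in for the per-CPU variable, and every sketch_* / SKETCH_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8u       /* hypothetical CPU count for this sketch */
#define SKETCH_BAD_ID  0xFFFFu  /* stand-in for BAD_APICID */

/*
 * File-private storage: because it is static, no other translation
 * unit (or module, in the kernel analogy) can reference it directly.
 */
static uint16_t sketch_llc_id[SKETCH_NR_CPUS];

/* Mark every CPU's LLC ID as unknown. */
void sketch_init_llc_ids(void)
{
	for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_llc_id[cpu] = SKETCH_BAD_ID;
}

/* Writer used by the code that owns the data. */
void sketch_set_llc_id(unsigned int cpu, uint16_t id)
{
	assert(cpu < SKETCH_NR_CPUS);
	sketch_llc_id[cpu] = id;
}

/* Read-only accessor: the only thing outside users need to link against. */
uint16_t sketch_get_llc_id(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return sketch_llc_id[cpu];
}
```

Outside users only ever see the getter, so the storage can change later without touching them, and they cannot write to it.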

Subject: [tip: perf/core] perf/amd/uncore: Use free_percpu's built-in check for null

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 7987baceb9fe700a35b86e4a2f55b2d2d524d687
Gitweb: https://git.kernel.org/tip/7987baceb9fe700a35b86e4a2f55b2d2d524d687
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:44 -05:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 20 Aug 2021 12:33:13 +02:00

perf/amd/uncore: Use free_percpu's built-in check for null

free_percpu() has its own check for null.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/uncore.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 582c0ff..05bdb4c 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -659,11 +659,9 @@ fail_prep:
fail_llc:
if (boot_cpu_has(X86_FEATURE_PERFCTR_NB))
perf_pmu_unregister(&amd_nb_pmu);
- if (amd_uncore_llc)
- free_percpu(amd_uncore_llc);
+ free_percpu(amd_uncore_llc);
fail_nb:
- if (amd_uncore_nb)
- free_percpu(amd_uncore_nb);
+ free_percpu(amd_uncore_nb);

return ret;
}
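
Note on the cleanup above: the same simplification applies in user space, where the C standard defines free(NULL) as a no-op. The sketch below is a user-space analogy only (free_percpu() itself is kernel-only); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two uncore allocations. */
struct sketch_state {
	int *nb;
	int *llc;
};

/* Unconditional cleanup: no "if (ptr)" guards are needed around free(). */
void sketch_exit(struct sketch_state *s)
{
	assert(s != NULL);
	free(s->llc);	/* free(NULL) is defined to do nothing */
	free(s->nb);
	s->llc = NULL;
	s->nb = NULL;
}

/* Returns 0 on success, -1 if either allocation is missing. */
int sketch_init(struct sketch_state *s, int simulate_llc_failure)
{
	assert(s != NULL);
	s->nb = calloc(4, sizeof(*s->nb));
	s->llc = simulate_llc_failure ? NULL : calloc(4, sizeof(*s->llc));
	if (!s->nb || !s->llc) {
		sketch_exit(s);	/* safe even though one pointer may be NULL */
		return -1;
	}
	return 0;
}
```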

Subject: [tip: perf/core] perf/x86/amd/ibs: Add bitfield definitions in new header

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 492f4a3cf54cf39b5c2d682a1e6dd4ea9af2f4f2
Gitweb: https://git.kernel.org/tip/492f4a3cf54cf39b5c2d682a1e6dd4ea9af2f4f2
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:48 -05:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 20 Aug 2021 12:33:14 +02:00

perf/x86/amd/ibs: Add bitfield definitions in new header

Add arch/x86/include/asm/amd-ibs.h with bitfield definitions for
IBS MSRs, and demonstrate usage within the driver.

Also move struct perf_ibs_data where it can be shared with
the perf tool that will soon be using it.

No functional changes.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/ibs.c | 23 ++----
arch/x86/include/asm/amd-ibs.h | 132 ++++++++++++++++++++++++++++++++-
2 files changed, 141 insertions(+), 14 deletions(-)
create mode 100644 arch/x86/include/asm/amd-ibs.h

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 222c890..4fc85cd 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -26,6 +26,7 @@ static u32 ibs_caps;
#include <linux/hardirq.h>

#include <asm/nmi.h>
+#include <asm/amd-ibs.h>

#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
@@ -100,15 +101,6 @@ struct perf_ibs {
u64 (*get_count)(u64 config);
};

-struct perf_ibs_data {
- u32 size;
- union {
- u32 data[0]; /* data buffer starts here */
- u32 caps;
- };
- u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX];
-};
-
static int
perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_period)
{
@@ -329,11 +321,14 @@ static int perf_ibs_set_period(struct perf_ibs *perf_ibs,

static u64 get_ibs_fetch_count(u64 config)
{
- return (config & IBS_FETCH_CNT) >> 12;
+ union ibs_fetch_ctl fetch_ctl = (union ibs_fetch_ctl)config;
+
+ return fetch_ctl.fetch_cnt << 4;
}

static u64 get_ibs_op_count(u64 config)
{
+ union ibs_op_ctl op_ctl = (union ibs_op_ctl)config;
u64 count = 0;

/*
@@ -341,12 +336,12 @@ static u64 get_ibs_op_count(u64 config)
* and the lower 7 bits of CurCnt are randomized.
* Otherwise CurCnt has the full 27-bit current counter value.
*/
- if (config & IBS_OP_VAL) {
- count = (config & IBS_OP_MAX_CNT) << 4;
+ if (op_ctl.op_val) {
+ count = op_ctl.opmaxcnt << 4;
if (ibs_caps & IBS_CAPS_OPCNTEXT)
- count += config & IBS_OP_MAX_CNT_EXT_MASK;
+ count += op_ctl.opmaxcnt_ext << 20;
} else if (ibs_caps & IBS_CAPS_RDWROPCNT) {
- count = (config & IBS_OP_CUR_CNT) >> 32;
+ count = op_ctl.opcurcnt;
}

return count;
diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
new file mode 100644
index 0000000..46e1df4
--- /dev/null
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * From PPR Vol 1 for AMD Family 19h Model 01h B1
+ * 55898 Rev 0.35 - Feb 5, 2021
+ */
+
+#include <asm/msr-index.h>
+
+/*
+ * IBS Hardware MSRs
+ */
+
+/* MSR 0xc0011030: IBS Fetch Control */
+union ibs_fetch_ctl {
+ __u64 val;
+ struct {
+ __u64 fetch_maxcnt:16,/* 0-15: instruction fetch max. count */
+ fetch_cnt:16, /* 16-31: instruction fetch count */
+ fetch_lat:16, /* 32-47: instruction fetch latency */
+ fetch_en:1, /* 48: instruction fetch enable */
+ fetch_val:1, /* 49: instruction fetch valid */
+ fetch_comp:1, /* 50: instruction fetch complete */
+ ic_miss:1, /* 51: i-cache miss */
+ phy_addr_valid:1,/* 52: physical address valid */
+ l1tlb_pgsz:2, /* 53-54: i-cache L1TLB page size
+ * (needs IbsPhyAddrValid) */
+ l1tlb_miss:1, /* 55: i-cache fetch missed in L1TLB */
+ l2tlb_miss:1, /* 56: i-cache fetch missed in L2TLB */
+ rand_en:1, /* 57: random tagging enable */
+ fetch_l2_miss:1,/* 58: L2 miss for sampled fetch
+ * (needs IbsFetchComp) */
+ reserved:5; /* 59-63: reserved */
+ };
+};
+
+/* MSR 0xc0011033: IBS Execution Control */
+union ibs_op_ctl {
+ __u64 val;
+ struct {
+ __u64 opmaxcnt:16, /* 0-15: periodic op max. count */
+ reserved0:1, /* 16: reserved */
+ op_en:1, /* 17: op sampling enable */
+ op_val:1, /* 18: op sample valid */
+ cnt_ctl:1, /* 19: periodic op counter control */
+ opmaxcnt_ext:7, /* 20-26: upper 7 bits of periodic op maximum count */
+ reserved1:5, /* 27-31: reserved */
+ opcurcnt:27, /* 32-58: periodic op counter current count */
+ reserved2:5; /* 59-63: reserved */
+ };
+};
+
+/* MSR 0xc0011035: IBS Op Data 1 */
+union ibs_op_data {
+ __u64 val;
+ struct {
+ __u64 comp_to_ret_ctr:16, /* 0-15: op completion to retire count */
+ tag_to_ret_ctr:16, /* 16-31: op tag to retire count */
+ reserved1:2, /* 32-33: reserved */
+ op_return:1, /* 34: return op */
+ op_brn_taken:1, /* 35: taken branch op */
+ op_brn_misp:1, /* 36: mispredicted branch op */
+ op_brn_ret:1, /* 37: branch op retired */
+ op_rip_invalid:1, /* 38: RIP is invalid */
+ op_brn_fuse:1, /* 39: fused branch op */
+ op_microcode:1, /* 40: microcode op */
+ reserved2:23; /* 41-63: reserved */
+ };
+};
+
+/* MSR 0xc0011036: IBS Op Data 2 */
+union ibs_op_data2 {
+ __u64 val;
+ struct {
+ __u64 data_src:3, /* 0-2: data source */
+ reserved0:1, /* 3: reserved */
+ rmt_node:1, /* 4: destination node */
+ cache_hit_st:1, /* 5: cache hit state */
+ reserved1:58; /* 6-63: reserved */
+ };
+};
+
+/* MSR 0xc0011037: IBS Op Data 3 */
+union ibs_op_data3 {
+ __u64 val;
+ struct {
+ __u64 ld_op:1, /* 0: load op */
+ st_op:1, /* 1: store op */
+ dc_l1tlb_miss:1, /* 2: data cache L1TLB miss */
+ dc_l2tlb_miss:1, /* 3: data cache L2TLB miss */
+ dc_l1tlb_hit_2m:1, /* 4: data cache L1TLB hit in 2M page */
+ dc_l1tlb_hit_1g:1, /* 5: data cache L1TLB hit in 1G page */
+ dc_l2tlb_hit_2m:1, /* 6: data cache L2TLB hit in 2M page */
+ dc_miss:1, /* 7: data cache miss */
+ dc_mis_acc:1, /* 8: misaligned access */
+ reserved:4, /* 9-12: reserved */
+ dc_wc_mem_acc:1, /* 13: write combining memory access */
+ dc_uc_mem_acc:1, /* 14: uncacheable memory access */
+ dc_locked_op:1, /* 15: locked operation */
+ dc_miss_no_mab_alloc:1, /* 16: DC miss with no MAB allocated */
+ dc_lin_addr_valid:1, /* 17: data cache linear address valid */
+ dc_phy_addr_valid:1, /* 18: data cache physical address valid */
+ dc_l2_tlb_hit_1g:1, /* 19: data cache L2TLB hit in 1G page */
+ l2_miss:1, /* 20: L2 cache miss */
+ sw_pf:1, /* 21: software prefetch */
+ op_mem_width:4, /* 22-25: load/store size in bytes */
+ op_dc_miss_open_mem_reqs:6, /* 26-31: outstanding mem reqs on DC fill */
+ dc_miss_lat:16, /* 32-47: data cache miss latency */
+ tlb_refill_lat:16; /* 48-63: L1 TLB refill latency */
+ };
+};
+
+/* MSR 0xc001103c: IBS Fetch Control Extended */
+union ic_ibs_extd_ctl {
+ __u64 val;
+ struct {
+ __u64 itlb_refill_lat:16, /* 0-15: ITLB Refill latency for sampled fetch */
+ reserved:48; /* 16-63: reserved */
+ };
+};
+
+/*
+ * IBS driver related
+ */
+
+struct perf_ibs_data {
+ u32 size;
+ union {
+ u32 data[0]; /* data buffer starts here */
+ u32 caps;
+ };
+ u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX];
+};
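
Note on the bitfield conversion above: a user-space sketch may help show why "(config & IBS_FETCH_CNT) >> 12" and "fetch_ctl.fetch_cnt << 4" give the same count. It assumes a little-endian target where GCC or Clang allocate bitfields starting from the least significant bit (as on x86-64), and it assumes the old mask covers bits 16-31, which is what the 12-bit shift implies. The union is a trimmed copy with hypothetical sketch_* names, and it uses a designated initializer rather than the GNU cast-to-union extension used in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed layout: only the fields needed for the count are named. */
union sketch_ibs_fetch_ctl {
	uint64_t val;
	struct {
		uint64_t fetch_maxcnt:16,	/* 0-15: instruction fetch max. count */
			 fetch_cnt:16,		/* 16-31: instruction fetch count */
			 rest:32;		/* 32-63: remaining fields, unused here */
	};
};

/* Assumed value of the old mask: bits 16-31. */
#define SKETCH_IBS_FETCH_CNT 0xFFFF0000ULL

/* Old style: mask out bits 16-31, then shift right by 12 (i.e. >> 16, << 4). */
uint64_t sketch_fetch_count_mask(uint64_t config)
{
	return (config & SKETCH_IBS_FETCH_CNT) >> 12;
}

/* New style: let the compiler extract the field, then scale by 16. */
uint64_t sketch_fetch_count_bitfield(uint64_t config)
{
	union sketch_ibs_fetch_ctl ctl = { .val = config };

	assert(sizeof(ctl) == sizeof(uint64_t));
	return (uint64_t)ctl.fetch_cnt << 4;
}
```

Both helpers return the fetch count field multiplied by 16; the bitfield form just moves the bit positions out of the expression and into the type definition.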

Subject: [tip: perf/core] perf/x86/amd/power: Assign pmu.module

The following commit has been merged into the perf/core branch of tip:

Commit-ID: b159f2ed7a712ae24b22414cda22ed93db7033bb
Gitweb: https://git.kernel.org/tip/b159f2ed7a712ae24b22414cda22ed93db7033bb
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:43 -05:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 20 Aug 2021 12:33:12 +02:00

perf/x86/amd/power: Assign pmu.module

Assign pmu.module so the driver can't be unloaded whilst in use.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/power.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 16a2369..37d5b38 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -213,6 +213,7 @@ static struct pmu pmu_class = {
.stop = pmu_event_stop,
.read = pmu_event_read,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+ .module = THIS_MODULE,
};

static int power_cpu_exit(unsigned int cpu)

Subject: [tip: perf/core] perf/x86/amd/ibs: Add workaround for erratum #1197

The following commit has been merged into the perf/core branch of tip:

Commit-ID: ba02a6dc5693d1db817850f4ba5602d003d0cefb
Gitweb: https://git.kernel.org/tip/ba02a6dc5693d1db817850f4ba5602d003d0cefb
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:42 -05:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 20 Aug 2021 12:33:12 +02:00

perf/x86/amd/ibs: Add workaround for erratum #1197

Erratum #1197 "IBS (Instruction Based Sampling) Register State May be
Incorrect After Restore From CC6" is published in a document:

"Revision Guide for AMD Family 19h Models 00h-0Fh Processors" 56683 Rev. 1.04 July 2021

https://bugzilla.kernel.org/show_bug.cgi?id=206537

Implement the erratum's suggested workaround and ignore IBS samples if
MSRC001_1031 == 0.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/ibs.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 8c25fbd..222c890 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -90,6 +90,7 @@ struct perf_ibs {
unsigned long offset_mask[1];
int offset_max;
unsigned int fetch_count_reset_broken : 1;
+ unsigned int fetch_ignore_if_zero_rip : 1;
struct cpu_perf_ibs __percpu *pcpu;

struct attribute **format_attrs;
@@ -673,6 +674,10 @@ fail:
if (check_rip && (ibs_data.regs[2] & IBS_RIP_INVALID)) {
regs.flags &= ~PERF_EFLAGS_EXACT;
} else {
+ /* Workaround for erratum #1197 */
+ if (perf_ibs->fetch_ignore_if_zero_rip && !(ibs_data.regs[1]))
+ goto out;
+
set_linear_ip(&regs, ibs_data.regs[1]);
regs.flags |= PERF_EFLAGS_EXACT;
}
@@ -770,6 +775,9 @@ static __init void perf_event_ibs_init(void)
if (boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
perf_ibs_fetch.fetch_count_reset_broken = 1;

+ if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
+ perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
+
perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");

if (ibs_caps & IBS_CAPS_OPCNT) {
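
Note on the workaround above: its two decisions (which CPUs get the flag, and which samples are dropped) can be sketched as standalone predicates. This is a simplified user-space illustration, not the driver code: the function names are hypothetical, and fetch_lin_addr stands for the value read from MSRC001_1031 (ibs_data.regs[1] in the diff).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the init-time check: Family 19h, Models 00h-0Fh. */
bool sketch_needs_zero_rip_workaround(unsigned int family, unsigned int model)
{
	return family == 0x19 && model < 0x10;
}

/*
 * Mirrors the sample-time check: when the workaround flag is set, a
 * fetch sample whose linear address reads back as zero is ignored.
 */
bool sketch_keep_fetch_sample(bool ignore_if_zero_rip, uint64_t fetch_lin_addr)
{
	return !(ignore_if_zero_rip && fetch_lin_addr == 0);
}
```

On unaffected CPUs the flag stays clear, so a zero address is passed through unchanged.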

Subject: [tip: perf/core] perf/amd/uncore: Allow the driver to be built as a module

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 50b46ab90f7c43a1adcc9099211b7447818d446e
Gitweb: https://git.kernel.org/tip/50b46ab90f7c43a1adcc9099211b7447818d446e
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:47 -05:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 20 Aug 2021 12:33:13 +02:00

perf/amd/uncore: Allow the driver to be built as a module

Add support to build the AMD uncore driver as a module.
This is in order to facilitate development.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/Kconfig | 10 ++++++++++
arch/x86/events/amd/Makefile | 5 +++--
arch/x86/events/amd/uncore.c | 28 +++++++++++++++++++++++++++-
3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/Kconfig b/arch/x86/events/Kconfig
index 39d9ded..d6cdfe6 100644
--- a/arch/x86/events/Kconfig
+++ b/arch/x86/events/Kconfig
@@ -34,4 +34,14 @@ config PERF_EVENTS_AMD_POWER
(CPUID Fn8000_0007_EDX[12]) interface to calculate the
average power consumption on Family 15h processors.

+config PERF_EVENTS_AMD_UNCORE
+ tristate "AMD Uncore performance events"
+ depends on PERF_EVENTS && CPU_SUP_AMD
+ default y
+ help
+ Include support for AMD uncore performance events for use with
+ e.g., perf stat -e amd_l3/.../,amd_df/.../.
+
+ To compile this driver as a module, choose M here: the
+ module will be called 'amd-uncore'.
endmenu
diff --git a/arch/x86/events/amd/Makefile b/arch/x86/events/amd/Makefile
index fe8795a..6cbe38d 100644
--- a/arch/x86/events/amd/Makefile
+++ b/arch/x86/events/amd/Makefile
@@ -1,8 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_CPU_SUP_AMD) += core.o uncore.o
+obj-$(CONFIG_CPU_SUP_AMD) += core.o
obj-$(CONFIG_PERF_EVENTS_AMD_POWER) += power.o
obj-$(CONFIG_X86_LOCAL_APIC) += ibs.o
+obj-$(CONFIG_PERF_EVENTS_AMD_UNCORE) += amd-uncore.o
+amd-uncore-objs := uncore.o
ifdef CONFIG_AMD_IOMMU
obj-$(CONFIG_CPU_SUP_AMD) += iommu.o
endif
-
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index a01f9f1..0d04414 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -347,6 +347,7 @@ static struct pmu amd_nb_pmu = {
.stop = amd_uncore_stop,
.read = amd_uncore_read,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .module = THIS_MODULE,
};

static struct pmu amd_llc_pmu = {
@@ -360,6 +361,7 @@ static struct pmu amd_llc_pmu = {
.stop = amd_uncore_stop,
.read = amd_uncore_read,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .module = THIS_MODULE,
};

static struct amd_uncore *amd_uncore_alloc(unsigned int cpu)
@@ -665,4 +667,28 @@ fail_nb:

return ret;
}
-device_initcall(amd_uncore_init);
+
+static void __exit amd_uncore_exit(void)
+{
+ cpuhp_remove_state(CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE);
+ cpuhp_remove_state(CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING);
+ cpuhp_remove_state(CPUHP_PERF_X86_AMD_UNCORE_PREP);
+
+ if (boot_cpu_has(X86_FEATURE_PERFCTR_LLC)) {
+ perf_pmu_unregister(&amd_llc_pmu);
+ free_percpu(amd_uncore_llc);
+ amd_uncore_llc = NULL;
+ }
+
+ if (boot_cpu_has(X86_FEATURE_PERFCTR_NB)) {
+ perf_pmu_unregister(&amd_nb_pmu);
+ free_percpu(amd_uncore_nb);
+ amd_uncore_nb = NULL;
+ }
+}
+
+module_init(amd_uncore_init);
+module_exit(amd_uncore_exit);
+
+MODULE_DESCRIPTION("AMD Uncore Driver");
+MODULE_LICENSE("GPL v2");

Subject: [tip: perf/urgent] perf/x86/amd/power: Assign pmu.module

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: ccf26483416a339c114409f6e7cd02abdeaf8052
Gitweb: https://git.kernel.org/tip/ccf26483416a339c114409f6e7cd02abdeaf8052
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:43 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 26 Aug 2021 09:12:57 +02:00

perf/x86/amd/power: Assign pmu.module

Assign pmu.module so the driver can't be unloaded whilst in use.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/power.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 16a2369..37d5b38 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -213,6 +213,7 @@ static struct pmu pmu_class = {
.stop = pmu_event_stop,
.read = pmu_event_read,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+ .module = THIS_MODULE,
};

static int power_cpu_exit(unsigned int cpu)

Subject: [tip: perf/urgent] perf/x86/amd/ibs: Work around erratum #1197

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: 26db2e0c51fe83e1dd852c1321407835b481806e
Gitweb: https://git.kernel.org/tip/26db2e0c51fe83e1dd852c1321407835b481806e
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:42 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 26 Aug 2021 08:58:02 +02:00

perf/x86/amd/ibs: Work around erratum #1197

Erratum #1197 "IBS (Instruction Based Sampling) Register State May be
Incorrect After Restore From CC6" is published in a document:

"Revision Guide for AMD Family 19h Models 00h-0Fh Processors" 56683 Rev. 1.04 July 2021

https://bugzilla.kernel.org/show_bug.cgi?id=206537

Implement the erratum's suggested workaround and ignore IBS samples if
MSRC001_1031 == 0.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/ibs.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 40669ea..921f47b 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -90,6 +90,7 @@ struct perf_ibs {
unsigned long offset_mask[1];
int offset_max;
unsigned int fetch_count_reset_broken : 1;
+ unsigned int fetch_ignore_if_zero_rip : 1;
struct cpu_perf_ibs __percpu *pcpu;

struct attribute **format_attrs;
@@ -672,6 +673,10 @@ fail:
if (check_rip && (ibs_data.regs[2] & IBS_RIP_INVALID)) {
regs.flags &= ~PERF_EFLAGS_EXACT;
} else {
+ /* Workaround for erratum #1197 */
+ if (perf_ibs->fetch_ignore_if_zero_rip && !(ibs_data.regs[1]))
+ goto out;
+
set_linear_ip(&regs, ibs_data.regs[1]);
regs.flags |= PERF_EFLAGS_EXACT;
}
@@ -769,6 +774,9 @@ static __init void perf_event_ibs_init(void)
if (boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
perf_ibs_fetch.fetch_count_reset_broken = 1;

+ if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10)
+ perf_ibs_fetch.fetch_ignore_if_zero_rip = 1;
+
perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");

if (ibs_caps & IBS_CAPS_OPCNT) {

Subject: [tip: perf/core] x86/cpu: Add get_llc_id() helper function

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 9164d9493a792682143af12b182be12d7c32b195
Gitweb: https://git.kernel.org/tip/9164d9493a792682143af12b182be12d7c32b195
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:46 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 26 Aug 2021 09:14:36 +02:00

x86/cpu: Add get_llc_id() helper function

Factor out a helper function rather than export cpu_llc_id, which is
needed in order to be able to build the AMD uncore driver as a module.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/uncore.c | 2 +-
arch/x86/include/asm/processor.h | 2 ++
arch/x86/kernel/cpu/amd.c | 2 +-
arch/x86/kernel/cpu/common.c | 6 ++++++
4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 7fb50ad..a01f9f1 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -452,7 +452,7 @@ static int amd_uncore_cpu_starting(unsigned int cpu)

if (amd_uncore_llc) {
uncore = *per_cpu_ptr(amd_uncore_llc, cpu);
- uncore->id = per_cpu(cpu_llc_id, cpu);
+ uncore->id = get_llc_id(cpu);

uncore = amd_uncore_find_online_sibling(uncore, amd_uncore_llc);
*per_cpu_ptr(amd_uncore_llc, cpu) = uncore;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f3020c5..33dd157 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -795,6 +795,8 @@ extern int set_tsc_mode(unsigned int val);

DECLARE_PER_CPU(u64, msr_misc_features_shadow);

+extern u16 get_llc_id(unsigned int cpu);
+
#ifdef CONFIG_CPU_SUP_AMD
extern u32 amd_get_nodes_per_socket(void);
extern u32 amd_get_highest_perf(void);
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b7c0030..2131af9 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -438,7 +438,7 @@ static void srat_detect_node(struct cpuinfo_x86 *c)

node = numa_cpu_node(cpu);
if (node == NUMA_NO_NODE)
- node = per_cpu(cpu_llc_id, cpu);
+ node = get_llc_id(cpu);

/*
* On multi-fabric platform (e.g. Numascale NumaChip) a
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 64b805b..0f88859 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -79,6 +79,12 @@ EXPORT_SYMBOL(smp_num_siblings);
/* Last level cache ID of each logical CPU */
DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID;

+u16 get_llc_id(unsigned int cpu)
+{
+ return per_cpu(cpu_llc_id, cpu);
+}
+EXPORT_SYMBOL_GPL(get_llc_id);
+
/* correctly size the local cpu masks */
void __init setup_cpu_local_masks(void)
{

Subject: [tip: perf/core] perf/amd/uncore: Allow the driver to be built as a module

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 05485745ad482c1910a45f23a5c255f6a0df0f46
Gitweb: https://git.kernel.org/tip/05485745ad482c1910a45f23a5c255f6a0df0f46
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:47 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 26 Aug 2021 09:14:36 +02:00

perf/amd/uncore: Allow the driver to be built as a module

Add support to build the AMD uncore driver as a module.

This is in order to facilitate development without having
to reboot the kernel in most cases.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/Kconfig | 10 ++++++++++
arch/x86/events/amd/Makefile | 5 +++--
arch/x86/events/amd/uncore.c | 28 +++++++++++++++++++++++++++-
3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/Kconfig b/arch/x86/events/Kconfig
index 39d9ded..d6cdfe6 100644
--- a/arch/x86/events/Kconfig
+++ b/arch/x86/events/Kconfig
@@ -34,4 +34,14 @@ config PERF_EVENTS_AMD_POWER
(CPUID Fn8000_0007_EDX[12]) interface to calculate the
average power consumption on Family 15h processors.

+config PERF_EVENTS_AMD_UNCORE
+ tristate "AMD Uncore performance events"
+ depends on PERF_EVENTS && CPU_SUP_AMD
+ default y
+ help
+ Include support for AMD uncore performance events for use with
+ e.g., perf stat -e amd_l3/.../,amd_df/.../.
+
+ To compile this driver as a module, choose M here: the
+ module will be called 'amd-uncore'.
endmenu
diff --git a/arch/x86/events/amd/Makefile b/arch/x86/events/amd/Makefile
index fe8795a..6cbe38d 100644
--- a/arch/x86/events/amd/Makefile
+++ b/arch/x86/events/amd/Makefile
@@ -1,8 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_CPU_SUP_AMD) += core.o uncore.o
+obj-$(CONFIG_CPU_SUP_AMD) += core.o
obj-$(CONFIG_PERF_EVENTS_AMD_POWER) += power.o
obj-$(CONFIG_X86_LOCAL_APIC) += ibs.o
+obj-$(CONFIG_PERF_EVENTS_AMD_UNCORE) += amd-uncore.o
+amd-uncore-objs := uncore.o
ifdef CONFIG_AMD_IOMMU
obj-$(CONFIG_CPU_SUP_AMD) += iommu.o
endif
-
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index a01f9f1..0d04414 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -347,6 +347,7 @@ static struct pmu amd_nb_pmu = {
.stop = amd_uncore_stop,
.read = amd_uncore_read,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .module = THIS_MODULE,
};

static struct pmu amd_llc_pmu = {
@@ -360,6 +361,7 @@ static struct pmu amd_llc_pmu = {
.stop = amd_uncore_stop,
.read = amd_uncore_read,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT,
+ .module = THIS_MODULE,
};

static struct amd_uncore *amd_uncore_alloc(unsigned int cpu)
@@ -665,4 +667,28 @@ fail_nb:

return ret;
}
-device_initcall(amd_uncore_init);
+
+static void __exit amd_uncore_exit(void)
+{
+ cpuhp_remove_state(CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE);
+ cpuhp_remove_state(CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING);
+ cpuhp_remove_state(CPUHP_PERF_X86_AMD_UNCORE_PREP);
+
+ if (boot_cpu_has(X86_FEATURE_PERFCTR_LLC)) {
+ perf_pmu_unregister(&amd_llc_pmu);
+ free_percpu(amd_uncore_llc);
+ amd_uncore_llc = NULL;
+ }
+
+ if (boot_cpu_has(X86_FEATURE_PERFCTR_NB)) {
+ perf_pmu_unregister(&amd_nb_pmu);
+ free_percpu(amd_uncore_nb);
+ amd_uncore_nb = NULL;
+ }
+}
+
+module_init(amd_uncore_init);
+module_exit(amd_uncore_exit);
+
+MODULE_DESCRIPTION("AMD Uncore Driver");
+MODULE_LICENSE("GPL v2");

Subject: [tip: perf/core] perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 0a0b53e0c3793c0930d258786702d48d21fc6383
Gitweb: https://git.kernel.org/tip/0a0b53e0c3793c0930d258786702d48d21fc6383
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:45 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 26 Aug 2021 09:14:36 +02:00

perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/

Found by checkpatch.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/uncore.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 05bdb4c..7fb50ad 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -12,11 +12,11 @@
#include <linux/init.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
+#include <linux/cpufeature.h>
+#include <linux/smp.h>

-#include <asm/cpufeature.h>
#include <asm/perf_event.h>
#include <asm/msr.h>
-#include <asm/smp.h>

#define NUM_COUNTERS_NB 4
#define NUM_COUNTERS_L2 4

Subject: [tip: perf/core] perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> header

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 6a371bafe613b7746c3d3ac486bdb3035f77e029
Gitweb: https://git.kernel.org/tip/6a371bafe613b7746c3d3ac486bdb3035f77e029
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:48 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 26 Aug 2021 09:14:36 +02:00

perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> header

Add <asm/amd-ibs.h> with bitfield definitions for IBS MSRs,
and demonstrate usage within the driver.

Also move 'struct perf_ibs_data' where it can be shared with
the perf tool that will soon be using it.

No functional changes.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/ibs.c | 23 ++----
arch/x86/include/asm/amd-ibs.h | 132 ++++++++++++++++++++++++++++++++-
2 files changed, 141 insertions(+), 14 deletions(-)
create mode 100644 arch/x86/include/asm/amd-ibs.h

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index ccc9ee1..9739019 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -26,6 +26,7 @@ static u32 ibs_caps;
#include <linux/hardirq.h>

#include <asm/nmi.h>
+#include <asm/amd-ibs.h>

#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
@@ -100,15 +101,6 @@ struct perf_ibs {
u64 (*get_count)(u64 config);
};

-struct perf_ibs_data {
- u32 size;
- union {
- u32 data[0]; /* data buffer starts here */
- u32 caps;
- };
- u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX];
-};
-
static int
perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_period)
{
@@ -329,11 +321,14 @@ static int perf_ibs_set_period(struct perf_ibs *perf_ibs,

static u64 get_ibs_fetch_count(u64 config)
{
- return (config & IBS_FETCH_CNT) >> 12;
+ union ibs_fetch_ctl fetch_ctl = (union ibs_fetch_ctl)config;
+
+ return fetch_ctl.fetch_cnt << 4;
}

static u64 get_ibs_op_count(u64 config)
{
+ union ibs_op_ctl op_ctl = (union ibs_op_ctl)config;
u64 count = 0;

/*
@@ -341,12 +336,12 @@ static u64 get_ibs_op_count(u64 config)
* and the lower 7 bits of CurCnt are randomized.
* Otherwise CurCnt has the full 27-bit current counter value.
*/
- if (config & IBS_OP_VAL) {
- count = (config & IBS_OP_MAX_CNT) << 4;
+ if (op_ctl.op_val) {
+ count = op_ctl.opmaxcnt << 4;
if (ibs_caps & IBS_CAPS_OPCNTEXT)
- count += config & IBS_OP_MAX_CNT_EXT_MASK;
+ count += op_ctl.opmaxcnt_ext << 20;
} else if (ibs_caps & IBS_CAPS_RDWROPCNT) {
- count = (config & IBS_OP_CUR_CNT) >> 32;
+ count = op_ctl.opcurcnt;
}

return count;
diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h
new file mode 100644
index 0000000..46e1df4
--- /dev/null
+++ b/arch/x86/include/asm/amd-ibs.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * From PPR Vol 1 for AMD Family 19h Model 01h B1
+ * 55898 Rev 0.35 - Feb 5, 2021
+ */
+
+#include <asm/msr-index.h>
+
+/*
+ * IBS Hardware MSRs
+ */
+
+/* MSR 0xc0011030: IBS Fetch Control */
+union ibs_fetch_ctl {
+ __u64 val;
+ struct {
+ __u64 fetch_maxcnt:16,/* 0-15: instruction fetch max. count */
+ fetch_cnt:16, /* 16-31: instruction fetch count */
+ fetch_lat:16, /* 32-47: instruction fetch latency */
+ fetch_en:1, /* 48: instruction fetch enable */
+ fetch_val:1, /* 49: instruction fetch valid */
+ fetch_comp:1, /* 50: instruction fetch complete */
+ ic_miss:1, /* 51: i-cache miss */
+ phy_addr_valid:1,/* 52: physical address valid */
+ l1tlb_pgsz:2, /* 53-54: i-cache L1TLB page size
+ * (needs IbsPhyAddrValid) */
+ l1tlb_miss:1, /* 55: i-cache fetch missed in L1TLB */
+ l2tlb_miss:1, /* 56: i-cache fetch missed in L2TLB */
+ rand_en:1, /* 57: random tagging enable */
+ fetch_l2_miss:1,/* 58: L2 miss for sampled fetch
+ * (needs IbsFetchComp) */
+ reserved:5; /* 59-63: reserved */
+ };
+};
+
+/* MSR 0xc0011033: IBS Execution Control */
+union ibs_op_ctl {
+ __u64 val;
+ struct {
+ __u64 opmaxcnt:16, /* 0-15: periodic op max. count */
+ reserved0:1, /* 16: reserved */
+ op_en:1, /* 17: op sampling enable */
+ op_val:1, /* 18: op sample valid */
+ cnt_ctl:1, /* 19: periodic op counter control */
+ opmaxcnt_ext:7, /* 20-26: upper 7 bits of periodic op maximum count */
+ reserved1:5, /* 27-31: reserved */
+ opcurcnt:27, /* 32-58: periodic op counter current count */
+ reserved2:5; /* 59-63: reserved */
+ };
+};
+
+/* MSR 0xc0011035: IBS Op Data 1 */
+union ibs_op_data {
+ __u64 val;
+ struct {
+ __u64 comp_to_ret_ctr:16, /* 0-15: op completion to retire count */
+ tag_to_ret_ctr:16, /* 16-31: op tag to retire count */
+ reserved1:2, /* 32-33: reserved */
+ op_return:1, /* 34: return op */
+ op_brn_taken:1, /* 35: taken branch op */
+ op_brn_misp:1, /* 36: mispredicted branch op */
+ op_brn_ret:1, /* 37: branch op retired */
+ op_rip_invalid:1, /* 38: RIP is invalid */
+ op_brn_fuse:1, /* 39: fused branch op */
+ op_microcode:1, /* 40: microcode op */
+ reserved2:23; /* 41-63: reserved */
+ };
+};
+
+/* MSR 0xc0011036: IBS Op Data 2 */
+union ibs_op_data2 {
+ __u64 val;
+ struct {
+ __u64 data_src:3, /* 0-2: data source */
+ reserved0:1, /* 3: reserved */
+ rmt_node:1, /* 4: destination node */
+ cache_hit_st:1, /* 5: cache hit state */
+ reserved1:58; /* 6-63: reserved */
+ };
+};
+
+/* MSR 0xc0011037: IBS Op Data 3 */
+union ibs_op_data3 {
+ __u64 val;
+ struct {
+ __u64 ld_op:1, /* 0: load op */
+ st_op:1, /* 1: store op */
+ dc_l1tlb_miss:1, /* 2: data cache L1TLB miss */
+ dc_l2tlb_miss:1, /* 3: data cache L2TLB miss */
+ dc_l1tlb_hit_2m:1, /* 4: data cache L1TLB hit in 2M page */
+ dc_l1tlb_hit_1g:1, /* 5: data cache L1TLB hit in 1G page */
+ dc_l2tlb_hit_2m:1, /* 6: data cache L2TLB hit in 2M page */
+ dc_miss:1, /* 7: data cache miss */
+ dc_mis_acc:1, /* 8: misaligned access */
+ reserved:4, /* 9-12: reserved */
+ dc_wc_mem_acc:1, /* 13: write combining memory access */
+ dc_uc_mem_acc:1, /* 14: uncacheable memory access */
+ dc_locked_op:1, /* 15: locked operation */
+ dc_miss_no_mab_alloc:1, /* 16: DC miss with no MAB allocated */
+ dc_lin_addr_valid:1, /* 17: data cache linear address valid */
+ dc_phy_addr_valid:1, /* 18: data cache physical address valid */
+ dc_l2_tlb_hit_1g:1, /* 19: data cache L2TLB hit in 1G page */
+ l2_miss:1, /* 20: L2 cache miss */
+ sw_pf:1, /* 21: software prefetch */
+ op_mem_width:4, /* 22-25: load/store size in bytes */
+ op_dc_miss_open_mem_reqs:6, /* 26-31: outstanding mem reqs on DC fill */
+ dc_miss_lat:16, /* 32-47: data cache miss latency */
+ tlb_refill_lat:16; /* 48-63: L1 TLB refill latency */
+ };
+};
+
+/* MSR 0xc001103c: IBS Fetch Control Extended */
+union ic_ibs_extd_ctl {
+ __u64 val;
+ struct {
+ __u64 itlb_refill_lat:16, /* 0-15: ITLB Refill latency for sampled fetch */
+ reserved:48; /* 16-63: reserved */
+ };
+};
+
+/*
+ * IBS driver related
+ */
+
+struct perf_ibs_data {
+ u32 size;
+ union {
+ u32 data[0]; /* data buffer starts here */
+ u32 caps;
+ };
+ u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX];
+};
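
For readers following the ibs.c hunk above: the sketch below is a userspace illustration, not part of the patch. It compares the bitfield-based op count with the older mask-and-shift form for the op_val case only. The SKETCH_* masks and the sketch_* function names are hypothetical; the masks are derived from the bit positions documented in the header rather than copied from the kernel. It assumes a compiler that allocates bitfields starting at the least significant bit, as GCC and Clang do on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace copy of the IBS Execution Control layout from the patch;
 * uint64_t stands in for __u64. Assumes LSB-first bitfield allocation.
 */
union ibs_op_ctl {
	uint64_t val;
	struct {
		uint64_t opmaxcnt:16,	/* 0-15 */
			 reserved0:1,	/* 16 */
			 op_en:1,	/* 17 */
			 op_val:1,	/* 18 */
			 cnt_ctl:1,	/* 19 */
			 opmaxcnt_ext:7,/* 20-26 */
			 reserved1:5,	/* 27-31 */
			 opcurcnt:27,	/* 32-58 */
			 reserved2:5;	/* 59-63 */
	};
};

/* Hypothetical masks, derived from the bit positions listed above. */
#define SKETCH_OP_MAX_CNT	0x0000FFFFULL	/* bits 0-15 */
#define SKETCH_OP_VAL		(1ULL << 18)	/* bit 18 */
#define SKETCH_OP_MAX_CNT_EXT	(0x7FULL << 20)	/* bits 20-26 */

/* Bitfield form: fields are read by name, shifts rebuild the count. */
uint64_t sketch_op_count_bitfield(uint64_t config, int has_opcntext)
{
	union ibs_op_ctl op_ctl = { .val = config };
	uint64_t count = 0;

	if (op_ctl.op_val) {
		count = (uint64_t)op_ctl.opmaxcnt << 4;
		if (has_opcntext)
			count += (uint64_t)op_ctl.opmaxcnt_ext << 20;
	}
	return count;
}

/* Mask form: the extension bits already sit at bits 20-26, so no shift. */
uint64_t sketch_op_count_masks(uint64_t config, int has_opcntext)
{
	uint64_t count = 0;

	if (config & SKETCH_OP_VAL) {
		count = (config & SKETCH_OP_MAX_CNT) << 4;
		if (has_opcntext)
			count += config & SKETCH_OP_MAX_CNT_EXT;
	}
	return count;
}

/* Check that both forms agree on a handful of register values. */
void sketch_selfcheck(void)
{
	const uint64_t samples[] = {
		0,
		SKETCH_OP_VAL | 0x1234ULL,
		SKETCH_OP_VAL | (0x5ULL << 20) | 0x1234ULL,
		(0x7FULL << 20) | 0xFFFFULL,	/* op_val clear */
		~0ULL,
	};

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		for (int ext = 0; ext <= 1; ext++)
			assert(sketch_op_count_bitfield(samples[i], ext) ==
			       sketch_op_count_masks(samples[i], ext));
}
```

Calling sketch_selfcheck() from a small main() exercises both forms. The explicit uint64_t casts are a defensive choice in this sketch so the shifts do not depend on how the compiler promotes narrow bitfields.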

Subject: [tip: perf/core] perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULL

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 6cf295b21608f9253037335f47cd0dfcce812d81
Gitweb: https://git.kernel.org/tip/6cf295b21608f9253037335f47cd0dfcce812d81
Author: Kim Phillips <[email protected]>
AuthorDate: Tue, 17 Aug 2021 17:10:44 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 26 Aug 2021 09:14:36 +02:00

perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULL

free_percpu() has its own check for NULL, no need to open-code it.

Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/amd/uncore.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 582c0ff..05bdb4c 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -659,11 +659,9 @@ fail_prep:
fail_llc:
if (boot_cpu_has(X86_FEATURE_PERFCTR_NB))
perf_pmu_unregister(&amd_nb_pmu);
- if (amd_uncore_llc)
- free_percpu(amd_uncore_llc);
+ free_percpu(amd_uncore_llc);
fail_nb:
- if (amd_uncore_nb)
- free_percpu(amd_uncore_nb);
+ free_percpu(amd_uncore_nb);

return ret;
}
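
The same cleanup idea can be tried in userspace, where the C standard guarantees that free(NULL) does nothing. The sketch below is only an analogy under that assumption: struct sketch_ctx, sketch_init() and sketch_exit() are hypothetical names, and calloc()/free() stand in for the percpu allocator used by the driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two per-CPU pointers in the driver. */
struct sketch_ctx {
	int *nb;
	int *llc;
};

/*
 * Allocate both buffers. fail_llc simulates a failure after the first
 * allocation so the error path runs with llc still NULL.
 */
int sketch_init(struct sketch_ctx *ctx, int fail_llc)
{
	ctx->nb = NULL;
	ctx->llc = NULL;

	ctx->nb = calloc(4, sizeof(*ctx->nb));
	if (!ctx->nb)
		goto fail;

	if (fail_llc)
		goto fail;

	ctx->llc = calloc(4, sizeof(*ctx->llc));
	if (!ctx->llc)
		goto fail;

	return 0;

fail:
	/* No "if (ptr)" guards needed: free(NULL) is a no-op. */
	free(ctx->llc);
	free(ctx->nb);
	ctx->llc = NULL;
	ctx->nb = NULL;
	return -1;
}

/* Release both buffers and clear the pointers. */
void sketch_exit(struct sketch_ctx *ctx)
{
	free(ctx->llc);
	free(ctx->nb);
	ctx->llc = NULL;
	ctx->nb = NULL;
}
```

Dropping the guards keeps the unwind path short and removes a branch that the callee already performs, which is the point the commit message makes about free_percpu().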