This erratum was found when an internal benchmarking team reported a
system hang while using ThunderX2 uncore PMU events alongside CPU- and
memory-intensive applications such as STREAM, SPECjbb and SPECint.

The workaround is to disable event multiplexing.

The perf core currently provides no way for a PMU to disable event
multiplexing. The first patch adds a PMU capability to the perf core
for disabling event multiplexing. The second patch sets that capability
for the ThunderX2 uncore PMUs.
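
As a rough illustration of the idea only, the following user-space model shows how a "no multiplexing" capability bit could be consulted when more events are requested than a PMU has counters. It is a sketch under stated assumptions, not the code from patch 1: `struct pmu_model`, `needs_mux()` and `admit_events()` are hypothetical names, the flag values are placeholders rather than the values in include/linux/perf_event.h, and rejecting the extra events is only one possible policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit values for illustration; not the kernel's definitions. */
#define PERF_PMU_CAP_NO_EXCLUDE    0x1u
#define PERF_PMU_CAP_NO_MUX_EVENTS 0x2u

/* Hypothetical, simplified stand-in for the kernel's struct pmu. */
struct pmu_model {
	unsigned int capabilities;
	unsigned int num_counters;
};

/* True if nr_events cannot all be resident on the hardware counters at once. */
static bool needs_mux(const struct pmu_model *pmu, unsigned int nr_events)
{
	return nr_events > pmu->num_counters;
}

/*
 * Returns 0 if the events can be accepted, -1 if accepting them would
 * require multiplexing on a PMU that has opted out of it.
 */
static int admit_events(const struct pmu_model *pmu, unsigned int nr_events)
{
	assert(pmu != NULL);

	if (needs_mux(pmu, nr_events) &&
	    (pmu->capabilities & PERF_PMU_CAP_NO_MUX_EVENTS))
		return -1;

	return 0;
}
```

With this model, a PMU that sets the flag accepts event counts up to its counter count and refuses anything beyond it, while a PMU without the flag continues to accept the extra events and relies on multiplexing.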

Ganapatrao Prabhakerrao Kulkarni (2):
  perf/core: Adding capability to disable PMUs event multiplexing
  Thunderx2, uncore: Add workaround for ThunderX2 erratum 221

 Documentation/admin-guide/perf/thunderx2-pmu.rst | 9 +++++++++
 drivers/perf/thunderx2_pmu.c                     | 3 ++-
 include/linux/perf_event.h                       | 1 +
 kernel/events/core.c                             | 8 ++++++++
 4 files changed, 20 insertions(+), 1 deletion(-)

--
2.17.1
When perf is run with more events than the PMU has counters, the perf
core multiplexes events to accommodate all of them. This results in
bursts of PMU register reads and writes, which cause a system hang when
run alongside CPU-intensive applications.

Add a software workaround by disabling event multiplexing.

Signed-off-by: Ganapatrao Prabhakerrao Kulkarni <[email protected]>
---
Documentation/admin-guide/perf/thunderx2-pmu.rst | 9 +++++++++
drivers/perf/thunderx2_pmu.c | 3 ++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/perf/thunderx2-pmu.rst b/Documentation/admin-guide/perf/thunderx2-pmu.rst
index 08e33675853a..fff65382c887 100644
--- a/Documentation/admin-guide/perf/thunderx2-pmu.rst
+++ b/Documentation/admin-guide/perf/thunderx2-pmu.rst
@@ -40,3 +40,12 @@ Examples::
uncore_l3c_0/read_hit/,\
uncore_l3c_0/inv_request/,\
uncore_l3c_0/inv_hit/ sleep 1
+
+ThunderX2 erratum 221:
+When perf is run with more events than the PMU has counters, the perf core
+multiplexes events to accommodate all of them. This results in bursts of PMU
+register reads and writes, leading to a system hang when run alongside
+CPU-intensive applications.
+
+
+The workaround is to disable event multiplexing for these PMUs.
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index 43d76c85da56..c443be8bd449 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -563,7 +563,8 @@ static int tx2_uncore_pmu_register(
.start = tx2_uncore_event_start,
.stop = tx2_uncore_event_stop,
.read = tx2_uncore_event_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE |
+ PERF_PMU_CAP_NO_MUX_EVENTS,
};
tx2_pmu->pmu.name = devm_kasprintf(dev, GFP_KERNEL,
--
2.17.1
On Wed, Nov 06, 2019 at 01:01:41AM +0000, Ganapatrao Prabhakerrao Kulkarni wrote:
> When perf tried with more events than the PMU supported counters, the perf
> core uses event multiplexing to accommodate all events. This results in
> burst of PMU register read and writes and causes the system hang, when
> executed along with the CPU intensive applications.
Can you please elaborate on how a burst of PMU reads/writes leads to a
hang?
I see the PMU counts DMC/L3C events -- does this occur under heavy /cpu/
load, or heavy /memory/ load?
Does this only happen with a specific timing of reads/writes, or is it
always possible that accessing the PMU can trigger a lockup, and it's
just more likely when the PMU is accessed more often?
Thanks,
Mark.
>
> Adding software workaround by disabling event multiplexing.
>
> Signed-off-by: Ganapatrao Prabhakerrao Kulkarni <[email protected]>
> ---
> Documentation/admin-guide/perf/thunderx2-pmu.rst | 9 +++++++++
> drivers/perf/thunderx2_pmu.c | 3 ++-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/perf/thunderx2-pmu.rst b/Documentation/admin-guide/perf/thunderx2-pmu.rst
> index 08e33675853a..fff65382c887 100644
> --- a/Documentation/admin-guide/perf/thunderx2-pmu.rst
> +++ b/Documentation/admin-guide/perf/thunderx2-pmu.rst
> @@ -40,3 +40,12 @@ Examples::
> uncore_l3c_0/read_hit/,\
> uncore_l3c_0/inv_request/,\
> uncore_l3c_0/inv_hit/ sleep 1
> +
> +ThunderX2 erratum 221:
> +When perf tried with more events than the PMU supported counters, the perf core
> +uses event multiplexing to accommodate all events. This results in burst of PMU
> +registers read and write and leading to system hang when executed along with
> +CPU intensive applications.
> +
> +
> +Disabling PMUs event multiplexing capability.
> diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
> index 43d76c85da56..c443be8bd449 100644
> --- a/drivers/perf/thunderx2_pmu.c
> +++ b/drivers/perf/thunderx2_pmu.c
> @@ -563,7 +563,8 @@ static int tx2_uncore_pmu_register(
> .start = tx2_uncore_event_start,
> .stop = tx2_uncore_event_stop,
> .read = tx2_uncore_event_read,
> - .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
> + .capabilities = PERF_PMU_CAP_NO_EXCLUDE |
> + PERF_PMU_CAP_NO_MUX_EVENTS,
> };
>
> tx2_pmu->pmu.name = devm_kasprintf(dev, GFP_KERNEL,
> --
> 2.17.1
>