2023-01-12 20:43:19

by Liang, Kan

[permalink] [raw]
Subject: [PATCH RESEND 5/5] perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table

From: Kan Liang <[email protected]>

The kernel warning message is triggered, when SPR MCC is used.

[ 17.945331] ------------[ cut here ]------------
[ 17.946305] WARNING: CPU: 65 PID: 1 at
arch/x86/events/intel/uncore_discovery.c:184
intel_uncore_has_discovery_tables+0x4c0/0x65c
[ 17.946305] Modules linked in:
[ 17.946305] CPU: 65 PID: 1 Comm: swapper/0 Not tainted
5.4.17-2136.313.1-X10-2c+ #4

It's caused by the broken discovery table of UPI.

The discovery tables are from hardware. Except for dropping the broken
information, there is nothing Linux can do. Using WARN_ON_ONCE() is
overkilled.

Use the pr_info() to replace WARN_ON_ONCE(), and specify what uncore unit
is dropped and the reason.

Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/uncore_discovery.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index abb51191f5af..cb488e41807c 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -128,13 +128,21 @@ uncore_insert_box_info(struct uncore_unit_discovery *unit,
unsigned int *box_offset, *ids;
int i;

- if (WARN_ON_ONCE(!unit->ctl || !unit->ctl_offset || !unit->ctr_offset))
+ if (!unit->ctl || !unit->ctl_offset || !unit->ctr_offset) {
+ pr_info("Invalid address is detected for uncore type %d box %d, "
+ "Disable the uncore unit.\n",
+ unit->box_type, unit->box_id);
return;
+ }

if (parsed) {
type = search_uncore_discovery_type(unit->box_type);
- if (WARN_ON_ONCE(!type))
+ if (!type) {
+ pr_info("A spurious uncore type %d is detected, "
+ "Disable the uncore type.\n",
+ unit->box_type);
return;
+ }
/* Store the first box of each die */
if (!type->box_ctrl_die[die])
type->box_ctrl_die[die] = unit->ctl;
@@ -169,8 +177,12 @@ uncore_insert_box_info(struct uncore_unit_discovery *unit,
ids[i] = type->ids[i];
box_offset[i] = type->box_offset[i];

- if (WARN_ON_ONCE(unit->box_id == ids[i]))
+ if (unit->box_id == ids[i]) {
+ pr_info("Duplicate uncore type %d box ID %d is detected, "
+ "Drop the duplicate uncore unit.\n",
+ unit->box_type, unit->box_id);
goto free_ids;
+ }
}
ids[i] = unit->box_id;
box_offset[i] = unit->ctl - type->box_ctrl;
--
2.35.1


2023-01-21 10:01:42

by tip-bot2 for Tony Luck

[permalink] [raw]
Subject: [tip: perf/core] perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 5d515ee40cb57ea5331998f27df7946a69f14dc3
Gitweb: https://git.kernel.org/tip/5d515ee40cb57ea5331998f27df7946a69f14dc3
Author: Kan Liang <[email protected]>
AuthorDate: Thu, 12 Jan 2023 12:01:05 -08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Sat, 21 Jan 2023 00:06:13 +01:00

perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table

The kernel warning message is triggered, when SPR MCC is used.

[ 17.945331] ------------[ cut here ]------------
[ 17.946305] WARNING: CPU: 65 PID: 1 at
arch/x86/events/intel/uncore_discovery.c:184
intel_uncore_has_discovery_tables+0x4c0/0x65c
[ 17.946305] Modules linked in:
[ 17.946305] CPU: 65 PID: 1 Comm: swapper/0 Not tainted
5.4.17-2136.313.1-X10-2c+ #4

It's caused by the broken discovery table of UPI.

The discovery tables are from hardware. Except for dropping the broken
information, there is nothing Linux can do. Using WARN_ON_ONCE() is
overkilled.

Use the pr_info() to replace WARN_ON_ONCE(), and specify what uncore unit
is dropped and the reason.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Petlan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/intel/uncore_discovery.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index abb5119..cb488e4 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -128,13 +128,21 @@ uncore_insert_box_info(struct uncore_unit_discovery *unit,
unsigned int *box_offset, *ids;
int i;

- if (WARN_ON_ONCE(!unit->ctl || !unit->ctl_offset || !unit->ctr_offset))
+ if (!unit->ctl || !unit->ctl_offset || !unit->ctr_offset) {
+ pr_info("Invalid address is detected for uncore type %d box %d, "
+ "Disable the uncore unit.\n",
+ unit->box_type, unit->box_id);
return;
+ }

if (parsed) {
type = search_uncore_discovery_type(unit->box_type);
- if (WARN_ON_ONCE(!type))
+ if (!type) {
+ pr_info("A spurious uncore type %d is detected, "
+ "Disable the uncore type.\n",
+ unit->box_type);
return;
+ }
/* Store the first box of each die */
if (!type->box_ctrl_die[die])
type->box_ctrl_die[die] = unit->ctl;
@@ -169,8 +177,12 @@ uncore_insert_box_info(struct uncore_unit_discovery *unit,
ids[i] = type->ids[i];
box_offset[i] = type->box_offset[i];

- if (WARN_ON_ONCE(unit->box_id == ids[i]))
+ if (unit->box_id == ids[i]) {
+ pr_info("Duplicate uncore type %d box ID %d is detected, "
+ "Drop the duplicate uncore unit.\n",
+ unit->box_type, unit->box_id);
goto free_ids;
+ }
}
ids[i] = unit->box_id;
box_offset[i] = unit->ctl - type->box_ctrl;