2019-05-29 08:45:47

by Robert Richter

Subject: [PATCH 00/21] EDAC, mc, ghes: Fixes and updates to improve memory error reporting

Current arm64 systems that use the ghes driver lack kernel support for
proper memory error reporting. The following issues are seen:

* Error records show insufficient data, such as "EDAC MC0: 1 CE
unknown error on unknown label",

* DMI DIMM labels are not decoded for error reporting,

* The memory hierarchy (NUMA topology) is not known,

* No per-layer reporting,

* Significant differences from x86 error reports.

This patch set addresses all of the above by reworking the ghes_edac
and edac_mc drivers.

Patch #1: Repost of an already accepted patch sent to the mailing
list. It is added here for completeness as I did not find it in a
repository yet. The fix is also required for this series. Note there
is a modification compared to the previous version that further
reduces complexity (the early break out of the loop was removed).

Patches #2-#11: General fixes and improvements to the ghes and mc
drivers. Most of this is a rework of existing code without functional
changes to simplify, clean up and unify common code. The changes are
in preparation for, and a requirement of, the following patches that
improve ghes error reporting.

Patches #12-#20: Improve memory error reporting of the ghes driver,
including:

* support for legacy API (patch #12),

* NUMA detection, one mc device per node (patches #13-#16),

* support for DMI DIMM label information (patch #17),

* per-layer reporting (patches #18-#20).

Patch #21: Documentation updates.

All changes should keep existing systems working as before, and all
systems that use ghes will benefit from the update. There is a
fallback in the ghes driver that disables NUMA handling or enters a
fake mode if some of the NUMA or DIMM information is inconsistent, so
it should not break existing systems that provide broken firmware
tables.

The series has been tested on a Marvell/Cavium ThunderX2 system. Here
are some example logs and sysfs entries:

Boot log of memory hierarchy and DIMM detection:

EDAC DEBUG: mem_info_setup: DIMM0: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0, handle: 0x0038, label: N0 DIMM_A0
EDAC DEBUG: mem_info_setup: DIMM1: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0, handle: 0x0039, label: N0 DIMM_B0
EDAC DEBUG: mem_info_setup: DIMM2: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0, handle: 0x003a, label: N0 DIMM_C0
EDAC DEBUG: mem_info_setup: DIMM3: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0, handle: 0x003b, label: N0 DIMM_D0
EDAC DEBUG: mem_info_setup: DIMM4: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0, handle: 0x003c, label: N0 DIMM_E0
EDAC DEBUG: mem_info_setup: DIMM5: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0, handle: 0x003d, label: N0 DIMM_F0
EDAC DEBUG: mem_info_setup: DIMM6: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0, handle: 0x003e, label: N0 DIMM_G0
EDAC DEBUG: mem_info_setup: DIMM7: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0, handle: 0x003f, label: N0 DIMM_H0
EDAC DEBUG: mem_info_setup: DIMM8: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1, handle: 0x004f, label: N1 DIMM_I0
EDAC DEBUG: mem_info_setup: DIMM9: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1, handle: 0x0050, label: N1 DIMM_J0
EDAC DEBUG: mem_info_setup: DIMM10: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1, handle: 0x0051, label: N1 DIMM_K0
EDAC DEBUG: mem_info_setup: DIMM11: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1, handle: 0x0052, label: N1 DIMM_L0
EDAC DEBUG: mem_info_setup: DIMM12: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1, handle: 0x0053, label: N1 DIMM_M0
EDAC DEBUG: mem_info_setup: DIMM13: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1, handle: 0x0054, label: N1 DIMM_N0
EDAC DEBUG: mem_info_setup: DIMM14: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1, handle: 0x0055, label: N1 DIMM_O0
EDAC DEBUG: mem_info_setup: DIMM15: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1, handle: 0x0056, label: N1 DIMM_P0

DIMM label entries in sysfs:

# grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
/sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
/sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
/sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
/sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
/sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
/sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
/sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
/sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
/sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
/sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
/sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
/sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
/sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
/sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
/sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0

Memory error reports in the kernel log:

{1}[Hardware Error]: Error 4, type: corrected
{1}[Hardware Error]: section_type: memory error
{1}[Hardware Error]: error_status: 0x0000000000000400
{1}[Hardware Error]: physical_address: 0x000000bd0db44000
{1}[Hardware Error]: node: 1 card: 3 module: 0 rank: 0 bank: 256 column: 10 bit_position: 16
{1}[Hardware Error]: DIMM location: N1 DIMM_L0
EDAC MC1: 1 CE ghes_mc on N1 DIMM_L0 (card:3 module:0 page:0xbd0db44 offset:0x0 grain:0 syndrome:0x0 - APEI location: node:1 card:3 module:0 rank:0 bank:256 col:10 bit_pos:16 handle:0x0052 status(0x0000000000000400): Storage error in DRAM memory)

Error counters in sysfs (zero counters dropped):

# find /sys/devices/system/edac/mc/ -name \*count | sort -V | xargs grep . | sed -e '/:0/d'
/sys/devices/system/edac/mc/mc0/ce_count:5
/sys/devices/system/edac/mc/mc0/csrow0/ce_count:1
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:1
/sys/devices/system/edac/mc/mc0/csrow3/ce_count:1
/sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:1
/sys/devices/system/edac/mc/mc0/csrow4/ce_count:1
/sys/devices/system/edac/mc/mc0/csrow4/ch0_ce_count:1
/sys/devices/system/edac/mc/mc0/csrow6/ce_count:2
/sys/devices/system/edac/mc/mc0/csrow6/ch0_ce_count:2
/sys/devices/system/edac/mc/mc0/dimm0/dimm_ce_count:1
/sys/devices/system/edac/mc/mc0/dimm3/dimm_ce_count:1
/sys/devices/system/edac/mc/mc0/dimm4/dimm_ce_count:1
/sys/devices/system/edac/mc/mc0/dimm6/dimm_ce_count:2
/sys/devices/system/edac/mc/mc1/ce_count:4
/sys/devices/system/edac/mc/mc1/csrow0/ce_count:1
/sys/devices/system/edac/mc/mc1/csrow0/ch0_ce_count:1
/sys/devices/system/edac/mc/mc1/csrow3/ce_count:1
/sys/devices/system/edac/mc/mc1/csrow3/ch0_ce_count:1
/sys/devices/system/edac/mc/mc1/csrow6/ce_count:2
/sys/devices/system/edac/mc/mc1/csrow6/ch0_ce_count:2
/sys/devices/system/edac/mc/mc1/dimm0/dimm_ce_count:1
/sys/devices/system/edac/mc/mc1/dimm3/dimm_ce_count:1
/sys/devices/system/edac/mc/mc1/dimm6/dimm_ce_count:2


Robert Richter (21):
EDAC, mc: Fix edac_mc_find() in case no device is found
EDAC: Fixes to use put_device() after device_add() errors
EDAC: Kill EDAC_DIMM_PTR() macro
EDAC: Kill EDAC_DIMM_OFF() macro
EDAC: Introduce mci_for_each_dimm() iterator
EDAC, mc: Cleanup _edac_mc_free() code
EDAC, mc: Remove per layer counters
EDAC, mc: Rework edac_raw_mc_handle_error() to use struct dimm_info
EDAC, ghes: Use standard kernel macros for page calculations
EDAC, ghes: Remove pvt->detail_location string
EDAC, ghes: Unify trace_mc_event() code with edac_mc driver
EDAC, ghes: Add support for legacy API counters
EDAC, ghes: Rework memory hierarchy detection
EDAC, ghes: Extract numa node information for each dimm
EDAC, ghes: Moving code around ghes_edac_register()
EDAC, ghes: Create one memory controller device per node
EDAC, ghes: Fill sysfs with the DMI DIMM label information
EDAC, mc: Introduce edac_mc_alloc_by_dimm() for per dimm allocation
EDAC, ghes: Identify dimm by node, card, module and handle
EDAC, ghes: Enable per-layer reporting based on card/module
EDAC, Documentation: Describe CPER module definition and DIMM ranks

Documentation/admin-guide/ras.rst | 31 +-
drivers/edac/edac_mc.c | 360 +++++++++---------
drivers/edac/edac_mc.h | 26 +-
drivers/edac/edac_mc_sysfs.c | 103 ++---
drivers/edac/ghes_edac.c | 609 +++++++++++++++++++++++-------
drivers/edac/i10nm_base.c | 3 +-
drivers/edac/i3200_edac.c | 3 +-
drivers/edac/i5000_edac.c | 5 +-
drivers/edac/i5100_edac.c | 14 +-
drivers/edac/i5400_edac.c | 4 +-
drivers/edac/i7300_edac.c | 3 +-
drivers/edac/i7core_edac.c | 3 +-
drivers/edac/ie31200_edac.c | 7 +-
drivers/edac/pnd2_edac.c | 4 +-
drivers/edac/sb_edac.c | 2 +-
drivers/edac/skx_base.c | 3 +-
drivers/edac/ti_edac.c | 2 +-
include/linux/edac.h | 141 ++++---
18 files changed, 815 insertions(+), 508 deletions(-)

--
2.20.1


2019-05-29 08:45:57

by Robert Richter

Subject: [PATCH 05/21] EDAC: Introduce mci_for_each_dimm() iterator

Make the code more readable by introducing an mci_for_each_dimm()
iterator. It hands out a pointer to each struct dimm_info, so direct
array access using an index is no longer needed for iteration.
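
For illustration only, a minimal (hypothetical, not part of this
patch) helper using the new iterator could look like this:

static unsigned int count_populated_dimms(struct mem_ctl_info *mci)
{
	struct dimm_info *dimm;
	unsigned int populated = 0;

	mci_for_each_dimm(mci, dimm) {
		/* only DIMMs with pages assigned are populated */
		if (dimm->nr_pages)
			populated++;
	}

	return populated;
}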

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 18 ++++++++++--------
drivers/edac/edac_mc_sysfs.c | 34 +++++++++++++++-------------------
drivers/edac/ghes_edac.c | 8 ++++----
drivers/edac/i5100_edac.c | 11 +++++------
include/linux/edac.h | 7 +++++++
5 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 07edbd80af07..a6b34ccce3d4 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -145,15 +145,18 @@ static void edac_mc_dump_channel(struct rank_info *chan)
edac_dbg(4, " channel->dimm = %p\n", chan->dimm);
}

-static void edac_mc_dump_dimm(struct dimm_info *dimm, int number)
+static void edac_mc_dump_dimm(struct dimm_info *dimm)
{
char location[80];

+ if (!dimm->nr_pages)
+ return;
+
edac_dimm_info_location(dimm, location, sizeof(location));

edac_dbg(4, "%s%i: %smapped as virtual row %d, chan %d\n",
dimm->mci->csbased ? "rank" : "dimm",
- number, location, dimm->csrow, dimm->cschannel);
+ dimm->idx, location, dimm->csrow, dimm->cschannel);
edac_dbg(4, " dimm = %p\n", dimm);
edac_dbg(4, " dimm->label = '%s'\n", dimm->label);
edac_dbg(4, " dimm->nr_pages = 0x%x\n", dimm->nr_pages);
@@ -703,6 +706,7 @@ EXPORT_SYMBOL_GPL(edac_get_owner);
int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
const struct attribute_group **groups)
{
+ struct dimm_info *dimm;
int ret = -EINVAL;
edac_dbg(0, "\n");

@@ -727,9 +731,8 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
if (csrow->channels[j]->dimm->nr_pages)
edac_mc_dump_channel(csrow->channels[j]);
}
- for (i = 0; i < mci->tot_dimms; i++)
- if (mci->dimms[i]->nr_pages)
- edac_mc_dump_dimm(mci->dimms[i], i);
+ mci_for_each_dimm(mci, dimm)
+ edac_mc_dump_dimm(dimm);
}
#endif
mutex_lock(&mem_ctls_mutex);
@@ -1087,6 +1090,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const char *msg,
const char *other_detail)
{
+ struct dimm_info *dimm;
char *p;
int row = -1, chan = -1;
int pos[EDAC_MAX_LAYERS] = { top_layer, mid_layer, low_layer };
@@ -1147,9 +1151,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
p = e->label;
*p = '\0';

- for (i = 0; i < mci->tot_dimms; i++) {
- struct dimm_info *dimm = mci->dimms[i];
-
+ mci_for_each_dimm(mci, dimm) {
if (top_layer >= 0 && top_layer != dimm->location[0])
continue;
if (mid_layer >= 0 && mid_layer != dimm->location[1])
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 8beefa699a49..75cba0812a16 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -622,8 +622,7 @@ static const struct device_type dimm_attr_type = {

/* Create a DIMM object under specifed memory controller device */
static int edac_create_dimm_object(struct mem_ctl_info *mci,
- struct dimm_info *dimm,
- int index)
+ struct dimm_info *dimm)
{
int err;
dimm->mci = mci;
@@ -633,9 +632,9 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,

dimm->dev.parent = &mci->dev;
if (mci->csbased)
- dev_set_name(&dimm->dev, "rank%d", index);
+ dev_set_name(&dimm->dev, "rank%d", dimm->idx);
else
- dev_set_name(&dimm->dev, "dimm%d", index);
+ dev_set_name(&dimm->dev, "dimm%d", dimm->idx);
dev_set_drvdata(&dimm->dev, dimm);
pm_runtime_forbid(&mci->dev);

@@ -909,7 +908,8 @@ static const struct device_type mci_attr_type = {
int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
const struct attribute_group **groups)
{
- int i, err;
+ struct dimm_info *dimm;
+ int err;

/* get the /sys/devices/system/edac subsys reference */
mci->dev.type = &mci_attr_type;
@@ -932,14 +932,13 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
/*
* Create the dimm/rank devices
*/
- for (i = 0; i < mci->tot_dimms; i++) {
- struct dimm_info *dimm = mci->dimms[i];
+ mci_for_each_dimm(mci, dimm) {
/* Only expose populated DIMMs */
if (!dimm->nr_pages)
continue;

#ifdef CONFIG_EDAC_DEBUG
- edac_dbg(1, "creating dimm%d, located at ", i);
+ edac_dbg(1, "creating dimm%d, located at ", dimm->idx);
if (edac_debug_level >= 1) {
int lay;
for (lay = 0; lay < mci->n_layers; lay++)
@@ -949,9 +948,10 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
printk(KERN_CONT "\n");
}
#endif
- err = edac_create_dimm_object(mci, dimm, i);
+ err = edac_create_dimm_object(mci, dimm);
if (err) {
- edac_dbg(1, "failure: create dimm %d obj\n", i);
+ edac_dbg(1, "failure: create dimm %d obj\n",
+ dimm->idx);
goto fail_unregister_dimm;
}
}
@@ -966,12 +966,9 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
return 0;

fail_unregister_dimm:
- for (i--; i >= 0; i--) {
- struct dimm_info *dimm = mci->dimms[i];
- if (!dimm->nr_pages)
- continue;
-
- device_unregister(&dimm->dev);
+ mci_for_each_dimm(mci, dimm) {
+ if (device_is_registered(&dimm->dev))
+ device_unregister(&dimm->dev);
}
device_unregister(&mci->dev);

@@ -983,7 +980,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
*/
void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
{
- int i;
+ struct dimm_info *dimm;

edac_dbg(0, "\n");

@@ -994,8 +991,7 @@ void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
edac_delete_csrow_objects(mci);
#endif

- for (i = 0; i < mci->tot_dimms; i++) {
- struct dimm_info *dimm = mci->dimms[i];
+ mci_for_each_dimm(mci, dimm) {
if (dimm->nr_pages == 0)
continue;
edac_dbg(0, "removing device %s\n", dev_name(&dimm->dev));
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 42afa2604db3..fe45392f0c3e 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -84,11 +84,11 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
static int get_dimm_smbios_index(u16 handle)
{
struct mem_ctl_info *mci = ghes_pvt->mci;
- int i;
+ struct dimm_info *dimm;

- for (i = 0; i < mci->tot_dimms; i++) {
- if (mci->dimms[i]->smbios_handle == handle)
- return i;
+ mci_for_each_dimm(mci, dimm) {
+ if (dimm->smbios_handle == handle)
+ return dimm->idx;
}
return -1;
}
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 39ba7f2414ae..7ec42b26a716 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -846,14 +846,13 @@ static void i5100_init_interleaving(struct pci_dev *pdev,

static void i5100_init_csrows(struct mem_ctl_info *mci)
{
- int i;
struct i5100_priv *priv = mci->pvt_info;
+ struct dimm_info *dimm;

- for (i = 0; i < mci->tot_dimms; i++) {
- struct dimm_info *dimm;
- const unsigned long npages = i5100_npages(mci, i);
- const unsigned chan = i5100_csrow_to_chan(mci, i);
- const unsigned rank = i5100_csrow_to_rank(mci, i);
+ mci_for_each_dimm(mci, dimm) {
+ const unsigned long npages = i5100_npages(mci, dimm->idx);
+ const unsigned chan = i5100_csrow_to_chan(mci, dimm->idx);
+ const unsigned rank = i5100_csrow_to_rank(mci, dimm->idx);

if (!npages)
continue;
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 2ee9b8598ae0..20a04f48616c 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -599,6 +599,13 @@ struct mem_ctl_info {
u16 fake_inject_count;
};

+#define mci_for_each_dimm(mci, dimm) \
+ for ((dimm) = (mci)->dimms[0]; \
+ (dimm); \
+ (dimm) = (dimm)->idx + 1 < (mci)->tot_dimms \
+ ? (mci)->dimms[(dimm)->idx + 1] \
+ : NULL)
+
/**
* edac_get_dimm_by_index - Get DIMM info from a memory controller
* given by an index
--
2.20.1

2019-05-29 08:45:59

by Robert Richter

Subject: [PATCH 04/21] EDAC: Kill EDAC_DIMM_OFF() macro

There is no need to calculate the offset into the mc's dimm array
each time; just store the index in struct dimm_info and get rid of
this macro.
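
As a rough sketch (illustration only, mirroring the hunks below):
storing the index once at allocation time removes the need to
recompute an offset from the layer positions later on:

	/* edac_mc_alloc(): remember the position in mci->dimms[] */
	mci->dimms[idx] = dimm;
	dimm->idx = idx;

	/* sysfs code: the stored index replaces EDAC_DIMM_OFF() */
	count = dimm->mci->ce_per_layer[dimm->mci->n_layers - 1][dimm->idx];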

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 13 ++++--------
drivers/edac/edac_mc_sysfs.c | 20 ++++--------------
include/linux/edac.h | 41 ------------------------------------
3 files changed, 8 insertions(+), 66 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 5f565e5949b3..07edbd80af07 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -318,7 +318,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
unsigned size, tot_dimms = 1, count = 1;
unsigned tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
void *pvt, *p, *ptr = NULL;
- int i, j, row, chn, n, len, off;
+ int idx, i, j, row, chn, n, len;
bool per_rank = false;

BUG_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0);
@@ -426,20 +426,15 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
memset(&pos, 0, sizeof(pos));
row = 0;
chn = 0;
- for (i = 0; i < tot_dimms; i++) {
+ for (idx = 0; idx < tot_dimms; idx++) {
chan = mci->csrows[row]->channels[chn];
- off = EDAC_DIMM_OFF(layer, n_layers, pos[0], pos[1], pos[2]);
- if (off < 0 || off >= tot_dimms) {
- edac_mc_printk(mci, KERN_ERR, "EDAC core bug: EDAC_DIMM_OFF is trying to do an illegal data access\n");
- goto error;
- }

dimm = kzalloc(sizeof(**mci->dimms), GFP_KERNEL);
if (!dimm)
goto error;
- mci->dimms[off] = dimm;
+ mci->dimms[idx] = dimm;
dimm->mci = mci;
- dimm->idx = off;
+ dimm->idx = idx;

/*
* Copy DIMM location and initialize it.
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index dbef699162a8..8beefa699a49 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -558,14 +558,8 @@ static ssize_t dimmdev_ce_count_show(struct device *dev,
{
struct dimm_info *dimm = to_dimm(dev);
u32 count;
- int off;
-
- off = EDAC_DIMM_OFF(dimm->mci->layers,
- dimm->mci->n_layers,
- dimm->location[0],
- dimm->location[1],
- dimm->location[2]);
- count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][off];
+
+ count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];
return sprintf(data, "%u\n", count);
}

@@ -575,14 +569,8 @@ static ssize_t dimmdev_ue_count_show(struct device *dev,
{
struct dimm_info *dimm = to_dimm(dev);
u32 count;
- int off;
-
- off = EDAC_DIMM_OFF(dimm->mci->layers,
- dimm->mci->n_layers,
- dimm->location[0],
- dimm->location[1],
- dimm->location[2]);
- count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][off];
+
+ count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][dimm->idx];
return sprintf(data, "%u\n", count);
}

diff --git a/include/linux/edac.h b/include/linux/edac.h
index 1367a3fc544f..2ee9b8598ae0 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -362,47 +362,6 @@ struct edac_mc_layer {
*/
#define EDAC_MAX_LAYERS 3

-/**
- * EDAC_DIMM_OFF - Macro responsible to get a pointer offset inside a pointer
- * array for the element given by [layer0,layer1,layer2]
- * position
- *
- * @layers: a struct edac_mc_layer array, describing how many elements
- * were allocated for each layer
- * @nlayers: Number of layers at the @layers array
- * @layer0: layer0 position
- * @layer1: layer1 position. Unused if n_layers < 2
- * @layer2: layer2 position. Unused if n_layers < 3
- *
- * For 1 layer, this macro returns "var[layer0] - var";
- *
- * For 2 layers, this macro is similar to allocate a bi-dimensional array
- * and to return "var[layer0][layer1] - var";
- *
- * For 3 layers, this macro is similar to allocate a tri-dimensional array
- * and to return "var[layer0][layer1][layer2] - var".
- *
- * A loop could be used here to make it more generic, but, as we only have
- * 3 layers, this is a little faster.
- *
- * By design, layers can never be 0 or more than 3. If that ever happens,
- * a NULL is returned, causing an OOPS during the memory allocation routine,
- * with would point to the developer that he's doing something wrong.
- */
-#define EDAC_DIMM_OFF(layers, nlayers, layer0, layer1, layer2) ({ \
- int __i; \
- if ((nlayers) == 1) \
- __i = layer0; \
- else if ((nlayers) == 2) \
- __i = (layer1) + ((layers[1]).size * (layer0)); \
- else if ((nlayers) == 3) \
- __i = (layer2) + ((layers[2]).size * ((layer1) + \
- ((layers[1]).size * (layer0)))); \
- else \
- __i = -EINVAL; \
- __i; \
-})
-
struct dimm_info {
struct device dev;

--
2.20.1

2019-05-29 08:46:09

by Robert Richter

Subject: [PATCH 07/21] EDAC, mc: Remove per layer counters

Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it
turns out that only the leaves in the memory hierarchy are consumed
(in sysfs), but not the intermediate layers, e.g.:

count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];

So let's get rid of the unused counters that just add complexity.

Error counter values are now stored directly in struct dimm_info.
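
As a sketch of the new accounting (illustration only; the helper name
is hypothetical, the logic mirrors edac_inc_ce_error() after this
patch):

static void account_ce_error(struct mem_ctl_info *mci,
			     struct dimm_info *dimm, u16 count)
{
	mci->ce_mc += count;

	if (dimm)
		dimm->ce_count += count;	/* leaf counter in dimm_info */
	else
		mci->ce_noinfo_count += count;	/* location unknown */
}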

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 98 ++++++++++++------------------------
drivers/edac/edac_mc_sysfs.c | 17 ++-----
drivers/edac/ghes_edac.c | 5 +-
include/linux/edac.h | 7 ++-
4 files changed, 39 insertions(+), 88 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index a849993a8dcd..2e461c9e1a89 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -313,10 +313,9 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
struct csrow_info *csr;
struct rank_info *chan;
struct dimm_info *dimm;
- u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
unsigned pos[EDAC_MAX_LAYERS];
- unsigned size, tot_dimms = 1, count = 1;
- unsigned tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
+ unsigned size, tot_dimms = 1;
+ unsigned tot_csrows = 1, tot_channels = 1;
void *pvt, *p, *ptr = NULL;
int idx, i, j, row, chn, n, len;
bool per_rank = false;
@@ -342,19 +341,10 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
* stringent as what the compiler would provide if we could simply
* hardcode everything into a single struct.
*/
- mci = edac_align_ptr(&ptr, sizeof(*mci), 1);
- layer = edac_align_ptr(&ptr, sizeof(*layer), n_layers);
- for (i = 0; i < n_layers; i++) {
- count *= layers[i].size;
- edac_dbg(4, "errcount layer %d size %d\n", i, count);
- ce_per_layer[i] = edac_align_ptr(&ptr, sizeof(u32), count);
- ue_per_layer[i] = edac_align_ptr(&ptr, sizeof(u32), count);
- tot_errcount += 2 * count;
- }
-
- edac_dbg(4, "allocating %d error counters\n", tot_errcount);
- pvt = edac_align_ptr(&ptr, sz_pvt, 1);
- size = ((unsigned long)pvt) + sz_pvt;
+ mci = edac_align_ptr(&ptr, sizeof(*mci), 1);
+ layer = edac_align_ptr(&ptr, sizeof(*layer), n_layers);
+ pvt = edac_align_ptr(&ptr, sz_pvt, 1);
+ size = ((unsigned long)pvt) + sz_pvt;

edac_dbg(1, "allocating %u bytes for mci data (%d %s, %d csrows/channels)\n",
size,
@@ -370,10 +360,6 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
* rather than an imaginary chunk of memory located at address 0.
*/
layer = (struct edac_mc_layer *)(((char *)mci) + ((unsigned long)layer));
- for (i = 0; i < n_layers; i++) {
- mci->ce_per_layer[i] = (u32 *)((char *)mci + ((unsigned long)ce_per_layer[i]));
- mci->ue_per_layer[i] = (u32 *)((char *)mci + ((unsigned long)ue_per_layer[i]));
- }
pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;

/* setup index and various internal pointers */
@@ -903,53 +889,31 @@ const char *edac_layer_name[] = {
EXPORT_SYMBOL_GPL(edac_layer_name);

static void edac_inc_ce_error(struct mem_ctl_info *mci,
- bool enable_per_layer_report,
const int pos[EDAC_MAX_LAYERS],
const u16 count)
{
- int i, index = 0;
+ struct dimm_info *dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);

mci->ce_mc += count;

- if (!enable_per_layer_report) {
+ if (dimm)
+ dimm->ce_count += count;
+ else
mci->ce_noinfo_count += count;
- return;
- }
-
- for (i = 0; i < mci->n_layers; i++) {
- if (pos[i] < 0)
- break;
- index += pos[i];
- mci->ce_per_layer[i][index] += count;
-
- if (i < mci->n_layers - 1)
- index *= mci->layers[i + 1].size;
- }
}

static void edac_inc_ue_error(struct mem_ctl_info *mci,
- bool enable_per_layer_report,
const int pos[EDAC_MAX_LAYERS],
const u16 count)
{
- int i, index = 0;
+ struct dimm_info *dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);

mci->ue_mc += count;

- if (!enable_per_layer_report) {
+ if (dimm)
+ dimm->ue_count += count;
+ else
mci->ue_noinfo_count += count;
- return;
- }
-
- for (i = 0; i < mci->n_layers; i++) {
- if (pos[i] < 0)
- break;
- index += pos[i];
- mci->ue_per_layer[i][index] += count;
-
- if (i < mci->n_layers - 1)
- index *= mci->layers[i + 1].size;
- }
}

static void edac_ce_error(struct mem_ctl_info *mci,
@@ -960,7 +924,6 @@ static void edac_ce_error(struct mem_ctl_info *mci,
const char *label,
const char *detail,
const char *other_detail,
- const bool enable_per_layer_report,
const unsigned long page_frame_number,
const unsigned long offset_in_page,
long grain)
@@ -983,7 +946,7 @@ static void edac_ce_error(struct mem_ctl_info *mci,
error_count, msg, msg_aux, label,
location, detail);
}
- edac_inc_ce_error(mci, enable_per_layer_report, pos, error_count);
+ edac_inc_ce_error(mci, pos, error_count);

if (mci->scrub_mode == SCRUB_SW_SRC) {
/*
@@ -1013,8 +976,7 @@ static void edac_ue_error(struct mem_ctl_info *mci,
const char *location,
const char *label,
const char *detail,
- const char *other_detail,
- const bool enable_per_layer_report)
+ const char *other_detail)
{
char *msg_aux = "";

@@ -1043,7 +1005,7 @@ static void edac_ue_error(struct mem_ctl_info *mci,
msg, msg_aux, label, location, detail);
}

- edac_inc_ue_error(mci, enable_per_layer_report, pos, error_count);
+ edac_inc_ue_error(mci, pos, error_count);
}

void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
@@ -1059,16 +1021,16 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
"page:0x%lx offset:0x%lx grain:%ld syndrome:0x%lx",
e->page_frame_number, e->offset_in_page,
e->grain, e->syndrome);
- edac_ce_error(mci, e->error_count, pos, e->msg, e->location, e->label,
- detail, e->other_detail, e->enable_per_layer_report,
+ edac_ce_error(mci, e->error_count, pos, e->msg, e->location,
+ e->label, detail, e->other_detail,
e->page_frame_number, e->offset_in_page, e->grain);
} else {
snprintf(detail, sizeof(detail),
"page:0x%lx offset:0x%lx grain:%ld",
e->page_frame_number, e->offset_in_page, e->grain);

- edac_ue_error(mci, e->error_count, pos, e->msg, e->location, e->label,
- detail, e->other_detail, e->enable_per_layer_report);
+ edac_ue_error(mci, e->error_count, pos, e->msg, e->location,
+ e->label, detail, e->other_detail);
}


@@ -1094,6 +1056,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
int i, n_labels = 0;
u8 grain_bits;
struct edac_raw_error_desc *e = &mci->error_desc;
+ bool per_layer_report = false;

edac_dbg(3, "MC%d\n", mci->mc_idx);

@@ -1111,9 +1074,9 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,

/*
* Check if the event report is consistent and if the memory
- * location is known. If it is known, enable_per_layer_report will be
- * true, the DIMM(s) label info will be filled and the per-layer
- * error counters will be incremented.
+ * location is known. If it is known, the DIMM(s) label info
+ * will be filled and the per-layer error counters will be
+ * incremented.
*/
for (i = 0; i < mci->n_layers; i++) {
if (pos[i] >= (int)mci->layers[i].size) {
@@ -1131,7 +1094,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
pos[i] = -1;
}
if (pos[i] >= 0)
- e->enable_per_layer_report = true;
+ per_layer_report = true;
}

/*
@@ -1160,15 +1123,18 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
if (dimm->grain > e->grain)
e->grain = dimm->grain;

+ if (!per_layer_report)
+ continue;
+
/*
* If the error is memory-controller wide, there's no need to
* seek for the affected DIMMs because the whole
* channel/memory controller/... may be affected.
* Also, don't show errors for empty DIMM slots.
*/
- if (e->enable_per_layer_report && dimm->nr_pages) {
+ if (dimm->nr_pages) {
if (n_labels >= EDAC_MAX_LABELS) {
- e->enable_per_layer_report = false;
+ per_layer_report = false;
break;
}
n_labels++;
@@ -1199,7 +1165,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
}
}

- if (!e->enable_per_layer_report) {
+ if (!per_layer_report) {
strcpy(e->label, "any memory");
} else {
edac_dbg(4, "csrow/channel to increment: (%d,%d)\n", row, chan);
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 75cba0812a16..55357d243fca 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -557,10 +557,8 @@ static ssize_t dimmdev_ce_count_show(struct device *dev,
char *data)
{
struct dimm_info *dimm = to_dimm(dev);
- u32 count;

- count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];
- return sprintf(data, "%u\n", count);
+ return sprintf(data, "%u\n", dimm->ce_count);
}

static ssize_t dimmdev_ue_count_show(struct device *dev,
@@ -568,10 +566,8 @@ static ssize_t dimmdev_ue_count_show(struct device *dev,
char *data)
{
struct dimm_info *dimm = to_dimm(dev);
- u32 count;

- count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][dimm->idx];
- return sprintf(data, "%u\n", count);
+ return sprintf(data, "%u\n", dimm->ue_count);
}

/* dimm/rank attribute files */
@@ -659,7 +655,7 @@ static ssize_t mci_reset_counters_store(struct device *dev,
const char *data, size_t count)
{
struct mem_ctl_info *mci = to_mci(dev);
- int cnt, row, chan, i;
+ int row, chan;
mci->ue_mc = 0;
mci->ce_mc = 0;
mci->ue_noinfo_count = 0;
@@ -675,13 +671,6 @@ static ssize_t mci_reset_counters_store(struct device *dev,
ri->channels[chan]->ce_count = 0;
}

- cnt = 1;
- for (i = 0; i < mci->n_layers; i++) {
- cnt *= mci->layers[i].size;
- memset(mci->ce_per_layer[i], 0, cnt * sizeof(u32));
- memset(mci->ue_per_layer[i], 0, cnt * sizeof(u32));
- }
-
mci->start_time = jiffies;
return count;
}
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index fe45392f0c3e..2c8f816c2cc7 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -350,11 +350,8 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
mem_err->mem_dev_handle);

index = get_dimm_smbios_index(mem_err->mem_dev_handle);
- if (index >= 0) {
+ if (index >= 0)
e->top_layer = index;
- e->enable_per_layer_report = true;
- }
-
}
if (p > e->location)
*(p - 1) = '\0';
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 20a04f48616c..4dcf075e9dff 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -383,6 +383,9 @@ struct dimm_info {
unsigned csrow, cschannel; /* Points to the old API data */

u16 smbios_handle; /* Handle for SMBIOS type 17 */
+
+ u32 ce_count;
+ u32 ue_count;
};

/**
@@ -453,8 +456,6 @@ struct errcount_attribute_data {
* @location: location of the error
* @label: label of the affected DIMM(s)
* @other_detail: other driver-specific detail about the error
- * @enable_per_layer_report: if false, the error affects all layers
- * (typically, a memory controller error)
*/
struct edac_raw_error_desc {
/*
@@ -475,7 +476,6 @@ struct edac_raw_error_desc {
unsigned long syndrome;
const char *msg;
const char *other_detail;
- bool enable_per_layer_report;
};

/* MEMORY controller information structure
@@ -565,7 +565,6 @@ struct mem_ctl_info {
*/
u32 ce_noinfo_count, ue_noinfo_count;
u32 ue_mc, ce_mc;
- u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];

struct completion complete;

--
2.20.1

2019-05-29 08:46:16

by Robert Richter

Subject: [PATCH 10/21] EDAC, ghes: Remove pvt->detail_location string

The detail_location[] string in struct ghes_edac_pvt is completely
unnecessary; data is just copied around. Put everything into
e->other_detail from the beginning.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 39702bac5eaf..c18f16bc9e4d 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -23,8 +23,7 @@ struct ghes_edac_pvt {
struct mem_ctl_info *mci;

/* Buffers for the error handling routine */
- char detail_location[240];
- char other_detail[160];
+ char other_detail[400];
char msg[80];
};

@@ -225,13 +224,14 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
memset(e, 0, sizeof (*e));
e->error_count = 1;
strcpy(e->label, "unknown label");
- e->msg = pvt->msg;
- e->other_detail = pvt->other_detail;
e->top_layer = -1;
e->mid_layer = -1;
e->low_layer = -1;
- *pvt->other_detail = '\0';
+ e->msg = pvt->msg;
+ e->other_detail = pvt->other_detail;
+
*pvt->msg = '\0';
+ *pvt->other_detail = '\0';

switch (sev) {
case GHES_SEV_CORRECTED:
@@ -359,6 +359,8 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)

/* All other fields are mapped on e->other_detail */
p = pvt->other_detail;
+ p += snprintf(p, sizeof(pvt->other_detail),
+ "APEI location: %s ", e->location);
if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
u64 status = mem_err->error_status;

@@ -434,12 +436,11 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)

/* Generate the trace event */
grain_bits = fls_long(e->grain);
- snprintf(pvt->detail_location, sizeof(pvt->detail_location),
- "APEI location: %s %s", e->location, e->other_detail);
+
trace_mc_event(type, e->msg, e->label, e->error_count,
mci->mc_idx, e->top_layer, e->mid_layer, e->low_layer,
(e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
- grain_bits, e->syndrome, pvt->detail_location);
+ grain_bits, e->syndrome, e->other_detail);

dimm_info = edac_get_dimm_by_index(mci, e->top_layer);

--
2.20.1

2019-05-29 08:46:25

by Robert Richter

Subject: [PATCH 09/21] EDAC, ghes: Use standard kernel macros for page calculations

Use standard macros for page calculations.
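
For reference, the two forms are equivalent (illustration only):

	/* before */
	e->page_frame_number = mem_err->physical_addr >> PAGE_SHIFT;
	e->offset_in_page    = mem_err->physical_addr & ~PAGE_MASK;

	/* after: same result, using the standard helpers */
	e->page_frame_number = PHYS_PFN(mem_err->physical_addr);
	e->offset_in_page    = offset_in_page(mem_err->physical_addr);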

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4a13c394fa66..39702bac5eaf 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -313,8 +313,8 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)

/* Error address */
if (mem_err->validation_bits & CPER_MEM_VALID_PA) {
- e->page_frame_number = mem_err->physical_addr >> PAGE_SHIFT;
- e->offset_in_page = mem_err->physical_addr & ~PAGE_MASK;
+ e->page_frame_number = PHYS_PFN(mem_err->physical_addr);
+ e->offset_in_page = offset_in_page(mem_err->physical_addr);
}

/* Error grain */
--
2.20.1

2019-05-29 08:46:36

by Robert Richter

Subject: [PATCH 01/21] EDAC, mc: Fix edac_mc_find() in case no device is found

The function should return NULL in case no device is found, but it
always returns the last mc device checked in the list even if its
index did not match. Fix this.

I did some analysis of why this did not raise any issues for about 3
years; the reason is that edac_mc_find() is mostly used to look up
existing devices, so the bug is not triggered.
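
With the fix, callers can rely on a NULL return when the index does
not exist, e.g. (illustration only, the fallback variable is
hypothetical here):

	struct mem_ctl_info *mci = edac_mc_find(nid);

	if (!mci)	/* no mc device registered with this index */
		mci = fallback;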

Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 13594ffadcb3..64922c8fa7e3 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -679,22 +679,18 @@ static int del_mc_from_global_list(struct mem_ctl_info *mci)

struct mem_ctl_info *edac_mc_find(int idx)
{
- struct mem_ctl_info *mci = NULL;
+ struct mem_ctl_info *mci;
struct list_head *item;

mutex_lock(&mem_ctls_mutex);

list_for_each(item, &mc_devices) {
mci = list_entry(item, struct mem_ctl_info, link);
-
- if (mci->mc_idx >= idx) {
- if (mci->mc_idx == idx) {
- goto unlock;
- }
- break;
- }
+ if (mci->mc_idx == idx)
+ goto unlock;
}

+ mci = NULL;
unlock:
mutex_unlock(&mem_ctls_mutex);
return mci;
--
2.20.1

2019-05-29 08:46:42

by Robert Richter

Subject: [PATCH 20/21] EDAC, ghes: Enable per-layer reporting based on card/module

This patch enables per-layer reporting in the GHES driver based on
node, card and module. A DIMM can be uniquely identified by those
three identifiers. The mc device is selected by the node id. Thus,
each ghes_edac memory controller device has a two-dimensional layer
hierarchy based on card and module, in the same way as most other
drivers have. An error log now looks as follows:

[ 8902.592060] {4}[Hardware Error]: Error 6, type: corrected
[ 8902.597534] {4}[Hardware Error]: section_type: memory error
[ 8902.603267] {4}[Hardware Error]: error_status: 0x0000000000000400
[ 8902.609522] {4}[Hardware Error]: physical_address: 0x000000b3bb7d3000
[ 8902.616126] {4}[Hardware Error]: node: 1 card: 3 module: 0 rank: 1 bank: 771 column: 14 bit_position: 16
[ 8902.625854] {4}[Hardware Error]: DIMM location: N1 DIMM_L0
[ 8902.807783] EDAC MC1: 1 CE ghes_mc on N1 DIMM_L0 (card:3 module:0 page:0xb3bb7d3 offset:0x0 grain:0 syndrome:0x0 - APEI location: node:1 card:3 module:0 rank:1 bank:771 col:14 bit_pos:16 handle:0x0052 status(0x0000000000000400): Storage error in DRAM memory)

GHES error reports are now similar to edac_mc reports. This patch
moves code common to ghes and edac_mc into edac_raw_mc_handle_error().
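
As a sketch (illustration only, condensed from the hunks below): the
per-node mc device is registered with card/module layers of unknown
size, and a decoded error is mapped onto them:

	/* registration */
	layers[0].type = EDAC_MC_LAYER_CARD;	/* SMBIOS Type 16 */
	layers[1].type = EDAC_MC_LAYER_MODULE;	/* SMBIOS Type 17 */

	/* error handling, when the DIMM could be identified */
	e->top_layer = dimm->card;
	e->mid_layer = dimm->module;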

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 45 ++++++++++++++----------
drivers/edac/ghes_edac.c | 76 ++++++++++++++++++----------------------
include/linux/edac.h | 2 ++
3 files changed, 63 insertions(+), 60 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index bdeb9fd08249..c159bb3c77e0 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -915,11 +915,13 @@ int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci, unsigned long page)
EXPORT_SYMBOL_GPL(edac_mc_find_csrow_by_page);

const char *edac_layer_name[] = {
- [EDAC_MC_LAYER_BRANCH] = "branch",
- [EDAC_MC_LAYER_CHANNEL] = "channel",
- [EDAC_MC_LAYER_SLOT] = "slot",
- [EDAC_MC_LAYER_CHIP_SELECT] = "csrow",
- [EDAC_MC_LAYER_ALL_MEM] = "memory",
+ [EDAC_MC_LAYER_BRANCH] = "branch",
+ [EDAC_MC_LAYER_CHANNEL] = "channel",
+ [EDAC_MC_LAYER_SLOT] = "slot",
+ [EDAC_MC_LAYER_CHIP_SELECT] = "csrow",
+ [EDAC_MC_LAYER_ALL_MEM] = "memory",
+ [EDAC_MC_LAYER_CARD] = "card",
+ [EDAC_MC_LAYER_MODULE] = "module",
};
EXPORT_SYMBOL_GPL(edac_layer_name);

@@ -1046,7 +1048,26 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
int row, int chan)
{
char detail[80];
+ int idx;
+ int pos[EDAC_MAX_LAYERS] = { e->top_layer, e->mid_layer,
+ e->low_layer };
u8 grain_bits;
+ char *p;
+
+ /* Fill the RAM location data */
+ p = e->location;
+
+ for (idx = 0; idx < mci->n_layers; idx++) {
+ if (pos[idx] < 0)
+ continue;
+
+ p += sprintf(p, "%s:%d ",
+ edac_layer_name[mci->layers[idx].type],
+ pos[idx]);
+ }
+
+ if (p > e->location)
+ *(p - 1) = '\0';

/* Report the error via the trace interface */
grain_bits = fls_long(e->grain) + 1;
@@ -1228,20 +1249,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
else if (!*e->label)
strcpy(e->label, "unknown memory");

- /* Fill the RAM location data */
- p = e->location;
-
- for (i = 0; i < mci->n_layers; i++) {
- if (pos[i] < 0)
- continue;
-
- p += sprintf(p, "%s:%d ",
- edac_layer_name[mci->layers[i].type],
- pos[i]);
- }
- if (p > e->location)
- *(p - 1) = '\0';
-
dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);

edac_raw_mc_handle_error(type, mci, dimm, e, row, chan);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 07c847ed7315..67e962159653 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -167,18 +167,6 @@ static void ghes_edac_set_nid(const struct dmi_header *dh, void *arg)
}
}

-static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
-{
- struct dimm_info *dimm;
-
- mci_for_each_dimm(mci, dimm) {
- if (dimm->smbios_handle == handle)
- return dimm->idx;
- }
-
- return -1;
-}
-
static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
{
if (dh->type == DMI_ENTRY_MEM_DEVICE) {
@@ -506,10 +494,12 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
pvt = mci->pvt_info;
e = &mci->error_desc;

+ edac_dbg(3, "MC%d\n", mci->mc_idx);
+
/* Cleans the error report buffer */
memset(e, 0, sizeof (*e));
+
e->error_count = 1;
- strcpy(e->label, "unknown label");
e->top_layer = -1;
e->mid_layer = -1;
e->low_layer = -1;
@@ -519,6 +509,25 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
*pvt->msg = '\0';
*pvt->other_detail = '\0';

+ if (dimm) {
+ /* The DIMM could be identified. */
+ e->top_layer = dimm->card;
+ e->mid_layer = dimm->module;
+ strcpy(e->label, dimm->dimm->label);
+ } else if (nid >= 0 || card >= 0 || module >= 0 || handle >= 0) {
+ /*
+ * We have at least some information and can do a
+ * per-layer reporting, but the exact location is
+ * unknown.
+ */
+ e->top_layer = card;
+ e->mid_layer = module;
+ strcpy(e->label, "unknown memory");
+ } else {
+ /* No error location at all. */
+ strcpy(e->label, "any memory");
+ }
+
switch (sev) {
case GHES_SEV_CORRECTED:
type = HW_EVENT_ERR_CORRECTED;
@@ -538,8 +547,10 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
(long long)mem_err->validation_bits);

/* Error type, mapped on e->msg */
+ p = pvt->msg;
+ p += sprintf(p, "%s", mci->ctl_name);
if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
- p = pvt->msg;
+ p += sprintf(p, ": ");
switch (mem_err->error_type) {
case 0:
p += sprintf(p, "Unknown");
@@ -593,8 +604,6 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
p += sprintf(p, "reserved error (%d)",
mem_err->error_type);
}
- } else {
- strcpy(pvt->msg, "unknown error");
}

/* Error address */
@@ -607,8 +616,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);

- /* Memory error location, mapped on e->location */
- p = e->location;
+ /* Memory error location, mapped on e->other_detail */
+ p = pvt->other_detail;
+ p += snprintf(p, sizeof(pvt->other_detail), "APEI location: ");
if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
p += sprintf(p, "node:%d ", mem_err->node);
if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
@@ -626,27 +636,8 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
- const char *bank = NULL, *device = NULL;
- int index = -1;
-
- dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
- if (bank != NULL && device != NULL)
- p += sprintf(p, "DIMM location:%s %s ", bank, device);
- else
- p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
- mem_err->mem_dev_handle);
-
- index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
- if (index >= 0)
- e->top_layer = index;
+ p += sprintf(p, "handle:0x%.4x ", handle);
}
- if (p > e->location)
- *(p - 1) = '\0';
-
- /* All other fields are mapped on e->other_detail */
- p = pvt->other_detail;
- p += snprintf(p, sizeof(pvt->other_detail),
- "APEI location: %s ", e->location);
if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
u64 status = mem_err->error_status;

@@ -754,11 +745,14 @@ ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
struct ghes_edac_pvt *ghes_pvt;
int rc;
struct mem_ctl_info *mci;
- struct edac_mc_layer layers[1];
+ struct edac_mc_layer layers[2];

- layers[0].type = EDAC_MC_LAYER_ALL_MEM;
+ layers[0].type = EDAC_MC_LAYER_CARD;
layers[0].size = 0;
- layers[0].is_virt_csrow = true;
+ layers[0].is_virt_csrow = false;
+ layers[1].type = EDAC_MC_LAYER_MODULE;
+ layers[1].size = 0;
+ layers[1].is_virt_csrow = false;

mci = edac_mc_alloc_by_dimm(nid, mem_info.num_per_node[nid],
ARRAY_SIZE(layers), layers,
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 4dcf075e9dff..40e7da735e48 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -336,6 +336,8 @@ enum edac_mc_layer_type {
EDAC_MC_LAYER_SLOT,
EDAC_MC_LAYER_CHIP_SELECT,
EDAC_MC_LAYER_ALL_MEM,
+ EDAC_MC_LAYER_CARD, /* SMBIOS Type 16 Memory Array */
+ EDAC_MC_LAYER_MODULE, /* SMBIOS Type 17 Memory Device */
};

/**
--
2.20.1

2019-05-29 08:46:45

by Robert Richter

Subject: [PATCH 18/21] EDAC, mc: Introduce edac_mc_alloc_by_dimm() for per dimm allocation

Systems using ACPI GHES for error detection do not have exact
knowledge of the memory hierarchy. Compared to other memory controller
drivers, the total size of each layer (card/module, channel/slot,
etc.) is unknown, but the total number of DIMMs is known. So add a
function that allocates an mc device based on the number of DIMMs. The
EDAC core already uses a DIMM index internally for data access.
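
A minimal usage sketch (illustration only, mirroring the ghes_edac
hunk below), with the layer sizes left at zero:

	struct edac_mc_layer layers[1];
	struct mem_ctl_info *mci;

	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
	layers[0].size = 0;	/* sizes must be zero for per-DIMM allocation */
	layers[0].is_virt_csrow = true;

	mci = edac_mc_alloc_by_dimm(nid, num_dimms, ARRAY_SIZE(layers),
				    layers, sizeof(struct ghes_edac_pvt));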

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 83 ++++++++++++++++++++++++++++------------
drivers/edac/edac_mc.h | 17 ++++++--
drivers/edac/ghes_edac.c | 7 ++--
3 files changed, 76 insertions(+), 31 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index f7e6a751f309..bdeb9fd08249 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -303,10 +303,11 @@ static void _edac_mc_free(struct mem_ctl_info *mci)
kfree(mci);
}

-struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
- unsigned n_layers,
- struct edac_mc_layer *layers,
- unsigned sz_pvt)
+struct mem_ctl_info *__edac_mc_alloc(unsigned mc_num,
+ unsigned dimm_num,
+ unsigned n_layers,
+ struct edac_mc_layer *layers,
+ unsigned sz_pvt)
{
struct mem_ctl_info *mci;
struct edac_mc_layer *layer;
@@ -321,6 +322,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
bool per_rank = false;

BUG_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0);
+
/*
* Calculate the total amount of dimms and csrows/cschannels while
* in the old API emulation mode
@@ -336,6 +338,26 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
per_rank = true;
}

+ /* allocate dimm_num DIMMS, layer size must be zero */
+ if (dimm_num) {
+ if (dimm_num <= 0 ||
+ layers[0].size ||
+ (n_layers > 1 && layers[1].size) ||
+ (n_layers > 2 && layers[2].size)) {
+ edac_printk(KERN_WARNING, EDAC_MC,
+ "invalid layer data\n");
+ return NULL;
+ }
+
+ /*
+ * Assume 1 csrow per dimm which also means 1 channel
+ * per csrow.
+ */
+ tot_dimms = dimm_num;
+ tot_csrows = dimm_num;
+ tot_channels = 1;
+ }
+
/* Figure out the offsets of the various items from the start of an mc
* structure. We want the alignment of each item to be at least as
* stringent as what the compiler would provide if we could simply
@@ -422,25 +444,10 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
dimm->mci = mci;
dimm->idx = idx;

- /*
- * Copy DIMM location and initialize it.
- */
- len = sizeof(dimm->label);
- p = dimm->label;
- n = snprintf(p, len, "mc#%u", mc_num);
- p += n;
- len -= n;
- for (j = 0; j < n_layers; j++) {
- n = snprintf(p, len, "%s#%u",
- edac_layer_name[layers[j].type],
- pos[j]);
- p += n;
- len -= n;
- dimm->location[j] = pos[j];
-
- if (len <= 0)
- break;
- }
+ /* unknown location */
+ dimm->location[0] = -1;
+ dimm->location[1] = -1;
+ dimm->location[2] = -1;

/* Link it to the csrows old API data */
chan->dimm = dimm;
@@ -462,6 +469,34 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
}
}

+ /*
+ * Copy DIMM location and initialize it.
+ */
+ len = sizeof(dimm->label);
+ p = dimm->label;
+ n = snprintf(p, len, "mc#%u", mc_num);
+ p += n;
+ len -= n;
+
+ if (dimm_num) {
+ n = snprintf(p, len, "dimm#%u", idx);
+ p += n;
+ len -= n;
+ continue;
+ }
+
+ for (j = 0; j < n_layers; j++) {
+ n = snprintf(p, len, "%s#%u",
+ edac_layer_name[layers[j].type],
+ pos[j]);
+ p += n;
+ len -= n;
+ dimm->location[j] = pos[j];
+
+ if (len <= 0)
+ break;
+ }
+
/* Increment dimm location */
for (j = n_layers - 1; j >= 0; j--) {
pos[j]++;
@@ -480,7 +515,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,

return NULL;
}
-EXPORT_SYMBOL_GPL(edac_mc_alloc);
+EXPORT_SYMBOL_GPL(__edac_mc_alloc);

void edac_mc_free(struct mem_ctl_info *mci)
{
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index c4ddd5c1e24c..e8215847f853 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -99,6 +99,10 @@ do { \
* edac_mc_alloc() - Allocate and partially fill a struct &mem_ctl_info.
*
* @mc_num: Memory controller number
+ * @dimm_num: Number of DIMMs to allocate. If non-zero the
+ * @layers' size parameter must be zero. Useful
+ * if the MC hierarchy is unknown but the number
+ * of DIMMs is known.
* @n_layers: Number of MC hierarchy layers
* @layers: Describes each layer as seen by the Memory Controller
* @sz_pvt: size of private storage needed
@@ -122,10 +126,15 @@ do { \
* On success, return a pointer to struct mem_ctl_info pointer;
* %NULL otherwise
*/
-struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
- unsigned n_layers,
- struct edac_mc_layer *layers,
- unsigned sz_pvt);
+struct mem_ctl_info *__edac_mc_alloc(unsigned mc_num,
+ unsigned dimm_num,
+ unsigned n_layers,
+ struct edac_mc_layer *layers,
+ unsigned sz_pvt);
+#define edac_mc_alloc(mc_num, n_layers, layers, sz_pvt) \
+ __edac_mc_alloc(mc_num, 0, n_layers, layers, sz_pvt)
+#define edac_mc_alloc_by_dimm(mc_num, dimm_num, n_layers, layers, sz_pvt) \
+ __edac_mc_alloc(mc_num, dimm_num, n_layers, layers, sz_pvt)

/**
* edac_get_owner - Return the owner's mod_name of EDAC MC
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index b8878ff498d1..4bac643d3404 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -707,11 +707,12 @@ ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
struct edac_mc_layer layers[1];

layers[0].type = EDAC_MC_LAYER_ALL_MEM;
- layers[0].size = mem_info.num_per_node[nid];
+ layers[0].size = 0;
layers[0].is_virt_csrow = true;

- mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers,
- sizeof(struct ghes_edac_pvt));
+ mci = edac_mc_alloc_by_dimm(nid, mem_info.num_per_node[nid],
+ ARRAY_SIZE(layers), layers,
+ sizeof(struct ghes_edac_pvt));
if (!mci) {
pr_err("Can't allocate memory for EDAC data\n");
return -ENOMEM;
--
2.20.1

2019-05-29 08:46:57

by Robert Richter

Subject: [PATCH 19/21] EDAC, ghes: Identify dimm by node, card, module and handle

According to the UEFI Spec. 2.7 (N.2.5 Memory Error Section), a
failing DIMM (module or rank number) can be identified by its error
location, consisting of node, card and module. A module handle is used
to map it to the DIMMs listed in the DMI table. Collect all of this
data from the error record and select the DIMM accordingly.
Inconsistent error records, where the same DIMM handle reports errors
with a different node, card or module, will be reported.

This change allows per-layer reporting based on node, card and module
to be enabled in the next patch.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 74 +++++++++++++++++++++++++++++++++-------
1 file changed, 62 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4bac643d3404..07c847ed7315 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -83,8 +83,11 @@ struct memarr_dmi_entry {

struct ghes_dimm_info {
struct dimm_info dimm_info;
+ struct dimm_info *dimm;
int idx;
int numa_node;
+ int card;
+ int module;
phys_addr_t start;
phys_addr_t end;
u16 phys_handle;
@@ -119,6 +122,8 @@ static void ghes_dimm_info_init(void)
for_each_dimm(dimm) {
dimm->idx = idx;
dimm->numa_node = NUMA_NO_NODE;
+ dimm->card = -1;
+ dimm->module = -1;
idx++;
}
}
@@ -401,6 +406,13 @@ static void mci_add_dimm_info(struct mem_ctl_info *mci)

if (*dmi_dimm->label)
strcpy(mci_dimm->label, dmi_dimm->label);
+
+ /*
+ * From here on do not use any longer &dimm.dimm_info.
+ * Instead switch to the mci's dimm info which might
+ * contain updated data, such as the label.
+ */
+ dimm->dimm = mci_dimm;
}

if (index != mci->tot_dimms)
@@ -408,24 +420,46 @@ static void mci_add_dimm_info(struct mem_ctl_info *mci)
index, mci->tot_dimms);
}

-static struct mem_ctl_info *get_mc_by_node(int nid)
+/* Requires ghes_lock being set. */
+static struct ghes_dimm_info *
+get_and_prepare_dimm_info(int nid, int card, int module, int handle)
{
- struct mem_ctl_info *mci = edac_mc_find(nid);
+ static struct ghes_dimm_info *dimm;
+ struct dimm_info *di;

- if (mci)
- return mci;
+ /*
+ * We require smbios_handle being set in the error report for
+ * per layer reporting (SMBIOS handle for the Type 17 Memory
+ * Device Structure that represents the Memory Module)
+ */
+ for_each_dimm(dimm) {
+ di = dimm->dimm;
+ if (di->smbios_handle == handle)
+ goto found;
+ }

- if (num_possible_nodes() > 1) {
- edac_mc_printk(fallback, KERN_WARNING,
- "Invalid or no node information, falling back to first node: %s",
- fallback->dev_name);
+ return NULL;
+found:
+ if (dimm->card < 0 && card >= 0)
+ dimm->card = card;
+ if (dimm->module < 0 && module >= 0)
+ dimm->module = module;
+
+ if ((num_possible_nodes() > 1 && di->mci->mc_idx != nid) ||
+ (card >= 0 && card != dimm->card) ||
+ (module >= 0 && module != dimm->module)) {
+ edac_mc_printk(di->mci, KERN_WARNING,
+ "Inconsistent error report (nid/card/module): %d/%d/%d (dimm%d: %d/%d/%d)",
+ nid, card, module, di->idx,
+ di->mci->mc_idx, dimm->card, dimm->module);
}

- return fallback;
+ return dimm;
}

void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
{
+ struct ghes_dimm_info *dimm;
struct dimm_info *dimm_info;
enum hw_event_mc_err_type type;
struct edac_raw_error_desc *e;
@@ -434,6 +468,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
unsigned long flags;
char *p;
int nid = NUMA_NO_NODE;
+ int card = -1;
+ int module = -1;
+ int handle = -1;

/* We need at least one mc */
if (WARN_ON_ONCE(!fallback))
@@ -449,10 +486,23 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)

spin_lock_irqsave(&ghes_lock, flags);

- /* select the node's mc device */
if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
nid = mem_err->node;
- mci = get_mc_by_node(nid);
+ if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
+ card = mem_err->card;
+ if (mem_err->validation_bits & CPER_MEM_VALID_MODULE)
+ module = mem_err->module;
+ if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE)
+ handle = mem_err->mem_dev_handle;
+
+ dimm = get_and_prepare_dimm_info(nid, card, module, handle);
+ if (dimm)
+ mci = dimm->dimm->mci;
+ else
+ mci = edac_mc_find(nid);
+ if (!mci)
+ mci = fallback;
+
pvt = mci->pvt_info;
e = &mci->error_desc;

@@ -670,7 +720,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
if (p > pvt->other_detail)
*(p - 1) = '\0';

- dimm_info = edac_get_dimm_by_index(mci, e->top_layer);
+ dimm_info = dimm ? dimm->dimm : NULL;

edac_raw_mc_handle_error(type, mci, dimm_info, e, -1, -1);

--
2.20.1

2019-05-29 08:46:58

by Robert Richter

Subject: [PATCH 13/21] EDAC, ghes: Rework memory hierarchy detection

In a later patch we want to add more information about the memory
hierarchy (NUMA topology, DIMM label information). Rework the memory
hierarchy detection to make the code extensible for this.

The general approach is roughly like:

mem_info_setup();
for_each_node(nid) {
mci = edac_mc_alloc(nid);
mci_add_dimm_info(mci);
edac_mc_add_mc(mci);
};

This patch introduces mem_info_setup() and mci_add_dimm_info().

All data of the memory hierarchy is collected in a local struct
ghes_mem_info.

Note: Per (NUMA) node registration will be implemented in a later
patch.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 166 +++++++++++++++++++++++++++++----------
1 file changed, 126 insertions(+), 40 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index ea4d53043199..50f4ee36b755 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -67,17 +67,38 @@ struct memdev_dmi_entry {
u16 conf_mem_clk_speed;
} __attribute__((__packed__));

-struct ghes_edac_dimm_fill {
- struct mem_ctl_info *mci;
- unsigned count;
+struct ghes_dimm_info {
+ struct dimm_info dimm_info;
+ int idx;
+};
+
+struct ghes_mem_info {
+ int num_dimm;
+ struct ghes_dimm_info *dimms;
};

+struct ghes_mem_info mem_info;
+
+#define for_each_dimm(dimm) \
+ for (dimm = mem_info.dimms; \
+ dimm < mem_info.dimms + mem_info.num_dimm; \
+ dimm++)
+
static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
{
- int *num_dimm = arg;
-
if (dh->type == DMI_ENTRY_MEM_DEVICE)
- (*num_dimm)++;
+ mem_info.num_dimm++;
+}
+
+static void ghes_dimm_info_init(void)
+{
+ struct ghes_dimm_info *dimm;
+ int idx = 0;
+
+ for_each_dimm(dimm) {
+ dimm->idx = idx;
+ idx++;
+ }
}

static int get_dimm_smbios_index(u16 handle)
@@ -94,18 +115,17 @@ static int get_dimm_smbios_index(u16 handle)

static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
{
- struct ghes_edac_dimm_fill *dimm_fill = arg;
- struct mem_ctl_info *mci = dimm_fill->mci;
-
if (dh->type == DMI_ENTRY_MEM_DEVICE) {
+ int *idx = arg;
struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
- struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count,
- 0, 0);
+ struct ghes_dimm_info *mi = &mem_info.dimms[*idx];
+ struct dimm_info *dimm = &mi->dimm_info;
u16 rdr_mask = BIT(7) | BIT(13);

+ mi->phys_handle = entry->phys_mem_array_handle;
+
if (entry->size == 0xffff) {
- pr_info("Can't get DIMM%i size\n",
- dimm_fill->count);
+ pr_info("Can't get DIMM%i size\n", mi->idx);
dimm->nr_pages = MiB_TO_PAGES(32);/* Unknown */
} else if (entry->size == 0x7fff) {
dimm->nr_pages = MiB_TO_PAGES(entry->extended_size);
@@ -179,7 +199,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)

if (dimm->nr_pages) {
edac_dbg(1, "DIMM%i: %s size = %d MB%s\n",
- dimm_fill->count, edac_mem_types[dimm->mtype],
+ mi->idx, edac_mem_types[dimm->mtype],
PAGES_TO_MiB(dimm->nr_pages),
(dimm->edac_mode != EDAC_NONE) ? "(ECC)" : "");
edac_dbg(2, "\ttype %d, detail 0x%02x, width %d(total %d)\n",
@@ -189,8 +209,83 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)

dimm->smbios_handle = entry->handle;

- dimm_fill->count++;
+ (*idx)++;
+ }
+}
+
+static int mem_info_setup(void)
+{
+ int idx = 0;
+
+ memset(&mem_info, 0, sizeof(mem_info));
+
+ /* Get the number of DIMMs */
+ dmi_walk(ghes_edac_count_dimms, NULL);
+ if (!mem_info.num_dimm)
+ return -EINVAL;
+
+ mem_info.dimms = kcalloc(mem_info.num_dimm,
+ sizeof(*mem_info.dimms), GFP_KERNEL);
+ if (!mem_info.dimms)
+ return -ENOMEM;
+
+ ghes_dimm_info_init();
+ dmi_walk(ghes_edac_dmidecode, &idx);
+
+ return 0;
+}
+
+static int mem_info_setup_fake(void)
+{
+ struct ghes_dimm_info *ghes_dimm;
+ struct dimm_info *dimm;
+
+ memset(&mem_info, 0, sizeof(mem_info));
+
+ ghes_dimm = kzalloc(sizeof(*mem_info.dimms), GFP_KERNEL);
+ if (!ghes_dimm)
+ return -ENOMEM;
+
+ mem_info.num_dimm = 1;
+ mem_info.dimms = ghes_dimm;
+
+ ghes_dimm_info_init();
+
+ dimm = &ghes_dimm->dimm_info;
+ dimm->nr_pages = 1;
+ dimm->grain = 128;
+ dimm->mtype = MEM_UNKNOWN;
+ dimm->dtype = DEV_UNKNOWN;
+ dimm->edac_mode = EDAC_SECDED;
+
+ return 0;
+}
+
+static void mci_add_dimm_info(struct mem_ctl_info *mci)
+{
+ struct dimm_info *mci_dimm, *dmi_dimm;
+ struct ghes_dimm_info *dimm;
+ int index = 0;
+
+ for_each_dimm(dimm) {
+ dmi_dimm = &dimm->dimm_info;
+ mci_dimm = edac_get_dimm_by_index(mci, index);
+
+ index++;
+ if (index > mci->tot_dimms)
+ break;
+
+ mci_dimm->nr_pages = dmi_dimm->nr_pages;
+ mci_dimm->mtype = dmi_dimm->mtype;
+ mci_dimm->edac_mode = dmi_dimm->edac_mode;
+ mci_dimm->dtype = dmi_dimm->dtype;
+ mci_dimm->grain = dmi_dimm->grain;
+ mci_dimm->smbios_handle = dmi_dimm->smbios_handle;
}
+
+ if (index != mci->tot_dimms)
+ pr_warn("Unexpected number of DIMMs: %d (exp. %d)\n",
+ index, mci->tot_dimms);
}

void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
@@ -451,10 +546,9 @@ static struct acpi_platform_list plat_list[] = {
int ghes_edac_register(struct ghes *ghes, struct device *dev)
{
bool fake = false;
- int rc, num_dimm = 0;
+ int rc;
struct mem_ctl_info *mci;
struct edac_mc_layer layers[1];
- struct ghes_edac_dimm_fill dimm_fill;
int idx = -1;

if (IS_ENABLED(CONFIG_X86)) {
@@ -472,22 +566,24 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
if (atomic_inc_return(&ghes_init) > 1)
return 0;

- /* Get the number of DIMMs */
- dmi_walk(ghes_edac_count_dimms, &num_dimm);
-
- /* Check if we've got a bogus BIOS */
- if (num_dimm == 0) {
+ rc = mem_info_setup();
+ if (rc == -EINVAL) {
+ /* we've got a bogus BIOS */
fake = true;
- num_dimm = 1;
+ rc = mem_info_setup_fake();
+ }
+ if (rc < 0) {
+ pr_err("Can't allocate memory for DIMM data\n");
+ return rc;
}

layers[0].type = EDAC_MC_LAYER_ALL_MEM;
- layers[0].size = num_dimm;
+ layers[0].size = mem_info.num_dimm;
layers[0].is_virt_csrow = true;

mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
if (!mci) {
- pr_info("Can't allocate memory for EDAC data\n");
+ pr_err("Can't allocate memory for EDAC data\n");
return -ENOMEM;
}

@@ -513,26 +609,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
pr_info("If you find incorrect reports, please contact your hardware vendor\n");
pr_info("to correct its BIOS.\n");
- pr_info("This system has %d DIMM sockets.\n", num_dimm);
+ pr_info("This system has %d DIMM sockets.\n", mem_info.num_dimm);
}

- if (!fake) {
- dimm_fill.count = 0;
- dimm_fill.mci = mci;
- dmi_walk(ghes_edac_dmidecode, &dimm_fill);
- } else {
- struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
-
- dimm->nr_pages = 1;
- dimm->grain = 128;
- dimm->mtype = MEM_UNKNOWN;
- dimm->dtype = DEV_UNKNOWN;
- dimm->edac_mode = EDAC_SECDED;
- }
+ mci_add_dimm_info(mci);

rc = edac_mc_add_mc(mci);
if (rc < 0) {
- pr_info("Can't register at EDAC core\n");
+ pr_err("Can't register at EDAC core\n");
edac_mc_free(mci);
return -ENODEV;
}
@@ -549,4 +633,6 @@ void ghes_edac_unregister(struct ghes *ghes)
mci = ghes_pvt->mci;
edac_mc_del_mc(mci->pdev);
edac_mc_free(mci);
+
+ kfree(mem_info.dimms);
}
--
2.20.1

2019-05-29 08:47:02

by Robert Richter

[permalink] [raw]
Subject: [PATCH 16/21] EDAC, ghes: Create one memory controller device per node

Typically there is one EDAC memory controller device per node. This
patch implements the same for the ghes driver: create multiple mc
devices and map the dimms based on their node id.

We need at least one node whose mc device can be used as a fallback if
no node information is available in the error report.
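
A condensed sketch of the resulting registration loop (taken from the
hunk further down, error handling trimmed):

  for_each_node(nid) {
          if (!mem_info.num_per_node[nid])
                  continue;                       /* no DIMMs on this node */

          rc = ghes_edac_register_one(nid, ghes, dev);
          ...

          /* the first registered node's mc device becomes the fallback */
          if (!fallback)
                  fallback = edac_mc_find(nid);
  }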

Here is a complete and consistent error report from a ThunderX2 system
(counters with a zero value dropped):

# find /sys/devices/system/edac/mc/ -name \*count | sort -V | xargs grep . | sed -e '/:0/d'
/sys/devices/system/edac/mc/mc0/ce_count:11
/sys/devices/system/edac/mc/mc0/ce_noinfo_count:1
/sys/devices/system/edac/mc/mc0/csrow2/ce_count:5
/sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:5
/sys/devices/system/edac/mc/mc0/csrow3/ce_count:3
/sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:3
/sys/devices/system/edac/mc/mc0/csrow4/ce_count:2
/sys/devices/system/edac/mc/mc0/csrow4/ch0_ce_count:2
/sys/devices/system/edac/mc/mc0/dimm2/dimm_ce_count:5
/sys/devices/system/edac/mc/mc0/dimm3/dimm_ce_count:3
/sys/devices/system/edac/mc/mc0/dimm4/dimm_ce_count:2
/sys/devices/system/edac/mc/mc1/ce_count:7
/sys/devices/system/edac/mc/mc1/csrow2/ce_count:4
/sys/devices/system/edac/mc/mc1/csrow2/ch0_ce_count:4
/sys/devices/system/edac/mc/mc1/csrow3/ce_count:1
/sys/devices/system/edac/mc/mc1/csrow3/ch0_ce_count:1
/sys/devices/system/edac/mc/mc1/csrow6/ce_count:2
/sys/devices/system/edac/mc/mc1/csrow6/ch0_ce_count:2
/sys/devices/system/edac/mc/mc1/dimm2/dimm_ce_count:4
/sys/devices/system/edac/mc/mc1/dimm3/dimm_ce_count:1
/sys/devices/system/edac/mc/mc1/dimm6/dimm_ce_count:2

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 126 ++++++++++++++++++++++++++++++++-------
1 file changed, 104 insertions(+), 22 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index c39cdfdfb8db..e5fa977bcfd9 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -18,6 +18,7 @@
#include <ras/ras_event.h>

struct ghes_edac_pvt {
+ struct device dev;
struct list_head list;
struct ghes *ghes;
struct mem_ctl_info *mci;
@@ -28,7 +29,7 @@ struct ghes_edac_pvt {
};

static atomic_t ghes_init = ATOMIC_INIT(0);
-static struct ghes_edac_pvt *ghes_pvt;
+struct mem_ctl_info *fallback;

/*
* Sync with other, potentially concurrent callers of
@@ -161,15 +162,15 @@ static void ghes_edac_set_nid(const struct dmi_header *dh, void *arg)
}
}

-static int get_dimm_smbios_index(u16 handle)
+static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
{
- struct mem_ctl_info *mci = ghes_pvt->mci;
struct dimm_info *dimm;

mci_for_each_dimm(mci, dimm) {
if (dimm->smbios_handle == handle)
return dimm->idx;
}
+
return -1;
}

@@ -370,6 +371,9 @@ static void mci_add_dimm_info(struct mem_ctl_info *mci)
int index = 0;

for_each_dimm(dimm) {
+ if (mci->mc_idx != dimm->numa_node)
+ continue;
+
dmi_dimm = &dimm->dimm_info;
mci_dimm = edac_get_dimm_by_index(mci, index);

@@ -390,17 +394,35 @@ static void mci_add_dimm_info(struct mem_ctl_info *mci)
index, mci->tot_dimms);
}

+static struct mem_ctl_info *get_mc_by_node(int nid)
+{
+ struct mem_ctl_info *mci = edac_mc_find(nid);
+
+ if (mci)
+ return mci;
+
+ if (num_possible_nodes() > 1) {
+ edac_mc_printk(fallback, KERN_WARNING,
+ "Invalid or no node information, falling back to first node: %s",
+ fallback->dev_name);
+ }
+
+ return fallback;
+}
+
void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
{
struct dimm_info *dimm_info;
enum hw_event_mc_err_type type;
struct edac_raw_error_desc *e;
struct mem_ctl_info *mci;
- struct ghes_edac_pvt *pvt = ghes_pvt;
+ struct ghes_edac_pvt *pvt;
unsigned long flags;
char *p;
+ int nid = NUMA_NO_NODE;

- if (!pvt)
+ /* We need at least one mc */
+ if (WARN_ON_ONCE(!fallback))
return;

/*
@@ -413,7 +435,11 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)

spin_lock_irqsave(&ghes_lock, flags);

- mci = pvt->mci;
+ /* select the node's mc device */
+ if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
+ nid = mem_err->node;
+ mci = get_mc_by_node(nid);
+ pvt = mci->pvt_info;
e = &mci->error_desc;

/* Cleans the error report buffer */
@@ -546,7 +572,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
mem_err->mem_dev_handle);

- index = get_dimm_smbios_index(mem_err->mem_dev_handle);
+ index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
if (index >= 0)
e->top_layer = index;
}
@@ -645,15 +671,29 @@ static struct acpi_platform_list plat_list[] = {
{ } /* End */
};

+void ghes_edac_release(struct device *dev)
+{
+ struct ghes_edac_pvt *ghes_pvt;
+ struct mem_ctl_info *mci;
+
+ ghes_pvt = container_of(dev, struct ghes_edac_pvt, dev);
+
+ mci = ghes_pvt->mci;
+ edac_mc_del_mc(mci->pdev);
+ edac_mc_free(mci);
+}
+
static int
ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
{
+ struct device *dev;
+ struct ghes_edac_pvt *ghes_pvt;
int rc;
struct mem_ctl_info *mci;
struct edac_mc_layer layers[1];

layers[0].type = EDAC_MC_LAYER_ALL_MEM;
- layers[0].size = mem_info.num_dimm;
+ layers[0].size = mem_info.num_per_node[nid];
layers[0].is_virt_csrow = true;

mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers,
@@ -667,43 +707,69 @@ ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
ghes_pvt->ghes = ghes;
ghes_pvt->mci = mci;

- mci->pdev = parent;
+ dev = &ghes_pvt->dev;
+ dev->parent = parent;
+ dev->release = ghes_edac_release;
+ dev_set_name(dev, "ghes_mc%d", nid);
+
+ rc = device_register(dev);
+ if (rc) {
+ pr_err("Can't create EDAC device (%d)\n", rc);
+ goto fail;
+ }
+
+ mci->pdev = dev;
mci->mtype_cap = MEM_FLAG_EMPTY;
mci->edac_ctl_cap = EDAC_FLAG_NONE;
mci->edac_cap = EDAC_FLAG_NONE;
mci->mod_name = "ghes_edac.c";
- mci->ctl_name = "ghes_edac";
- mci->dev_name = "ghes";
+ mci->ctl_name = "ghes_mc";
+ mci->dev_name = dev_name(dev);

mci_add_dimm_info(mci);

rc = edac_mc_add_mc(mci);
if (rc < 0) {
- pr_err("Can't register at EDAC core\n");
- edac_mc_free(mci);
- return -ENODEV;
+ pr_err("Can't register at EDAC core (%d)\n", rc);
+ goto fail;
}
+
return 0;
+fail:
+ put_device(dev);
+ return rc;
+}
+
+static void ghes_edac_unregister_one(struct mem_ctl_info *mci)
+{
+ struct ghes_edac_pvt *pvt = mci->pvt_info;
+
+ put_device(&pvt->dev);
}

void ghes_edac_unregister(struct ghes *ghes)
{
struct mem_ctl_info *mci;
+ int nid;

- if (!ghes_pvt)
- return;
-
- mci = ghes_pvt->mci;
- edac_mc_del_mc(mci->pdev);
- edac_mc_free(mci);
+ for_each_node(nid) {
+ mci = edac_mc_find(nid);
+ /* stop fallback at last */
+ if (mci && mci != fallback)
+ ghes_edac_unregister_one(mci);
+ }

+ ghes_edac_unregister_one(fallback);
+ fallback = NULL;
kfree(mem_info.dimms);
+ atomic_dec(&ghes_init);
}

int ghes_edac_register(struct ghes *ghes, struct device *dev)
{
bool fake = false;
int rc;
+ int nid;
int idx = -1;

if (IS_ENABLED(CONFIG_X86)) {
@@ -743,7 +809,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
pr_info("This system has %d DIMM sockets.\n", mem_info.num_dimm);
}

- rc = ghes_edac_register_one(0, ghes, dev);
+ for_each_node(nid) {
+ if (!mem_info.num_per_node[nid])
+ continue;

- return rc;
+ rc = ghes_edac_register_one(nid, ghes, dev);
+ if (rc) {
+ ghes_edac_unregister(ghes);
+ return rc;
+ }
+
+ /*
+ * use the first node's mc as fallback in case we can
+ * not detect the node from the error information
+ */
+ if (!fallback)
+ fallback = edac_mc_find(nid);
+ }
+
+ return 0;
}
--
2.20.1

2019-05-29 08:47:06

by Robert Richter

[permalink] [raw]
Subject: [PATCH 12/21] EDAC, ghes: Add support for legacy API counters

The ghes driver does not yet maintain the legacy API counters in sysfs,
e.g.:

/sys/devices/system/edac/mc/mc0/csrow2/ce_count
/sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count
/sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count

Make counting csrows/channels generic so that the ghes driver can use
it too.
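
With the extended interface a caller can either pass a struct dimm_info
(its csrow/cschannel is then used) or explicit csrow/channel hints; a
sketch of the two call styles using the signature added by this patch
(the second caller is hypothetical):

  /* DIMM known: its csrow/cschannel is used, the hints are ignored */
  edac_raw_mc_handle_error(type, mci, dimm_info, e, -1, -1);

  /* no DIMM info, but the driver decoded csrow/channel itself */
  edac_raw_mc_handle_error(type, mci, NULL, e, row, chan);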

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 39 ++++++++++++++++++++++-----------------
drivers/edac/edac_mc.h | 7 ++++++-
drivers/edac/ghes_edac.c | 2 +-
3 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 8613a31dc86c..f7e6a751f309 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1007,7 +1007,8 @@ static void edac_ue_error(struct mem_ctl_info *mci,
void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
struct mem_ctl_info *mci,
struct dimm_info *dimm,
- struct edac_raw_error_desc *e)
+ struct edac_raw_error_desc *e,
+ int row, int chan)
{
char detail[80];
u8 grain_bits;
@@ -1040,7 +1041,23 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
e->label, detail, e->other_detail);
}

+ /* old API's counters */
+ if (dimm) {
+ row = dimm->csrow;
+ chan = dimm->cschannel;
+ }
+
+ if (row >= 0) {
+ if (type == HW_EVENT_ERR_CORRECTED) {
+ mci->csrows[row]->ce_count += e->error_count;
+ if (chan >= 0)
+ mci->csrows[row]->channels[chan]->ce_count += e->error_count;
+ } else {
+ mci->csrows[row]->ue_count += e->error_count;
+ }
+ }

+ edac_dbg(4, "csrow/channel to increment: (%d,%d)\n", row, chan);
}
EXPORT_SYMBOL_GPL(edac_raw_mc_handle_error);

@@ -1171,22 +1188,10 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
}
}

- if (!per_layer_report) {
+ if (!per_layer_report)
strcpy(e->label, "any memory");
- } else {
- edac_dbg(4, "csrow/channel to increment: (%d,%d)\n", row, chan);
- if (p == e->label)
- strcpy(e->label, "unknown memory");
- if (type == HW_EVENT_ERR_CORRECTED) {
- if (row >= 0) {
- mci->csrows[row]->ce_count += error_count;
- if (chan >= 0)
- mci->csrows[row]->channels[chan]->ce_count += error_count;
- }
- } else
- if (row >= 0)
- mci->csrows[row]->ue_count += error_count;
- }
+ else if (!*e->label)
+ strcpy(e->label, "unknown memory");

/* Fill the RAM location data */
p = e->location;
@@ -1204,6 +1209,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,

dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);

- edac_raw_mc_handle_error(type, mci, dimm, e);
+ edac_raw_mc_handle_error(type, mci, dimm, e, row, chan);
}
EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index b816cf3caaee..c4ddd5c1e24c 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -216,6 +216,10 @@ extern int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci,
* @mci: a struct mem_ctl_info pointer
* @dimm: a struct dimm_info pointer
* @e: error description
+ * @row: csrow hint if there is no dimm info (<0 if
+ * unknown)
+ * @chan: cschannel hint if there is no dimm info (<0 if
+ * unknown)
*
* This raw function is used internally by edac_mc_handle_error(). It should
* only be called directly when the hardware error come directly from BIOS,
@@ -224,7 +228,8 @@ extern int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci,
void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
struct mem_ctl_info *mci,
struct dimm_info *dimm,
- struct edac_raw_error_desc *e);
+ struct edac_raw_error_desc *e,
+ int row, int chan);

/**
* edac_mc_handle_error() - Reports a memory event to userspace.
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index f6ea4b070bfe..ea4d53043199 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -435,7 +435,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)

dimm_info = edac_get_dimm_by_index(mci, e->top_layer);

- edac_raw_mc_handle_error(type, mci, dimm_info, e);
+ edac_raw_mc_handle_error(type, mci, dimm_info, e, -1, -1);

spin_unlock_irqrestore(&ghes_lock, flags);
}
--
2.20.1

2019-05-29 08:47:15

by Robert Richter

[permalink] [raw]
Subject: [PATCH 15/21] EDAC, ghes: Moving code around ghes_edac_register()

This is in preparation for the next patch, to make the changes there
more visible.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 97 ++++++++++++++++++++++------------------
1 file changed, 53 insertions(+), 44 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 083452a48b42..c39cdfdfb8db 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -645,45 +645,19 @@ static struct acpi_platform_list plat_list[] = {
{ } /* End */
};

-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static int
+ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
{
- bool fake = false;
int rc;
struct mem_ctl_info *mci;
struct edac_mc_layer layers[1];
- int idx = -1;
-
- if (IS_ENABLED(CONFIG_X86)) {
- /* Check if safe to enable on this system */
- idx = acpi_match_platform_list(plat_list);
- if (!force_load && idx < 0)
- return -ENODEV;
- } else {
- idx = 0;
- }
-
- /*
- * We have only one logical memory controller to which all DIMMs belong.
- */
- if (atomic_inc_return(&ghes_init) > 1)
- return 0;
-
- rc = mem_info_setup();
- if (rc == -EINVAL) {
- /* we've got a bogus BIOS */
- fake = true;
- rc = mem_info_setup_fake();
- }
- if (rc < 0) {
- pr_err("Can't allocate memory for DIMM data\n");
- return rc;
- }

layers[0].type = EDAC_MC_LAYER_ALL_MEM;
layers[0].size = mem_info.num_dimm;
layers[0].is_virt_csrow = true;

- mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
+ mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers,
+ sizeof(struct ghes_edac_pvt));
if (!mci) {
pr_err("Can't allocate memory for EDAC data\n");
return -ENOMEM;
@@ -693,7 +667,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
ghes_pvt->ghes = ghes;
ghes_pvt->mci = mci;

- mci->pdev = dev;
+ mci->pdev = parent;
mci->mtype_cap = MEM_FLAG_EMPTY;
mci->edac_ctl_cap = EDAC_FLAG_NONE;
mci->edac_cap = EDAC_FLAG_NONE;
@@ -701,19 +675,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
mci->ctl_name = "ghes_edac";
mci->dev_name = "ghes";

- if (fake) {
- pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
- pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
- pr_info("work on such system. Use this driver with caution\n");
- } else if (idx < 0) {
- pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
- pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
- pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
- pr_info("If you find incorrect reports, please contact your hardware vendor\n");
- pr_info("to correct its BIOS.\n");
- pr_info("This system has %d DIMM sockets.\n", mem_info.num_dimm);
- }
-
mci_add_dimm_info(mci);

rc = edac_mc_add_mc(mci);
@@ -738,3 +699,51 @@ void ghes_edac_unregister(struct ghes *ghes)

kfree(mem_info.dimms);
}
+
+int ghes_edac_register(struct ghes *ghes, struct device *dev)
+{
+ bool fake = false;
+ int rc;
+ int idx = -1;
+
+ if (IS_ENABLED(CONFIG_X86)) {
+ /* Check if safe to enable on this system */
+ idx = acpi_match_platform_list(plat_list);
+ if (!force_load && idx < 0)
+ return -ENODEV;
+ } else {
+ idx = 0;
+ }
+
+ /* We have only one ghes instance at a time. */
+ if (atomic_inc_return(&ghes_init) > 1)
+ return 0;
+
+ rc = mem_info_setup();
+ if (rc == -EINVAL) {
+ /* we've got a bogus BIOS */
+ fake = true;
+ rc = mem_info_setup_fake();
+ }
+ if (rc < 0) {
+ pr_err("Can't allocate memory for DIMM data\n");
+ return rc;
+ }
+
+ if (fake) {
+ pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
+ pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
+ pr_info("work on such system. Use this driver with caution\n");
+ } else if (idx < 0) {
+ pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
+ pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
+ pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
+ pr_info("If you find incorrect reports, please contact your hardware vendor\n");
+ pr_info("to correct its BIOS.\n");
+ pr_info("This system has %d DIMM sockets.\n", mem_info.num_dimm);
+ }
+
+ rc = ghes_edac_register_one(0, ghes, dev);
+
+ return rc;
+}
--
2.20.1

2019-05-29 08:47:31

by Robert Richter

[permalink] [raw]
Subject: [PATCH 08/21] EDAC, mc: Rework edac_raw_mc_handle_error() to use struct dimm_info

Rework edac_raw_mc_handle_error() to use struct dimm_info.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 28 +++++++++++++---------------
drivers/edac/edac_mc.h | 2 ++
drivers/edac/ghes_edac.c | 6 +++++-
3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 2e461c9e1a89..b1bd0a23d02b 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -889,11 +889,9 @@ const char *edac_layer_name[] = {
EXPORT_SYMBOL_GPL(edac_layer_name);

static void edac_inc_ce_error(struct mem_ctl_info *mci,
- const int pos[EDAC_MAX_LAYERS],
+ struct dimm_info *dimm,
const u16 count)
{
- struct dimm_info *dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);
-
mci->ce_mc += count;

if (dimm)
@@ -903,11 +901,9 @@ static void edac_inc_ce_error(struct mem_ctl_info *mci,
}

static void edac_inc_ue_error(struct mem_ctl_info *mci,
- const int pos[EDAC_MAX_LAYERS],
- const u16 count)
+ struct dimm_info *dimm,
+ const u16 count)
{
- struct dimm_info *dimm = edac_get_dimm(mci, pos[0], pos[1], pos[2]);
-
mci->ue_mc += count;

if (dimm)
@@ -917,8 +913,8 @@ static void edac_inc_ue_error(struct mem_ctl_info *mci,
}

static void edac_ce_error(struct mem_ctl_info *mci,
+ struct dimm_info *dimm,
const u16 error_count,
- const int pos[EDAC_MAX_LAYERS],
const char *msg,
const char *location,
const char *label,
@@ -946,7 +942,7 @@ static void edac_ce_error(struct mem_ctl_info *mci,
error_count, msg, msg_aux, label,
location, detail);
}
- edac_inc_ce_error(mci, pos, error_count);
+ edac_inc_ce_error(mci, dimm, error_count);

if (mci->scrub_mode == SCRUB_SW_SRC) {
/*
@@ -970,8 +966,8 @@ static void edac_ce_error(struct mem_ctl_info *mci,
}

static void edac_ue_error(struct mem_ctl_info *mci,
+ struct dimm_info *dimm,
const u16 error_count,
- const int pos[EDAC_MAX_LAYERS],
const char *msg,
const char *location,
const char *label,
@@ -1005,15 +1001,15 @@ static void edac_ue_error(struct mem_ctl_info *mci,
msg, msg_aux, label, location, detail);
}

- edac_inc_ue_error(mci, pos, error_count);
+ edac_inc_ue_error(mci, dimm, error_count);
}

void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
struct mem_ctl_info *mci,
+ struct dimm_info *dimm,
struct edac_raw_error_desc *e)
{
char detail[80];
- int pos[EDAC_MAX_LAYERS] = { e->top_layer, e->mid_layer, e->low_layer };

/* Memory type dependent details about the error */
if (type == HW_EVENT_ERR_CORRECTED) {
@@ -1021,7 +1017,7 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
"page:0x%lx offset:0x%lx grain:%ld syndrome:0x%lx",
e->page_frame_number, e->offset_in_page,
e->grain, e->syndrome);
- edac_ce_error(mci, e->error_count, pos, e->msg, e->location,
+ edac_ce_error(mci, dimm, e->error_count, e->msg, e->location,
e->label, detail, e->other_detail,
e->page_frame_number, e->offset_in_page, e->grain);
} else {
@@ -1029,7 +1025,7 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
"page:0x%lx offset:0x%lx grain:%ld",
e->page_frame_number, e->offset_in_page, e->grain);

- edac_ue_error(mci, e->error_count, pos, e->msg, e->location,
+ edac_ue_error(mci, dimm, e->error_count, e->msg, e->location,
e->label, detail, e->other_detail);
}

@@ -1206,6 +1202,8 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
(e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
grain_bits, e->syndrome, e->other_detail);

- edac_raw_mc_handle_error(type, mci, e);
+ dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);
+
+ edac_raw_mc_handle_error(type, mci, dimm, e);
}
EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 4165e15995ad..b816cf3caaee 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -214,6 +214,7 @@ extern int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci,
*
* @type: severity of the error (CE/UE/Fatal)
* @mci: a struct mem_ctl_info pointer
+ * @dimm: a struct dimm_info pointer
* @e: error description
*
* This raw function is used internally by edac_mc_handle_error(). It should
@@ -222,6 +223,7 @@ extern int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci,
*/
void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
struct mem_ctl_info *mci,
+ struct dimm_info *dimm,
struct edac_raw_error_desc *e);

/**
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 2c8f816c2cc7..4a13c394fa66 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -196,6 +196,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)

void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
{
+ struct dimm_info *dimm_info;
enum hw_event_mc_err_type type;
struct edac_raw_error_desc *e;
struct mem_ctl_info *mci;
@@ -440,7 +441,10 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
(e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
grain_bits, e->syndrome, pvt->detail_location);

- edac_raw_mc_handle_error(type, mci, e);
+ dimm_info = edac_get_dimm_by_index(mci, e->top_layer);
+
+ edac_raw_mc_handle_error(type, mci, dimm_info, e);
+
spin_unlock_irqrestore(&ghes_lock, flags);
}

--
2.20.1

2019-05-29 08:47:43

by Robert Richter

[permalink] [raw]
Subject: [PATCH 14/21] EDAC, ghes: Extract numa node information for each dimm

In a later patch we want to have one mc device per node. This patch
extracts the numa node information for each dimm. This is done by
collecting the physical address ranges from the DMI table (Memory
Array Mapped Address - Type 19 of the SMBIOS spec). The node information
for a physical address is already known to a NUMA-aware system (e.g. by
using the ACPI _PXM method or the ACPI SRAT table), so based on the PA
we can assign the node id to the dimms.

A fallback that disables numa is implemented in case the node
information is inconsistent.
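
The node lookup itself is a physical-address-to-node translation; a
condensed sketch of the logic in ghes_edac_set_nid() below (start/end
come from the SMBIOS Type 19 entry):

  nid = NUMA_NO_NODE;
  if (pfn_valid(PHYS_PFN(start))) {
          nid = pfn_to_nid(PHYS_PFN(start));
          /* inconsistent node info leads to the NUMA fallback later on */
          if (nid < 0 || nid >= MAX_NUMNODES || !node_possible(nid))
                  nid = NUMA_NO_NODE;
  }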

E.g., on a ThunderX2 system the following node mappings are found
based on the DMI table:

EDAC DEBUG: mem_info_setup: DIMM0: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0
EDAC DEBUG: mem_info_setup: DIMM1: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0
EDAC DEBUG: mem_info_setup: DIMM2: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0
EDAC DEBUG: mem_info_setup: DIMM3: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0
EDAC DEBUG: mem_info_setup: DIMM4: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0
EDAC DEBUG: mem_info_setup: DIMM5: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0
EDAC DEBUG: mem_info_setup: DIMM6: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0
EDAC DEBUG: mem_info_setup: DIMM7: Found mem range [0x0000008800000000-0x0000009ffcffffff] on node 0
EDAC DEBUG: mem_info_setup: DIMM8: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1
EDAC DEBUG: mem_info_setup: DIMM9: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1
EDAC DEBUG: mem_info_setup: DIMM10: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1
EDAC DEBUG: mem_info_setup: DIMM11: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1
EDAC DEBUG: mem_info_setup: DIMM12: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1
EDAC DEBUG: mem_info_setup: DIMM13: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1
EDAC DEBUG: mem_info_setup: DIMM14: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1
EDAC DEBUG: mem_info_setup: DIMM15: Found mem range [0x0000009ffd000000-0x000000bffcffffff] on node 1

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 104 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 103 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 50f4ee36b755..083452a48b42 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -67,14 +67,34 @@ struct memdev_dmi_entry {
u16 conf_mem_clk_speed;
} __attribute__((__packed__));

+/* Memory Array Mapped Address - Type 19 of SMBIOS spec */
+struct memarr_dmi_entry {
+ u8 type;
+ u8 length;
+ u16 handle;
+ u32 start;
+ u32 end;
+ u16 phys_mem_array_handle;
+ u8 partition_width;
+ u64 ext_start;
+ u64 ext_end;
+} __attribute__((__packed__));
+
struct ghes_dimm_info {
struct dimm_info dimm_info;
int idx;
+ int numa_node;
+ phys_addr_t start;
+ phys_addr_t end;
+ u16 phys_handle;
};

struct ghes_mem_info {
- int num_dimm;
+ int num_dimm;
struct ghes_dimm_info *dimms;
+ int num_nodes;
+ int num_per_node[MAX_NUMNODES];
+ bool enable_numa;
};

struct ghes_mem_info mem_info;
@@ -97,10 +117,50 @@ static void ghes_dimm_info_init(void)

for_each_dimm(dimm) {
dimm->idx = idx;
+ dimm->numa_node = NUMA_NO_NODE;
idx++;
}
}

+static void ghes_edac_set_nid(const struct dmi_header *dh, void *arg)
+{
+ struct memarr_dmi_entry *entry = (struct memarr_dmi_entry *)dh;
+ struct ghes_dimm_info *dimm;
+ phys_addr_t start, end;
+ int nid;
+
+ if (dh->type != DMI_ENTRY_MEM_ARRAY_MAPPED_ADDR)
+ return;
+
+ /* only support SMBIOS 2.7+ */
+ if (entry->length < sizeof(*entry))
+ return;
+
+ if (entry->start == 0xffffffff)
+ start = entry->ext_start;
+ else
+ start = entry->start;
+ if (entry->end == 0xffffffff)
+ end = entry->ext_end;
+ else
+ end = entry->end;
+
+ if (!pfn_valid(PHYS_PFN(start)))
+ return;
+
+ nid = pfn_to_nid(PHYS_PFN(start));
+ if (nid < 0 || nid >= MAX_NUMNODES || !node_possible(nid))
+ nid = NUMA_NO_NODE;
+
+ for_each_dimm(dimm) {
+ if (entry->phys_mem_array_handle == dimm->phys_handle) {
+ dimm->numa_node = nid;
+ dimm->start = start;
+ dimm->end = end;
+ }
+ }
+}
+
static int get_dimm_smbios_index(u16 handle)
{
struct mem_ctl_info *mci = ghes_pvt->mci;
@@ -213,8 +273,25 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
}
}

+static void mem_info_disable_numa(void)
+{
+ struct ghes_dimm_info *dimm;
+
+ for_each_dimm(dimm) {
+ if (dimm->numa_node != NUMA_NO_NODE)
+ mem_info.num_per_node[dimm->numa_node] = 0;
+ dimm->numa_node = 0;
+ }
+
+ mem_info.num_per_node[0] = mem_info.num_dimm;
+ mem_info.num_nodes = 1;
+ mem_info.enable_numa = false;
+}
+
static int mem_info_setup(void)
{
+ struct ghes_dimm_info *dimm;
+ bool enable_numa = true;
int idx = 0;

memset(&mem_info, 0, sizeof(mem_info));
@@ -231,6 +308,29 @@ static int mem_info_setup(void)

ghes_dimm_info_init();
dmi_walk(ghes_edac_dmidecode, &idx);
+ dmi_walk(ghes_edac_set_nid, NULL);
+
+ for_each_dimm(dimm) {
+ if (dimm->numa_node == NUMA_NO_NODE) {
+ enable_numa = false;
+ } else {
+ if (!mem_info.num_per_node[dimm->numa_node])
+ mem_info.num_nodes++;
+ mem_info.num_per_node[dimm->numa_node]++;
+ }
+
+ edac_dbg(1, "DIMM%i: Found mem range [%pa-%pa] on node %d\n",
+ dimm->idx, &dimm->start, &dimm->end, dimm->numa_node);
+ }
+
+ mem_info.enable_numa = enable_numa;
+ if (enable_numa)
+ return 0;
+
+ /* something went wrong, disable numa */
+ if (num_possible_nodes() > 1)
+ pr_warn("Can't get numa info, disabling numa\n");
+ mem_info_disable_numa();

return 0;
}
@@ -258,6 +358,8 @@ static int mem_info_setup_fake(void)
dimm->dtype = DEV_UNKNOWN;
dimm->edac_mode = EDAC_SECDED;

+ mem_info_disable_numa();
+
return 0;
}

--
2.20.1

2019-05-29 08:47:45

by Robert Richter

[permalink] [raw]
Subject: [PATCH 21/21] EDAC, Documentation: Describe CPER module definition and DIMM ranks

Update on CPER DIMM naming convention and DIMM ranks.

Signed-off-by: Robert Richter <[email protected]>
---
Documentation/admin-guide/ras.rst | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index c7495e42e6f4..4e2a01c77a9c 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -330,9 +330,12 @@ There can be multiple csrows and multiple channels.

.. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely
used to refer to a memory module, although there are other memory
- packaging alternatives, like SO-DIMM, SIMM, etc. Along this document,
- and inside the EDAC system, the term "dimm" is used for all memory
- modules, even when they use a different kind of packaging.
+ packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI
+ specification (Version 2.7) defines a memory module in the Common
+ Platform Error Record (CPER) section to be an SMBIOS Memory Device
+ (Type 17). Along this document, and inside the EDAC system, the term
+ "dimm" is used for all memory modules, even when they use a
+ different kind of packaging.

Memory controllers allow for several csrows, with 8 csrows being a
typical value. Yet, the actual number of csrows depends on the layout of
@@ -349,12 +352,14 @@ controllers. The following example will assume 2 channels:
| | ``ch0`` | ``ch1`` |
+============+===========+===========+
| ``csrow0`` | DIMM_A0 | DIMM_B0 |
- +------------+ | |
- | ``csrow1`` | | |
+ | | rank0 | rank0 |
+ +------------+ - | - |
+ | ``csrow1`` | rank1 | rank1 |
+------------+-----------+-----------+
| ``csrow2`` | DIMM_A1 | DIMM_B1 |
- +------------+ | |
- | ``csrow3`` | | |
+ | | rank0 | rank0 |
+ +------------+ - | - |
+ | ``csrow3`` | rank1 | rank1 |
+------------+-----------+-----------+

In the above example, there are 4 physical slots on the motherboard
@@ -374,11 +379,13 @@ which the memory DIMM is placed. Thus, when 1 DIMM is placed in each
Channel, the csrows cross both DIMMs.

Memory DIMMs come single or dual "ranked". A rank is a populated csrow.
-Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above
-will have just one csrow (csrow0). csrow1 will be empty. On the other
-hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0
-and csrow1 will be populated. The pattern repeats itself for csrow2 and
-csrow3.
+In the example above 2 dual ranked DIMMs are similarly placed. Thus,
+both csrow0 and csrow1 are populated. On the other hand, when 2 single
+ranked DIMMs are placed in slots DIMM_A0 and DIMM_B0, then they will
+have just one csrow (csrow0) and csrow1 will be empty. The pattern
+repeats itself for csrow2 and csrow3. Also note that some memory
+controllers don't have any logic to identify the memory module, see
+``rankX`` directories below.

The representation of the above is reflected in the directory
tree in EDAC's sysfs interface. Starting in directory
--
2.20.1

2019-05-29 08:47:45

by Robert Richter

[permalink] [raw]
Subject: [PATCH 17/21] EDAC, ghes: Fill sysfs with the DMI DIMM label information

This patch extracts the DIMM label from the DMI table and puts this
information into sysfs. E.g., on a ThunderX2 system we now see:

# grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
/sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
/sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
/sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
/sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
/sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
/sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
/sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
/sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
/sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
/sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
/sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
/sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
/sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
/sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
/sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0
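
The labels above are built from the DMI bank/device strings of the
memory device; a condensed sketch of the lookup added to
mem_info_setup() (see the hunk below):

  const char *bank = NULL, *device = NULL;

  dmi_memdev_name(dimm->dimm_info.smbios_handle, &bank, &device);
  if (bank && device)
          snprintf(dimm->dimm_info.label, sizeof(dimm->dimm_info.label),
                   "%s %s", bank, device);        /* e.g. "N0 DIMM_A0" */
  else
          *dimm->dimm_info.label = '\0';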

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index e5fa977bcfd9..b8878ff498d1 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -254,10 +254,6 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
dimm->dtype = DEV_UNKNOWN;
dimm->grain = 128; /* Likely, worse case */

- /*
- * FIXME: It shouldn't be hard to also fill the DIMM labels
- */
-
if (dimm->nr_pages) {
edac_dbg(1, "DIMM%i: %s size = %d MB%s\n",
mi->idx, edac_mem_types[dimm->mtype],
@@ -293,6 +289,7 @@ static int mem_info_setup(void)
{
struct ghes_dimm_info *dimm;
bool enable_numa = true;
+ const char *bank, *device;
int idx = 0;

memset(&mem_info, 0, sizeof(mem_info));
@@ -312,6 +309,17 @@ static int mem_info_setup(void)
dmi_walk(ghes_edac_set_nid, NULL);

for_each_dimm(dimm) {
+ bank = device = NULL;
+ dmi_memdev_name(dimm->dimm_info.smbios_handle,
+ &bank, &device);
+ if (bank && device) {
+ snprintf(dimm->dimm_info.label,
+ sizeof(dimm->dimm_info.label),
+ "%s %s", bank, device);
+ } else {
+ *dimm->dimm_info.label = '\0';
+ }
+
if (dimm->numa_node == NUMA_NO_NODE) {
enable_numa = false;
} else {
@@ -320,8 +328,11 @@ static int mem_info_setup(void)
mem_info.num_per_node[dimm->numa_node]++;
}

- edac_dbg(1, "DIMM%i: Found mem range [%pa-%pa] on node %d\n",
- dimm->idx, &dimm->start, &dimm->end, dimm->numa_node);
+ edac_dbg(1, "DIMM%i: Found mem range [%pa-%pa] on node %d, handle: 0x%.4x%s%s\n",
+ dimm->idx, &dimm->start, &dimm->end, dimm->numa_node,
+ dimm->dimm_info.smbios_handle,
+ *dimm->dimm_info.label ? ", label: " : "",
+ dimm->dimm_info.label);
}

mem_info.enable_numa = enable_numa;
@@ -387,6 +398,9 @@ static void mci_add_dimm_info(struct mem_ctl_info *mci)
mci_dimm->dtype = dmi_dimm->dtype;
mci_dimm->grain = dmi_dimm->grain;
mci_dimm->smbios_handle = dmi_dimm->smbios_handle;
+
+ if (*dmi_dimm->label)
+ strcpy(mci_dimm->label, dmi_dimm->label);
}

if (index != mci->tot_dimms)
--
2.20.1

2019-05-29 08:47:56

by Robert Richter

[permalink] [raw]
Subject: [PATCH 02/21] EDAC: Fixes to use put_device() after device_add() errors

Always use put_device() after device_add() fails.
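
The rule applied throughout is the usual driver core pattern: a failed
device_add() still requires a put_device() to drop the reference taken
by device_initialize(). A minimal sketch:

  device_initialize(&csrow->dev);
  ...
  err = device_add(&csrow->dev);
  if (err)
          put_device(&csrow->dev);        /* drops the device_initialize() ref */

  return err;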

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc_sysfs.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 464174685589..dbef699162a8 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -404,6 +404,8 @@ static inline int nr_pages_per_csrow(struct csrow_info *csrow)
static int edac_create_csrow_object(struct mem_ctl_info *mci,
struct csrow_info *csrow, int index)
{
+ int err;
+
csrow->dev.type = &csrow_attr_type;
csrow->dev.groups = csrow_dev_groups;
device_initialize(&csrow->dev);
@@ -415,7 +417,11 @@ static int edac_create_csrow_object(struct mem_ctl_info *mci,
edac_dbg(0, "creating (virtual) csrow node %s\n",
dev_name(&csrow->dev));

- return device_add(&csrow->dev);
+ err = device_add(&csrow->dev);
+ if (err)
+ put_device(&csrow->dev);
+
+ return err;
}

/* Create a CSROW object under specifed edac_mc_device */
@@ -646,8 +652,11 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,
pm_runtime_forbid(&mci->dev);

err = device_add(&dimm->dev);
-
- edac_dbg(0, "creating rank/dimm device %s\n", dev_name(&dimm->dev));
+ if (err)
+ put_device(&dimm->dev);
+ else
+ edac_dbg(0, "creating rank/dimm device %s\n",
+ dev_name(&dimm->dev));

return err;
}
@@ -927,8 +936,9 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
edac_dbg(0, "creating device %s\n", dev_name(&mci->dev));
err = device_add(&mci->dev);
if (err < 0) {
+ put_device(&mci->dev);
edac_dbg(1, "failure: create device %s\n", dev_name(&mci->dev));
- goto out;
+ return err;
}

/*
@@ -977,7 +987,6 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
}
device_unregister(&mci->dev);

-out:
return err;
}

@@ -1034,10 +1043,8 @@ int __init edac_mc_sysfs_init(void)
int err;

mci_pdev = kzalloc(sizeof(*mci_pdev), GFP_KERNEL);
- if (!mci_pdev) {
- err = -ENOMEM;
- goto out;
- }
+ if (!mci_pdev)
+ return -ENOMEM;

mci_pdev->bus = edac_get_sysfs_subsys();
mci_pdev->type = &mc_attr_type;
@@ -1046,15 +1053,10 @@ int __init edac_mc_sysfs_init(void)

err = device_add(mci_pdev);
if (err < 0)
- goto out_put_device;
-
- edac_dbg(0, "device %s created\n", dev_name(mci_pdev));
-
- return 0;
+ put_device(mci_pdev);
+ else
+ edac_dbg(0, "device %s created\n", dev_name(mci_pdev));

- out_put_device:
- put_device(mci_pdev);
- out:
return err;
}

--
2.20.1

2019-05-29 08:48:33

by Robert Richter

[permalink] [raw]
Subject: [PATCH 11/21] EDAC, ghes: Unify trace_mc_event() code with edac_mc driver

The trace event code in the ghes driver is almost a duplicate of the
edac_mc code, so remove it and use the common implementation.

Note: there is a difference in the calculation of grain_bits; the
edac_mc version is used here.
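
For reference, the two variants that differed (the edac_mc one, which is
kept, reports one bit more):

  grain_bits = fls_long(e->grain);        /* former ghes_edac calculation */
  grain_bits = fls_long(e->grain) + 1;    /* edac_mc calculation, used now */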

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 22 +++++++++++-----------
drivers/edac/ghes_edac.c | 9 ---------
2 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index b1bd0a23d02b..8613a31dc86c 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1010,6 +1010,17 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
struct edac_raw_error_desc *e)
{
char detail[80];
+ u8 grain_bits;
+
+ /* Report the error via the trace interface */
+ grain_bits = fls_long(e->grain) + 1;
+
+ if (IS_ENABLED(CONFIG_RAS))
+ trace_mc_event(type, e->msg, e->label, e->error_count,
+ mci->mc_idx, e->top_layer, e->mid_layer,
+ e->low_layer,
+ (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
+ grain_bits, e->syndrome, e->other_detail);

/* Memory type dependent details about the error */
if (type == HW_EVENT_ERR_CORRECTED) {
@@ -1050,7 +1061,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
int row = -1, chan = -1;
int pos[EDAC_MAX_LAYERS] = { top_layer, mid_layer, low_layer };
int i, n_labels = 0;
- u8 grain_bits;
struct edac_raw_error_desc *e = &mci->error_desc;
bool per_layer_report = false;

@@ -1192,16 +1202,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
if (p > e->location)
*(p - 1) = '\0';

- /* Report the error via the trace interface */
- grain_bits = fls_long(e->grain) + 1;
-
- if (IS_ENABLED(CONFIG_RAS))
- trace_mc_event(type, e->msg, e->label, e->error_count,
- mci->mc_idx, e->top_layer, e->mid_layer,
- e->low_layer,
- (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
- grain_bits, e->syndrome, e->other_detail);
-
dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);

edac_raw_mc_handle_error(type, mci, dimm, e);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index c18f16bc9e4d..f6ea4b070bfe 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -202,7 +202,6 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
struct ghes_edac_pvt *pvt = ghes_pvt;
unsigned long flags;
char *p;
- u8 grain_bits;

if (!pvt)
return;
@@ -434,14 +433,6 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
if (p > pvt->other_detail)
*(p - 1) = '\0';

- /* Generate the trace event */
- grain_bits = fls_long(e->grain);
-
- trace_mc_event(type, e->msg, e->label, e->error_count,
- mci->mc_idx, e->top_layer, e->mid_layer, e->low_layer,
- (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
- grain_bits, e->syndrome, e->other_detail);
-
dimm_info = edac_get_dimm_by_index(mci, e->top_layer);

edac_raw_mc_handle_error(type, mci, dimm_info, e);
--
2.20.1

2019-05-29 08:48:52

by Robert Richter

[permalink] [raw]
Subject: [PATCH 03/21] EDAC: Kill EDAC_DIMM_PTR() macro

Get rid of this macro and instead use the new function
edac_get_dimm(). Also introduce the edac_get_dimm_by_index() function
for later use.

Semantic patch used:

@@ expression mci, a, b,c; @@

-EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, a, b, c)
+edac_get_dimm(mci, a, b, c)
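
For example, a lookup that previously read

  dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, j, 0);

now becomes (as in the conversions in the diff below)

  dimm = edac_get_dimm(mci, i, j, 0);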

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 1 +
drivers/edac/ghes_edac.c | 8 ++--
drivers/edac/i10nm_base.c | 3 +-
drivers/edac/i3200_edac.c | 3 +-
drivers/edac/i5000_edac.c | 5 +--
drivers/edac/i5100_edac.c | 3 +-
drivers/edac/i5400_edac.c | 4 +-
drivers/edac/i7300_edac.c | 3 +-
drivers/edac/i7core_edac.c | 3 +-
drivers/edac/ie31200_edac.c | 7 +---
drivers/edac/pnd2_edac.c | 4 +-
drivers/edac/sb_edac.c | 2 +-
drivers/edac/skx_base.c | 3 +-
drivers/edac/ti_edac.c | 2 +-
include/linux/edac.h | 84 +++++++++++++++++++++++--------------
15 files changed, 73 insertions(+), 62 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 64922c8fa7e3..5f565e5949b3 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -439,6 +439,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
goto error;
mci->dimms[off] = dimm;
dimm->mci = mci;
+ dimm->idx = off;

/*
* Copy DIMM location and initialize it.
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 49396bf6ad88..42afa2604db3 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -100,9 +100,8 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)

if (dh->type == DMI_ENTRY_MEM_DEVICE) {
struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
- struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
- mci->n_layers,
- dimm_fill->count, 0, 0);
+ struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count,
+ 0, 0);
u16 rdr_mask = BIT(7) | BIT(13);

if (entry->size == 0xffff) {
@@ -529,8 +528,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
dimm_fill.mci = mci;
dmi_walk(ghes_edac_dmidecode, &dimm_fill);
} else {
- struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
- mci->n_layers, 0, 0, 0);
+ struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);

dimm->nr_pages = 1;
dimm->grain = 128;
diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c
index c334fb7c63df..bc7c990b57a5 100644
--- a/drivers/edac/i10nm_base.c
+++ b/drivers/edac/i10nm_base.c
@@ -152,8 +152,7 @@ static int i10nm_get_dimm_config(struct mem_ctl_info *mci)

ndimms = 0;
for (j = 0; j < I10NM_NUM_DIMMS; j++) {
- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
- mci->n_layers, i, j, 0);
+ dimm = edac_get_dimm(mci, i, j, 0);
mtr = I10NM_GET_DIMMMTR(imc, i, j);
mcddrtcfg = I10NM_GET_MCDDRTCFG(imc, i, j);
edac_dbg(1, "dimmmtr 0x%x mcddrtcfg 0x%x (mc%d ch%d dimm%d)\n",
diff --git a/drivers/edac/i3200_edac.c b/drivers/edac/i3200_edac.c
index 299b441647cd..432b375a4075 100644
--- a/drivers/edac/i3200_edac.c
+++ b/drivers/edac/i3200_edac.c
@@ -392,8 +392,7 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
unsigned long nr_pages;

for (j = 0; j < nr_channels; j++) {
- struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
- mci->n_layers, i, j, 0);
+ struct dimm_info *dimm = edac_get_dimm(mci, i, j, 0);

nr_pages = drb_to_nr_pages(drbs, stacked, j, i);
if (nr_pages == 0)
diff --git a/drivers/edac/i5000_edac.c b/drivers/edac/i5000_edac.c
index 078a7351bf05..1a6f69c859ab 100644
--- a/drivers/edac/i5000_edac.c
+++ b/drivers/edac/i5000_edac.c
@@ -1275,9 +1275,8 @@ static int i5000_init_csrows(struct mem_ctl_info *mci)
if (!MTR_DIMMS_PRESENT(mtr))
continue;

- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
- channel / MAX_BRANCHES,
- channel % MAX_BRANCHES, slot);
+ dimm = edac_get_dimm(mci, channel / MAX_BRANCHES,
+ channel % MAX_BRANCHES, slot);

csrow_megs = pvt->dimm_info[slot][channel].megabytes;
dimm->grain = 8;
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index b506eef6b146..39ba7f2414ae 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -858,8 +858,7 @@ static void i5100_init_csrows(struct mem_ctl_info *mci)
if (!npages)
continue;

- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
- chan, rank, 0);
+ dimm = edac_get_dimm(mci, chan, rank, 0);

dimm->nr_pages = npages;
dimm->grain = 32;
diff --git a/drivers/edac/i5400_edac.c b/drivers/edac/i5400_edac.c
index 6f8bcdb9256a..a50a8707337b 100644
--- a/drivers/edac/i5400_edac.c
+++ b/drivers/edac/i5400_edac.c
@@ -1196,8 +1196,8 @@ static int i5400_init_dimms(struct mem_ctl_info *mci)
if (!MTR_DIMMS_PRESENT(mtr))
continue;

- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
- channel / 2, channel % 2, slot);
+ dimm = edac_get_dimm(mci, channel / 2, channel % 2,
+ slot);

size_mb = pvt->dimm_info[slot][channel].megabytes;

diff --git a/drivers/edac/i7300_edac.c b/drivers/edac/i7300_edac.c
index 6b5a554ba8e4..b76be69f0d74 100644
--- a/drivers/edac/i7300_edac.c
+++ b/drivers/edac/i7300_edac.c
@@ -796,8 +796,7 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
for (ch = 0; ch < max_channel; ch++) {
int channel = to_channel(ch, branch);

- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
- mci->n_layers, branch, ch, slot);
+ dimm = edac_get_dimm(mci, branch, ch, slot);

dinfo = &pvt->dimm_info[slot][channel];

diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 40297550313a..4d7afcd55626 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -587,8 +587,7 @@ static int get_dimm_config(struct mem_ctl_info *mci)
if (!DIMM_PRESENT(dimm_dod[j]))
continue;

- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
- i, j, 0);
+ dimm = edac_get_dimm(mci, i, j, 0);
banks = numbank(MC_DOD_NUMBANK(dimm_dod[j]));
ranks = numrank(MC_DOD_NUMRANK(dimm_dod[j]));
rows = numrow(MC_DOD_NUMROW(dimm_dod[j]));
diff --git a/drivers/edac/ie31200_edac.c b/drivers/edac/ie31200_edac.c
index aac9b9b360b8..a2b4d80ddc57 100644
--- a/drivers/edac/ie31200_edac.c
+++ b/drivers/edac/ie31200_edac.c
@@ -467,9 +467,7 @@ static int ie31200_probe1(struct pci_dev *pdev, int dev_idx)

if (dimm_info[j][i].dual_rank) {
nr_pages = nr_pages / 2;
- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
- mci->n_layers, (i * 2) + 1,
- j, 0);
+ dimm = edac_get_dimm(mci, (i * 2) + 1, j, 0);
dimm->nr_pages = nr_pages;
edac_dbg(0, "set nr pages: 0x%lx\n", nr_pages);
dimm->grain = 8; /* just a guess */
@@ -480,8 +478,7 @@ static int ie31200_probe1(struct pci_dev *pdev, int dev_idx)
dimm->dtype = DEV_UNKNOWN;
dimm->edac_mode = EDAC_UNKNOWN;
}
- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
- mci->n_layers, i * 2, j, 0);
+ dimm = edac_get_dimm(mci, i * 2, j, 0);
dimm->nr_pages = nr_pages;
edac_dbg(0, "set nr pages: 0x%lx\n", nr_pages);
dimm->grain = 8; /* same guess */
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index 903a4f1fadcc..2f7dcafd84b1 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1234,7 +1234,7 @@ static void apl_get_dimm_config(struct mem_ctl_info *mci)
if (!(chan_mask & BIT(i)))
continue;

- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, 0, 0);
+ dimm = edac_get_dimm(mci, i, 0, 0);
if (!dimm) {
edac_dbg(0, "No allocated DIMM for channel %d\n", i);
continue;
@@ -1314,7 +1314,7 @@ static void dnv_get_dimm_config(struct mem_ctl_info *mci)
if (!ranks_of_dimm[j])
continue;

- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, j, 0);
+ dimm = edac_get_dimm(mci, i, j, 0);
if (!dimm) {
edac_dbg(0, "No allocated DIMM for channel %d DIMM %d\n", i, j);
continue;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 9353c3fc7c05..5dfa9ba1c252 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -1623,7 +1623,7 @@ static int __populate_dimms(struct mem_ctl_info *mci,
}

for (j = 0; j < max_dimms_per_channel; j++) {
- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, j, 0);
+ dimm = edac_get_dimm(mci, i, j, 0);
if (pvt->info.type == KNIGHTS_LANDING) {
pci_read_config_dword(pvt->knl.pci_channel[i],
knl_mtr_reg, &mtr);
diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c
index adae4c848ca1..6a828d6c0852 100644
--- a/drivers/edac/skx_base.c
+++ b/drivers/edac/skx_base.c
@@ -177,8 +177,7 @@ static int skx_get_dimm_config(struct mem_ctl_info *mci)
pci_read_config_dword(imc->chan[i].cdev, 0x8C, &amap);
pci_read_config_dword(imc->chan[i].cdev, 0x400, &mcddrtcfg);
for (j = 0; j < SKX_NUM_DIMMS; j++) {
- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
- mci->n_layers, i, j, 0);
+ dimm = edac_get_dimm(mci, i, j, 0);
pci_read_config_dword(imc->chan[i].cdev,
0x80 + 4 * j, &mtr);
if (IS_DIMM_PRESENT(mtr)) {
diff --git a/drivers/edac/ti_edac.c b/drivers/edac/ti_edac.c
index 6ac26d1b929f..8be3e89a510e 100644
--- a/drivers/edac/ti_edac.c
+++ b/drivers/edac/ti_edac.c
@@ -135,7 +135,7 @@ static void ti_edac_setup_dimm(struct mem_ctl_info *mci, u32 type)
u32 val;
u32 memsize;

- dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, 0, 0, 0);
+ dimm = edac_get_dimm(mci, 0, 0, 0);

val = ti_edac_readl(edac, EMIF_SDRAM_CONFIG);

diff --git a/include/linux/edac.h b/include/linux/edac.h
index 342dabda9c7e..1367a3fc544f 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -403,37 +403,6 @@ struct edac_mc_layer {
__i; \
})

-/**
- * EDAC_DIMM_PTR - Macro responsible to get a pointer inside a pointer array
- * for the element given by [layer0,layer1,layer2] position
- *
- * @layers: a struct edac_mc_layer array, describing how many elements
- * were allocated for each layer
- * @var: name of the var where we want to get the pointer
- * (like mci->dimms)
- * @nlayers: Number of layers at the @layers array
- * @layer0: layer0 position
- * @layer1: layer1 position. Unused if n_layers < 2
- * @layer2: layer2 position. Unused if n_layers < 3
- *
- * For 1 layer, this macro returns "var[layer0]";
- *
- * For 2 layers, this macro is similar to allocate a bi-dimensional array
- * and to return "var[layer0][layer1]";
- *
- * For 3 layers, this macro is similar to allocate a tri-dimensional array
- * and to return "var[layer0][layer1][layer2]";
- */
-#define EDAC_DIMM_PTR(layers, var, nlayers, layer0, layer1, layer2) ({ \
- typeof(*var) __p; \
- int ___i = EDAC_DIMM_OFF(layers, nlayers, layer0, layer1, layer2); \
- if (___i < 0) \
- __p = NULL; \
- else \
- __p = (var)[___i]; \
- __p; \
-})
-
struct dimm_info {
struct device dev;

@@ -443,6 +412,7 @@ struct dimm_info {
unsigned location[EDAC_MAX_LAYERS];

struct mem_ctl_info *mci; /* the parent */
+ int idx; /* index within the parent dimm array */

u32 grain; /* granularity of reported error in bytes */
enum dev_type dtype; /* memory device type */
@@ -669,4 +639,56 @@ struct mem_ctl_info {
bool fake_inject_ue;
u16 fake_inject_count;
};
+
+/**
+ * edac_get_dimm_by_index - Get DIMM info from a memory controller
+ * given by an index
+ *
+ * @mci: a struct mem_ctl_info
+ * @index: index in the memory controller's DIMM array
+ *
+ * Returns a struct dimm_info*.
+ */
+static inline struct dimm_info *
+edac_get_dimm_by_index(struct mem_ctl_info *mci, int index)
+{
+ if (index < 0 || index >= mci->tot_dimms)
+ return NULL;
+
+ if (WARN_ON_ONCE(mci->dimms[index]->idx != index))
+ return NULL;
+
+ return mci->dimms[index];
+}
+
+/**
+ * edac_get_dimm - Get DIMM info from a memory controller given by
+ * [layer0,layer1,layer2] position
+ *
+ * @mci: a struct mem_ctl_info
+ * @layer0: layer0 position
+ * @layer1: layer1 position. Unused if n_layers < 2
+ * @layer2: layer2 position. Unused if n_layers < 3
+ *
+ * For 1 layer, this macro returns "dimms[layer0]";
+ *
+ * For 2 layers, this macro is similar to allocate a bi-dimensional array
+ * and to return "dimms[layer0][layer1]";
+ *
+ * For 3 layers, this macro is similar to allocate a tri-dimensional array
+ * and to return "dimms[layer0][layer1][layer2]";
+ */
+static inline struct dimm_info *
+edac_get_dimm(struct mem_ctl_info *mci, int layer0, int layer1, int layer2)
+{
+ int index = layer0;
+
+ if (index >= 0 && mci->n_layers > 1)
+ index = index * mci->layers[1].size + layer1;
+ if (index >= 0 && mci->n_layers > 2)
+ index = index * mci->layers[2].size + layer2;
+
+ return edac_get_dimm_by_index(mci, index);
+}
+
#endif
--
2.20.1
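
For reference, a minimal usage sketch of the two new helpers (hypothetical
driver code, not part of the series): edac_get_dimm() replaces the
EDAC_DIMM_PTR() call sites converted above, and edac_get_dimm_by_index()
walks the flat DIMM array directly.

	/* Hypothetical example only; not taken from any driver in the series. */
	static void example_fill_dimms(struct mem_ctl_info *mci)
	{
		struct dimm_info *dimm;
		int i;

		/* Look up a DIMM by its [layer0, layer1, layer2] position. */
		dimm = edac_get_dimm(mci, 0, 0, 0);
		if (dimm)
			dimm->grain = 8;

		/* Or iterate over the flat index space of the controller. */
		for (i = 0; i < mci->tot_dimms; i++) {
			dimm = edac_get_dimm_by_index(mci, i);
			if (!dimm)
				continue;
			dimm->nr_pages = 0;
		}
	}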

2019-05-29 08:48:53

by Robert Richter

[permalink] [raw]
Subject: [PATCH 06/21] EDAC, mc: Cleanup _edac_mc_free() code

Remove needless and boilerplate variable declarations. No functional
changes.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index a6b34ccce3d4..a849993a8dcd 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -280,26 +280,23 @@ static void _edac_mc_free(struct mem_ctl_info *mci)
{
int i, chn, row;
struct csrow_info *csr;
- const unsigned int tot_dimms = mci->tot_dimms;
- const unsigned int tot_channels = mci->num_cschannel;
- const unsigned int tot_csrows = mci->nr_csrows;

if (mci->dimms) {
- for (i = 0; i < tot_dimms; i++)
+ for (i = 0; i < mci->tot_dimms; i++)
kfree(mci->dimms[i]);
kfree(mci->dimms);
}
if (mci->csrows) {
- for (row = 0; row < tot_csrows; row++) {
+ for (row = 0; row < mci->nr_csrows; row++) {
csr = mci->csrows[row];
- if (csr) {
- if (csr->channels) {
- for (chn = 0; chn < tot_channels; chn++)
- kfree(csr->channels[chn]);
- kfree(csr->channels);
- }
- kfree(csr);
+ if (!csr)
+ continue;
+ if (csr->channels) {
+ for (chn = 0; chn < mci->num_cschannel; chn++)
+ kfree(csr->channels[chn]);
+ kfree(csr->channels);
}
+ kfree(csr);
}
kfree(mci->csrows);
}
--
2.20.1

2019-05-29 14:57:54

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 00/21] EDAC, mc, ghes: Fixes and updates to improve memory error reporting

On Wed, May 29, 2019 at 08:44:01AM +0000, Robert Richter wrote:
> Patch #1: Repost of an already accepted patch sent to the ml. Adding
> it here for completeness as I did not find it in a repository yet.

Try mainline:

29a0c843973b ("EDAC/mc: Fix edac_mc_find() in case no device is found")

:-)

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply. Srsly.

2019-05-29 15:10:16

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 13/21] EDAC, ghes: Rework memory hierarchy detection

Hi Robert,

On 29/05/2019 09:44, Robert Richter wrote:
> In a later patch we want to add more information about the memory
> hierarchy (NUMA topology, DIMM label information). Rework memory
> hierarchy detection to make the code extendable for this.
>
> The general approach is roughly like:
>
> mem_info_setup();
> for_each_node(nid) {
> mci = edac_mc_alloc(nid);
> mci_add_dimm_info(mci);
> edac_mc_add_mc(mci);
> };
>
> This patch introduces mem_info_setup() and mci_add_dimm_info().
>
> All data of the memory hierarchy is collected in a local struct
> ghes_mem_info.
>
> Note: Per (NUMA) node registration will be implemented in a later
> patch.


> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index ea4d53043199..50f4ee36b755 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -67,17 +67,38 @@ struct memdev_dmi_entry {
> u16 conf_mem_clk_speed;
> } __attribute__((__packed__));
>
> -struct ghes_edac_dimm_fill {
> - struct mem_ctl_info *mci;
> - unsigned count;

> +struct ghes_dimm_info {
> + struct dimm_info dimm_info;
> + int idx;
> +};

> +struct ghes_mem_info {
> + int num_dimm;
> + struct ghes_dimm_info *dimms;
> };
>
> +struct ghes_mem_info mem_info;

static?


> @@ -94,18 +115,17 @@ static int get_dimm_smbios_index(u16 handle)
>
> static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
> {
> - struct ghes_edac_dimm_fill *dimm_fill = arg;
> - struct mem_ctl_info *mci = dimm_fill->mci;
> -
> if (dh->type == DMI_ENTRY_MEM_DEVICE) {
> + int *idx = arg;
> struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
> - struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count,
> - 0, 0);
> + struct ghes_dimm_info *mi = &mem_info.dimms[*idx];
> + struct dimm_info *dimm = &mi->dimm_info;
> u16 rdr_mask = BIT(7) | BIT(13);


> + mi->phys_handle = entry->phys_mem_array_handle;

Where did this come from, and what is it for?

...

Should this be in a later patch? (did you bisect build this?)


> if (entry->size == 0xffff) {
> - pr_info("Can't get DIMM%i size\n",
> - dimm_fill->count);
> + pr_info("Can't get DIMM%i size\n", mi->idx);
> dimm->nr_pages = MiB_TO_PAGES(32);/* Unknown */
> } else if (entry->size == 0x7fff) {
> dimm->nr_pages = MiB_TO_PAGES(entry->extended_size);


> +static int mem_info_setup(void)
> +{
> + int idx = 0;
> +
> + memset(&mem_info, 0, sizeof(mem_info));

Is this necessary? Isn't mem_info in the BSS? It will be zero'd already.


> + /* Get the number of DIMMs */
> + dmi_walk(ghes_edac_count_dimms, NULL);
> + if (!mem_info.num_dimm)
> + return -EINVAL;

> + mem_info.dimms = kcalloc(mem_info.num_dimm,
> + sizeof(*mem_info.dimms), GFP_KERNEL);
> + if (!mem_info.dimms)
> + return -ENOMEM;

> + ghes_dimm_info_init();

Could you move the kcalloc() into ghes_dimm_info_init()? This would save you having a
unnecessarily-different version in mem_info_setup_fake().


> + dmi_walk(ghes_edac_dmidecode, &idx);
> +
> + return 0;
> +}

> +static int mem_info_setup_fake(void)
> +{
> + struct ghes_dimm_info *ghes_dimm;
> + struct dimm_info *dimm;
> +
> + memset(&mem_info, 0, sizeof(mem_info));

Is this necessary? Its only been touched by mem_info_setup(), and you get in here because
mem_info.num_dimm == 0...


> + ghes_dimm = kzalloc(sizeof(*mem_info.dimms), GFP_KERNEL);
> + if (!ghes_dimm)
> + return -ENOMEM;

This is common with mem_info_setup(). If ghes_dimm_info_init() read mem_info.num_dimm and
did the rest, you'd avoid some duplication here.


> + mem_info.num_dimm = 1;
> + mem_info.dimms = ghes_dimm;
> +
> + ghes_dimm_info_init();
> +
> + dimm = &ghes_dimm->dimm_info;
> + dimm->nr_pages = 1;
> + dimm->grain = 128;
> + dimm->mtype = MEM_UNKNOWN;
> + dimm->dtype = DEV_UNKNOWN;
> + dimm->edac_mode = EDAC_SECDED;
> +
> + return 0;
> +}


> +static void mci_add_dimm_info(struct mem_ctl_info *mci)

(From the name I expected this to be in edac_mc.c)


> +{
> + struct dimm_info *mci_dimm, *dmi_dimm;
> + struct ghes_dimm_info *dimm;
> + int index = 0;
> +
> + for_each_dimm(dimm) {
> + dmi_dimm = &dimm->dimm_info;
> + mci_dimm = edac_get_dimm_by_index(mci, index);
> +
> + index++;
> + if (index > mci->tot_dimms)
> + break;
> +
> + mci_dimm->nr_pages = dmi_dimm->nr_pages;
> + mci_dimm->mtype = dmi_dimm->mtype;
> + mci_dimm->edac_mode = dmi_dimm->edac_mode;
> + mci_dimm->dtype = dmi_dimm->dtype;
> + mci_dimm->grain = dmi_dimm->grain;
> + mci_dimm->smbios_handle = dmi_dimm->smbios_handle;
> }

This isn't fun. I guess 'numa' is the reason for generating a shadow copy of all this, and
then having to copy it into edac. But surely that isn't a problem unique to ghes_edac.c?

Can't you add the nid, and any other properties to struct dimm_info? It already has
smbios_handle, which is hardly useful to other drivers!


> + if (index != mci->tot_dimms)
> + pr_warn("Unexpected number of DIMMs: %d (exp. %d)\n",
> + index, mci->tot_dimms);
> }


> @@ -472,22 +566,24 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)

> mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
> if (!mci) {
> - pr_info("Can't allocate memory for EDAC data\n");
> + pr_err("Can't allocate memory for EDAC data\n");

Leftover debug?

kfree(mem_info.dimms); ?

> return -ENOMEM;
> }
>
> @@ -513,26 +609,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)

> - if (!fake) {
> - dimm_fill.count = 0;
> - dimm_fill.mci = mci;
> - dmi_walk(ghes_edac_dmidecode, &dimm_fill);
> - } else {
> - struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
> -
> - dimm->nr_pages = 1;
> - dimm->grain = 128;
> - dimm->mtype = MEM_UNKNOWN;
> - dimm->dtype = DEV_UNKNOWN;
> - dimm->edac_mode = EDAC_SECDED;
> - }
> + mci_add_dimm_info(mci);
>
> rc = edac_mc_add_mc(mci);
> if (rc < 0) {
> - pr_info("Can't register at EDAC core\n");
> + pr_err("Can't register at EDAC core\n");

Leftover debug?

> edac_mc_free(mci);

kfree(mem_info.dimms); ?

> return -ENODEV;
> }


Thanks!

James

2019-05-29 15:14:35

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 11/21] EDAC, ghes: Unify trace_mc_event() code with edac_mc driver

Hi Robert,

On 29/05/2019 09:44, Robert Richter wrote:
> Almost duplicate code, remove it.

almost?


> Note: there is a difference in the calculation of the grain_bits,
> using the edac_mc's version here.

But is it the right thing to do?

Is this an off-by-one bug being papered over as some cleanup?
If so could you post a separate fix that can be picked up for an rc.

Do Marvell have firmware that populates this field?

...

Unless the argument is no one cares about this...

From ghes_edac_report_mem_error():
| /* Error grain */
| if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
| e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);

Fishy, why would the kernel page-size be relevant here?

If physical_addr_mask were the same as PAGE_MASK, wouldn't this always give ~0?
(masking logic like this always does my head in)

/me gives it a go:
| {1}[Hardware Error]: physical_address: 0x00000000deadbeef
| {1}[Hardware Error]: physical_address_mask: 0xffffffffffff0000
| {1}[Hardware Error]: error_type: 6, master abort
| EDAC MC0: 1 CE Master abort on unknown label ( page:0xdead offset:0xbeef
| grain:-1 syndrome:0x0 - status(0x0000000000000001): reserved)

That 'grain:-1' is because the calculated e->grain was an unlikely 0xffffffffffffffff.
Patch incoming, if you could test it on your platform that'd be great.

I don't think ghes_edac.c wants this '+1'.


Thanks,

James
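
A minimal sketch of the masking arithmetic above, assuming 4K pages
(PAGE_MASK == ~0xfffUL) and the physical_address_mask from that log
(sketch only, not code from the thread):

	static u64 grain_example(void)
	{
		/* The expression from ghes_edac_report_mem_error(), evaluated
		 * with physical_addr_mask = 0xffffffffffff0000:
		 */
		return ~(0xffffffffffff0000ULL & ~PAGE_MASK);
		/* = ~(0xffffffffffff0000 & 0x0000000000000fff)
		 * = ~0ULL = 0xffffffffffffffff, which is printed as "grain:-1"
		 */
	}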

2019-05-29 15:15:40

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 09/21] EDAC, ghes: Use standard kernel macros for page calculations

Hi Robert,

On 29/05/2019 09:44, Robert Richter wrote:
> Use standard macros for page calculations.

Reviewed-by: James Morse <[email protected]>


Thanks,

James

2019-05-29 15:16:04

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 12/21] EDAC, ghes: Add support for legacy API counters

Hi Robert,

On 29/05/2019 09:44, Robert Richter wrote:
> The ghes driver is not able yet to count legacy API counters in sysfs,
> e.g.:
>
> /sys/devices/system/edac/mc/mc0/csrow2/ce_count
> /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count
> /sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count
>
> Make counting csrows/channels generic so that the ghes driver can use
> it too.

What for?

Is this for an arm64 system? Surely we don't have any systems that used to work with these
legacy counters. Aren't they legacy because we want new software to stop using them!


Thanks,

James

2019-05-29 15:16:34

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 10/21] EDAC, ghes: Remove pvt->detail_location string

Hi Robert,

On 29/05/2019 09:44, Robert Richter wrote:
> The detail_location[] string in struct ghes_edac_pvt is complete
> useless and data is just copied around. Put everything into
> e->other_detail from the beginning.

We still print all that complete-useless detail_location stuff... so this commit message
had me confused about what you're doing here. I think you meant the space for the string,
instead of the value!

| detail_location[] is used to collect two location strings so they can be passed as one
| to trace_mc_event(). Instead of having an extra copy step, assemble the location string
| in other_detail[] from the beginning.


> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 39702bac5eaf..c18f16bc9e4d 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -23,8 +23,7 @@ struct ghes_edac_pvt {
> struct mem_ctl_info *mci;
>
> /* Buffers for the error handling routine */
> - char detail_location[240];
> - char other_detail[160];
> + char other_detail[400];
> char msg[80];
> };
>
> @@ -225,13 +224,14 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> memset(e, 0, sizeof (*e));
> e->error_count = 1;
> strcpy(e->label, "unknown label");
> - e->msg = pvt->msg;
> - e->other_detail = pvt->other_detail;
> e->top_layer = -1;
> e->mid_layer = -1;
> e->low_layer = -1;
> - *pvt->other_detail = '\0';
> + e->msg = pvt->msg;
> + e->other_detail = pvt->other_detail;
> +
> *pvt->msg = '\0';
> + *pvt->other_detail = '\0';

... so no change? Could you drop this hunk?

Regardless,
Reviewed-by: James Morse <[email protected]>


Thanks,

James

2019-05-29 17:52:21

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 14/21] EDAC, ghes: Extract numa node information for each dimm

Hi Robert,

On 29/05/2019 09:44, Robert Richter wrote:
> In a later patch we want to have one mc device per node. This patch
> extracts the numa node information for each dimm. This is done by
> collecting the physical address ranges from the DMI table (Memory
> Array Mapped Address - Type 19 of SMBIOS spec). The node information
> for a physical address is already known to a numa-aware system (e.g. by
> using the ACPI _PXM method or the ACPI SRAT table), so based on the PA
> we can assign the node id to the dimms.

I think you're letting the smbios information drive you here. We'd like to do as much as
possible without it; all it's really good for is telling us the label on the PCB.

With this approach, you only get numa information by parsing more smbios, which we have to
try and validate, and fall back to some error path if it smells wrong. We end up needing
things like a 'fallback memory controller' in the case the firmware fault-time value is
missing, or nuts.

What bugs me is we already know the numa information from the address. We could expose
that without the smbios tables at all, and it would be useful to someone playing the
dimm-bisect game. Not making it depend on smbios means there is a good chance it can
become common with other edac drivers.

I don't think we need to know the dimm->node mapping up front. When we get an error,
pfn_to_nid() on the address tells us which node that memory is attached to. This should be
the only place nid information comes from, that way we don't need to check it. Trying to
correlate it with smbios tables is much more code. If the CPER comes with a DIMM handle,
we know the DIMM too.

So all we really need to know at setup time is how many numa-nodes there are, and the
maximum DIMM per node if we don't want phantom-dimms. Type-17 already has a
Physical-Memory-Array-Handle, which points at Type-19... but we don't need to read it,
just count them and find the biggest.

If the type-19 information is missing from smbios, or not linked up properly, or the
values provided at fault-time don't overlap with the values in the table, or there is no
fault-time node information: you still get the numa-node information based on the address.

Using the minimum information should give us the least code, and the least exposure to
misdescribed tables.


> A fallback that disables numa is implemented in case the node
> information is inconsistent.

> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 50f4ee36b755..083452a48b42 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -67,14 +67,34 @@ struct memdev_dmi_entry {
> u16 conf_mem_clk_speed;
> } __attribute__((__packed__));
>
> +/* Memory Array Mapped Address - Type 19 of SMBIOS spec */
> +struct memarr_dmi_entry {
> + u8 type;
> + u8 length;
> + u16 handle;
> + u32 start;
> + u32 end;
> + u16 phys_mem_array_handle;
> + u8 partition_width;
> + u64 ext_start;
> + u64 ext_end;
> +} __attribute__((__packed__));

Any chance we could collect the structures from the smbios spec in a header file somewhere?

I'd prefer not to read this thing at all if we can help it.

> struct ghes_dimm_info {
> struct dimm_info dimm_info;
> int idx;
> + int numa_node;

(I thought nid was the preferred term)


> + phys_addr_t start;
> + phys_addr_t end;

I think start and end are deceptive as they overlap with other DIMMs on systems with
interleaving memory controllers. I'd prefer not to keep these values around.


> + u16 phys_handle;
> };
>
> struct ghes_mem_info {
> - int num_dimm;
> + int num_dimm;
> struct ghes_dimm_info *dimms;
> + int num_nodes;

> + int num_per_node[MAX_NUMNODES];

Number of what?


> + bool enable_numa;

This is used locally in mem_info_setup(), but its not read from here by any of the later
patches in the series. Is it needed?

I don't like the idea that this is behaviour that is turned on/off. It's a property of the
system. I think it would be better if CONFIG_NUMA causes you to get multiple
memory-controllers created, but if its not actually a NUMA machine there would only be
one. This lets us test that code on not-really-numa systems.


> };
>
> struct ghes_mem_info mem_info;
> @@ -97,10 +117,50 @@ static void ghes_dimm_info_init(void)
>
> for_each_dimm(dimm) {
> dimm->idx = idx;
> + dimm->numa_node = NUMA_NO_NODE;
> idx++;
> }
> }
>
> +static void ghes_edac_set_nid(const struct dmi_header *dh, void *arg)
> +{
> + struct memarr_dmi_entry *entry = (struct memarr_dmi_entry *)dh;
> + struct ghes_dimm_info *dimm;
> + phys_addr_t start, end;
> + int nid;
> +
> + if (dh->type != DMI_ENTRY_MEM_ARRAY_MAPPED_ADDR)
> + return;

> + /* only support SMBIOS 2.7+ */
> + if (entry->length < sizeof(*entry))
> + return;

Lovely. I still hope we can get away without parsing this table.


> + if (entry->start == 0xffffffff)
> + start = entry->ext_start;
> + else
> + start = entry->start;
> + if (entry->end == 0xffffffff)
> + end = entry->ext_end;
> + else
> + end = entry->end;


> + if (!pfn_valid(PHYS_PFN(start)))
> + return;

Eh? Just because there is no struct page doesn't mean firmware won't report errors for it.
This is going to bite on arm64 if the 'start' page happens to have been reserved by
firmware, and thus doesn't have a struct page. Bottom-up allocation doesn't sound unlikely.


> + nid = pfn_to_nid(PHYS_PFN(start));

... Ugh, because pfn_to_nid() goes via struct page.

You can make this robust by scanning start->end looking for a pfn_valid() you can pull the
nid out of. (no, I don't think it's a good idea either!)

I'd like to see if we can get rid of the 'via address' part of this.


> + if (nid < 0 || nid >= MAX_NUMNODES || !node_possible(nid))
> + nid = NUMA_NO_NODE;

Can this happen? Does this indicate the firmware tables are wrong, or mm is about to derail?


> + for_each_dimm(dimm) {
> + if (entry->phys_mem_array_handle == dimm->phys_handle) {
> + dimm->numa_node = nid;
> + dimm->start = start;
> + dimm->end = end;
> + }
> + }
> +}
> +
> static int get_dimm_smbios_index(u16 handle)
> {
> struct mem_ctl_info *mci = ghes_pvt->mci;
> @@ -213,8 +273,25 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
> }
> }
>
> +static void mem_info_disable_numa(void)
> +{
> + struct ghes_dimm_info *dimm;
> +
> + for_each_dimm(dimm) {
> + if (dimm->numa_node != NUMA_NO_NODE)
> + mem_info.num_per_node[dimm->numa_node] = 0;

> + dimm->numa_node = 0;

NUMA_NO_NODE?

> + }
> +
> + mem_info.num_per_node[0] = mem_info.num_dimm;
> + mem_info.num_nodes = 1;
> + mem_info.enable_numa = false;
> +}
> +
> static int mem_info_setup(void)
> {
> + struct ghes_dimm_info *dimm;
> + bool enable_numa = true;
> int idx = 0;
>
> memset(&mem_info, 0, sizeof(mem_info));
> @@ -231,6 +308,29 @@ static int mem_info_setup(void)
>
> ghes_dimm_info_init();
> dmi_walk(ghes_edac_dmidecode, &idx);
> + dmi_walk(ghes_edac_set_nid, NULL);
> +
> + for_each_dimm(dimm) {
> + if (dimm->numa_node == NUMA_NO_NODE) {
> + enable_numa = false;
> + } else {

> + if (!mem_info.num_per_node[dimm->numa_node])
> + mem_info.num_nodes++;

This is to try and hide empty nodes?


> + mem_info.num_per_node[dimm->numa_node]++;

Could you do these two in your previous for_each_dimm() walk?


> + }
> +
> + edac_dbg(1, "DIMM%i: Found mem range [%pa-%pa] on node %d\n",
> + dimm->idx, &dimm->start, &dimm->end, dimm->numa_node);
> + }


> + mem_info.enable_numa = enable_numa;
> + if (enable_numa)
> + return 0;
> +
> + /* something went wrong, disable numa */
> + if (num_possible_nodes() > 1)
> + pr_warn("Can't get numa info, disabling numa\n");
> + mem_info_disable_numa();

I'd like to find a way of doing this where we don't need this sort of thing!


Thanks,

James
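
A minimal sketch of the address-driven node lookup described above,
assuming the faulting physical address has a valid struct page (the
helper name is hypothetical):

	static int nid_from_error_addr(u64 physical_addr)
	{
		unsigned long pfn = PHYS_PFN(physical_addr);

		/* pfn_to_nid() goes via struct page, so guard with pfn_valid(). */
		if (!pfn_valid(pfn))
			return NUMA_NO_NODE;

		return pfn_to_nid(pfn);
	}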

2019-05-31 13:43:55

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 13/21] EDAC, ghes: Rework memory hierarchy detection

Hi James,

thanks for your review. See below.

On 29.05.19 16:06:55, James Morse wrote:
> Hi Robert,
>
> On 29/05/2019 09:44, Robert Richter wrote:
> > In a later patch we want to add more information about the memory
> > hierarchy (NUMA topology, DIMM label information). Rework memory
> > hierarchy detection to make the code extendable for this.
> >
> > The general approach is roughly like:
> >
> > mem_info_setup();
> > for_each_node(nid) {
> > mci = edac_mc_alloc(nid);
> > mci_add_dimm_info(mci);
> > edac_mc_add_mc(mci);
> > };
> >
> > This patch introduces mem_info_setup() and mci_add_dimm_info().
> >
> > All data of the memory hierarchy is collected in a local struct
> > ghes_mem_info.
> >
> > Note: Per (NUMA) node registration will be implemented in a later
> > patch.
>
>
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index ea4d53043199..50f4ee36b755 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -67,17 +67,38 @@ struct memdev_dmi_entry {
> > u16 conf_mem_clk_speed;
> > } __attribute__((__packed__));
> >
> > -struct ghes_edac_dimm_fill {
> > - struct mem_ctl_info *mci;
> > - unsigned count;
>
> > +struct ghes_dimm_info {
> > + struct dimm_info dimm_info;
> > + int idx;
> > +};
>
> > +struct ghes_mem_info {
> > + int num_dimm;
> > + struct ghes_dimm_info *dimms;
> > };
> >
> > +struct ghes_mem_info mem_info;
>
> static?

Yes, this can be made static.

Will update the code.

>
>
> > @@ -94,18 +115,17 @@ static int get_dimm_smbios_index(u16 handle)
> >
> > static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
> > {
> > - struct ghes_edac_dimm_fill *dimm_fill = arg;
> > - struct mem_ctl_info *mci = dimm_fill->mci;
> > -
> > if (dh->type == DMI_ENTRY_MEM_DEVICE) {
> > + int *idx = arg;
> > struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
> > - struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count,
> > - 0, 0);
> > + struct ghes_dimm_info *mi = &mem_info.dimms[*idx];
> > + struct dimm_info *dimm = &mi->dimm_info;
> > u16 rdr_mask = BIT(7) | BIT(13);
>
>
> > + mi->phys_handle = entry->phys_mem_array_handle;
>
> Where did this come from, and what is it for?
>
> ...
>
> Should this be in a later patch? (did you bisect build this?)

The change should be part of the next patch:

EDAC, ghes: Extract numa node information for each dimm

It breaks the build here.

Will update the code.

>
>
> > if (entry->size == 0xffff) {
> > - pr_info("Can't get DIMM%i size\n",
> > - dimm_fill->count);
> > + pr_info("Can't get DIMM%i size\n", mi->idx);
> > dimm->nr_pages = MiB_TO_PAGES(32);/* Unknown */
> > } else if (entry->size == 0x7fff) {
> > dimm->nr_pages = MiB_TO_PAGES(entry->extended_size);
>
>
> > +static int mem_info_setup(void)
> > +{
> > + int idx = 0;
> > +
> > + memset(&mem_info, 0, sizeof(mem_info));
>
> Is this necessary? Isn't mem_info in the BSS, it will zero'd already.

For clarity I don't want to just *assume* the data is zero; instead I
want to *ensure* it is, because mem_info could be initialized and used
more than once. The cost is small and it improves code
maintainability.

This structure is used locally here for all driver instances (there is
only one instance allowed, but there could be a 2nd initialization
attempt after a first instance has been shut down). For consistent
data we rely on that struct being zero from the beginning. Without
that it would be zero only after a boot. I put this in here to
emphasize the assumption that the struct must be zero, even though the
memory is zero when initializing it the first time and it is very
unlikely the driver will be initialized a 2nd time.

>
>
> > + /* Get the number of DIMMs */
> > + dmi_walk(ghes_edac_count_dimms, NULL);
> > + if (!mem_info.num_dimm)
> > + return -EINVAL;
>
> > + mem_info.dimms = kcalloc(mem_info.num_dimm,
> > + sizeof(*mem_info.dimms), GFP_KERNEL);
> > + if (!mem_info.dimms)
> > + return -ENOMEM;
>
> > + ghes_dimm_info_init();
>
> Could you move the kcalloc() into ghes_dimm_info_init()? This would save you having a
> unnecessarily-different version in mem_info_setup_fake().

Sure, see below.

>
>
> > + dmi_walk(ghes_edac_dmidecode, &idx);
> > +
> > + return 0;
> > +}
>
> > +static int mem_info_setup_fake(void)
> > +{
> > + struct ghes_dimm_info *ghes_dimm;
> > + struct dimm_info *dimm;
> > +
> > + memset(&mem_info, 0, sizeof(mem_info));
>
> Is this necessary? Its only been touched by mem_info_setup(), and you get in here because
> mem_info.num_dimm == 0...

In this particular case everything is still zero. But the above
applies here too.

>
>
> > + ghes_dimm = kzalloc(sizeof(*mem_info.dimms), GFP_KERNEL);
> > + if (!ghes_dimm)
> > + return -ENOMEM;
>
> This is common with mem_info_setup(). If ghes_dimm_info_init() read mem_info.num_dimm and
> did the rest, you'd avoid some duplication here.

Looks good to me and makes the setup code more straightforward.

Will update the code.

>
>
> > + mem_info.num_dimm = 1;
> > + mem_info.dimms = ghes_dimm;
> > +
> > + ghes_dimm_info_init();
> > +
> > + dimm = &ghes_dimm->dimm_info;
> > + dimm->nr_pages = 1;
> > + dimm->grain = 128;
> > + dimm->mtype = MEM_UNKNOWN;
> > + dimm->dtype = DEV_UNKNOWN;
> > + dimm->edac_mode = EDAC_SECDED;
> > +
> > + return 0;
> > +}
>
>
> > +static void mci_add_dimm_info(struct mem_ctl_info *mci)
>
> (From the name I expected this to be in edac_mc.c)

I will rename it to mem_info_prepare_mci(), which is more in line
with the other mem_info_*() functions.

Will update the code.

>
>
> > +{
> > + struct dimm_info *mci_dimm, *dmi_dimm;
> > + struct ghes_dimm_info *dimm;
> > + int index = 0;
> > +
> > + for_each_dimm(dimm) {
> > + dmi_dimm = &dimm->dimm_info;
> > + mci_dimm = edac_get_dimm_by_index(mci, index);
> > +
> > + index++;
> > + if (index > mci->tot_dimms)
> > + break;
> > +
> > + mci_dimm->nr_pages = dmi_dimm->nr_pages;
> > + mci_dimm->mtype = dmi_dimm->mtype;
> > + mci_dimm->edac_mode = dmi_dimm->edac_mode;
> > + mci_dimm->dtype = dmi_dimm->dtype;
> > + mci_dimm->grain = dmi_dimm->grain;
> > + mci_dimm->smbios_handle = dmi_dimm->smbios_handle;
> > }
>
> This isn't fun. I guess 'numa' is the reason for generating a shadow copy of all this, and
> then having to copy it into edac. But surely that isn't a problem unique to ghes_edac.c?

We need to collect all the memory hierarchy and dimm info before we
can call edac_mc_alloc(). Thus, the data is almost a duplicate, but I
don't see a way to avoid this.

I was thinking of splitting struct dimm_info into 2 parts, with an
embedded struct, so the data can be copied over as a struct in one
shot without overwriting the parts of the data set up by
edac_mc_alloc().

I also see this isn't ideal, but I don't see an alternative at the
moment.

>
> Can't you add the nid, and any other properties to struct dimm_info? It already has
> smbios_handle, which is hardly useful to other drivers!

I will move smbios_handle back to struct ghes_dimm_info.

Will update the code.

>
>
> > + if (index != mci->tot_dimms)
> > + pr_warn("Unexpected number of DIMMs: %d (exp. %d)\n",
> > + index, mci->tot_dimms);
> > }
>
>
> > @@ -472,22 +566,24 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>
> > mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
> > if (!mci) {
> > - pr_info("Can't allocate memory for EDAC data\n");
> > + pr_err("Can't allocate memory for EDAC data\n");
>
> Leftover debug?

No, this is a real error causing the init to fail, so the log level
is adjusted here. I changed it in this patch to align it with the
other newly introduced error messages.

>
> kfree(mem_info.dimms); ?
>
> > return -ENOMEM;
> > }
> >
> > @@ -513,26 +609,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>
> > - if (!fake) {
> > - dimm_fill.count = 0;
> > - dimm_fill.mci = mci;
> > - dmi_walk(ghes_edac_dmidecode, &dimm_fill);
> > - } else {
> > - struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
> > -
> > - dimm->nr_pages = 1;
> > - dimm->grain = 128;
> > - dimm->mtype = MEM_UNKNOWN;
> > - dimm->dtype = DEV_UNKNOWN;
> > - dimm->edac_mode = EDAC_SECDED;
> > - }
> > + mci_add_dimm_info(mci);
> >
> > rc = edac_mc_add_mc(mci);
> > if (rc < 0) {
> > - pr_info("Can't register at EDAC core\n");
> > + pr_err("Can't register at EDAC core\n");
>
> Leftover debug?

Same here.

>
> > edac_mc_free(mci);
>
> kfree(mem_info.dimms); ?
>
> > return -ENODEV;
> > }
>
>
> Thanks!

Thank you for review.

-Robert

>
> James

2019-05-31 14:50:19

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 00/21] EDAC, mc, ghes: Fixes and updates to improve memory error reporting

On 29.05.19 16:54:52, Borislav Petkov wrote:
> On Wed, May 29, 2019 at 08:44:01AM +0000, Robert Richter wrote:
> > Patch #1: Repost of an already accepted patch sent to the ml. Adding
> > it here for completeness as I did not find it in a repository yet.
>
> Try mainline:
>
> 29a0c843973b ("EDAC/mc: Fix edac_mc_find() in case no device is found")
>
> :-)

Ah, right. Will rebase.

-Robert

2019-06-03 13:12:40

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 11/21] EDAC, ghes: Unify trace_mc_event() code with edac_mc driver

On 29.05.19 16:12:38, James Morse wrote:
> Hi Robert,
>
> On 29/05/2019 09:44, Robert Richter wrote:
> > Almost duplicate code, remove it.
>
> almost?

The grain ... as noted below.

>
>
> > Note: there is a difference in the calculation of the grain_bits,
> > using the edac_mc's version here.
>
> But is it the right thing to do?
>
> Is this an off-by-one bug being papered over as some cleanup?
> If so could you post a separate fix that can be picked up for an rc.
>
> Do Marvell have firmware that populates this field?
>
> ...
>
> Unless the argument is no one cares about this...
>
> From ghes_edac_report_mem_error():
> | /* Error grain */
> | if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
> | e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
>
> Fishy, why would the kernel page-size be relevant here?

That looked broken to me too; I did not put too much effort into
fixing the grain yet. So I just took the edac_mc version first, on the
assumption that one is working.

It looks like the intention here is to limit the grain to the page
size. But right, the calculation is wrong here. I am also going to
reply to the patch you sent on this.

>
> If physical_addr_mask were the same as PAGE_MASK, wouldn't this always give ~0?
> (masking logic like this always does my head in)
>
> /me gives it a go:
> | {1}[Hardware Error]: physical_address: 0x00000000deadbeef
> | {1}[Hardware Error]: physical_address_mask: 0xffffffffffff0000
> | {1}[Hardware Error]: error_type: 6, master abort
> | EDAC MC0: 1 CE Master abort on unknown label ( page:0xdead offset:0xbeef
> | grain:-1 syndrome:0x0 - status(0x0000000000000001): reserved)
>
> That 'grain:-1' is because the calculated e->grain was an unlikely 0xffffffffffffffff.
> Patch incoming, if you could test it on your platform that'd be great.
>
> I don't think ghes_edac.c wants this '+1'.

The +1 looks odd to me for the edac_mc driver as well, but I need to
take a closer look here, as some logs suggest the grain is
calculated correctly.

I will do some further examination here and also respond to your
patch.

Thank you for review.

-Robert

2019-06-04 17:18:28

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 11/21] EDAC, ghes: Unify trace_mc_event() code with edac_mc driver

Hi Robert,

On 03/06/2019 14:10, Robert Richter wrote:
> On 29.05.19 16:12:38, James Morse wrote:
>> On 29/05/2019 09:44, Robert Richter wrote:
>>> Almost duplicate code, remove it.
>>
>>> Note: there is a difference in the calculation of the grain_bits,
>>> using the edac_mc's version here.
>>
>> But is it the right thing to do?
>>
>> Is this an off-by-one bug being papered over as some cleanup?
>> If so could you post a separate fix that can be picked up for an rc.
>>
>> Do Marvell have firmware that populates this field?
>>
>> ...
>>
>> Unless the argument is no one cares about this...
>>
>> From ghes_edac_report_mem_error():
>> | /* Error grain */
>> | if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
>> | e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
>>
>> Fishy, why would the kernel page-size be relevant here?
>
> That looked broken to me too, I did not put to much effort in fixing
> the grain yet. So I just took the edac_mc version first in the
> assumption, that one is working.

(Ah, it would have been good to note this in the commit-message)


> It looks like the intention here is to limit the grain to the page
> size.
I'm not convinced that makes sense. If some architecture lets you configure the page-size
(as arm64 does), and your hypervisor had a bigger page-size, then any hardware fault would
be rounded up to the hypervisor's page-size.

The kernel's page-size has very little to do with the error, it only matters for when we
go unmapping stuff in memory_failure().


> But right, the calculation is wrong here. I am also going to
> reply to your patch you sent on this.

Thanks!


>> If physical_addr_mask were the same as PAGE_MASK, wouldn't this always give ~0?
>> (masking logic like this always does my head in)
>>
>> /me gives it a go:
>> | {1}[Hardware Error]: physical_address: 0x00000000deadbeef
>> | {1}[Hardware Error]: physical_address_mask: 0xffffffffffff0000
>> | {1}[Hardware Error]: error_type: 6, master abort
>> | EDAC MC0: 1 CE Master abort on unknown label ( page:0xdead offset:0xbeef
>> | grain:-1 syndrome:0x0 - status(0x0000000000000001): reserved)
>>
>> That 'grain:-1' is because the calculated e->grain was an unlikely 0xffffffffffffffff.
>> Patch incoming, if you could test it on your platform that'd be great.
>>
>> I don't think ghes_edac.c wants this '+1'.
>
> The +1 looks odd to me also for the edac_mc driver, but I need to take
> a closer look here as well as some logs suggest the grain is
> calculated correctly.

My theory on this is that ghes_edac.c is generating a grain like 0x1000, so fls() does the
right thing. Other edac drivers are generating a grain like 0xfff to describe the same
size; fls() is now off-by-one, hence the addition.
I don't have a platform where I can trigger any other edac driver to test this though.

The way round this would be to put the grain_bits in struct edac_raw_error_desc so that
ghes_edac.c can calculate it directly.


Thanks,

James
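
To make the two encodings concrete, a small sketch (illustrative values
only, using fls_long() from linux/bitops.h):

	static bool grain_encoding_example(void)
	{
		unsigned int a = fls_long(0x1000);	/* returns 13 */
		unsigned int b = fls_long(0x0fff);	/* returns 12; 1 << 12 == 0x1000 */

		return a == b + 1;	/* the off-by-one between the two encodings */
	}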

2019-06-11 17:30:00

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 02/21] EDAC: Fixes to use put_device() after device_add() errors

On Wed, May 29, 2019 at 08:44:05AM +0000, Robert Richter wrote:
> Always use put_device() after device_add() failed.
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/edac/edac_mc_sysfs.c | 36 +++++++++++++++++++-----------------
> 1 file changed, 19 insertions(+), 17 deletions(-)

I already have a partial fix for that, you can send me the rest ontop:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=for-next&id=f5d59da9663d115b9cf62cce75a33382c880b560

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-12 18:08:54

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 02/21] EDAC: Fixes to use put_device() after device_add() errors

On 11.06.19 19:28:30, Borislav Petkov wrote:
> On Wed, May 29, 2019 at 08:44:05AM +0000, Robert Richter wrote:
> > Always use put_device() after device_add() failed.
> >
> > Signed-off-by: Robert Richter <[email protected]>
> > ---
> > drivers/edac/edac_mc_sysfs.c | 36 +++++++++++++++++++-----------------
> > 1 file changed, 19 insertions(+), 17 deletions(-)
>
> I already have a partial fix for that, you can send me the rest ontop:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=for-next&id=f5d59da9663d115b9cf62cce75a33382c880b560

Ok, I have rebased onto for-next.

Thanks,

-Robert

2019-06-12 18:17:09

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 10/21] EDAC, ghes: Remove pvt->detail_location string

On 29.05.19 16:13:20, James Morse wrote:
> Hi Robert,
>
> On 29/05/2019 09:44, Robert Richter wrote:
> > The detail_location[] string in struct ghes_edac_pvt is complete
> > useless and data is just copied around. Put everything into
> > e->other_detail from the beginning.
>
> We still print all that complete-useless detail_location stuff... so this commit message
> had me confused about what you're doing here. I think you meant the space for the string,
> instead of the value!

No, this patch does not remove the driver details, nor does it change
the data representation. It changes how that data is prepared
internally. The patch removes the (useless) intermediate buffer
detail_location[] and writes everything directly into the
other_detail[] buffer.

>
> | detail_location[] is used to collect two location strings so they can be passed as one
> | to trace_mc_event(). Instead of having an extra copy step, assemble the location string
> | in other_detail[] from the beginning.
>
>
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index 39702bac5eaf..c18f16bc9e4d 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -23,8 +23,7 @@ struct ghes_edac_pvt {
> > struct mem_ctl_info *mci;
> >
> > /* Buffers for the error handling routine */
> > - char detail_location[240];
> > - char other_detail[160];
> > + char other_detail[400];
> > char msg[80];
> > };
> >
> > @@ -225,13 +224,14 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> > memset(e, 0, sizeof (*e));
> > e->error_count = 1;
> > strcpy(e->label, "unknown label");
> > - e->msg = pvt->msg;
> > - e->other_detail = pvt->other_detail;
> > e->top_layer = -1;
> > e->mid_layer = -1;
> > e->low_layer = -1;
> > - *pvt->other_detail = '\0';
> > + e->msg = pvt->msg;
> > + e->other_detail = pvt->other_detail;
> > +
> > *pvt->msg = '\0';
> > + *pvt->other_detail = '\0';
>
> ... so no change? Could you drop this hunk?

No actual change here, but I found it useful to reorder things
for comparison with the similar code block in
edac_mc_handle_error().


>
> Regardless,
> Reviewed-by: James Morse <[email protected]>

Thank you for review.

-Robert

>
>
> Thanks,
>
> James

2019-06-12 18:42:35

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 12/21] EDAC, ghes: Add support for legacy API counters

On 29.05.19 16:13:02, James Morse wrote:
> On 29/05/2019 09:44, Robert Richter wrote:
> > The ghes driver is not able yet to count legacy API counters in sysfs,
> > e.g.:
> >
> > /sys/devices/system/edac/mc/mc0/csrow2/ce_count
> > /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count
> > /sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count
> >
> > Make counting csrows/channels generic so that the ghes driver can use
> > it too.
>
> What for?

With EDAC_LEGACY_SYSFS enabled those counters are exposed to sysfs,
but the numbers are wrong (all zero).

> Is this for an arm64 system? Surely we don't have any systems that used to work with these
> legacy counters. Aren't they legacy because we want new software to stop using them!

The option is to support legacy userland. If we want to provide a
similar "user experience" as for x86 the counters should be correct.
Of course it is not a real mapping to csrows, but it makes that i/f
work.

In any case, this patch cleans up code, as the old API's counter code
is isolated and moved to common code. Making the counters work for
ghes is actually a side-effect here. The cleanup is a prerequisite for
follow-on patches.

-Robert

>
>
> Thanks,
>
> James

2019-06-13 20:54:47

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 14/21] EDAC, ghes: Extract numa node information for each dimm

Hi James,

thank you for your review and response here. See my comments below.

On 29.05.19 18:51:00, James Morse wrote:
> On 29/05/2019 09:44, Robert Richter wrote:
> > In a later patch we want to have one mc device per node. This patch
> > extracts the numa node information for each dimm. This is done by
> > collecting the physical address ranges from the DMI table (Memory
> > Array Mapped Address - Type 19 of SMBIOS spec). The node information
> > for a physical address is already known to a numa-aware system (e.g. by
> > using the ACPI _PXM method or the ACPI SRAT table), so based on the PA
> > we can assign the node id to the dimms.
>
> I think you're letting the smbios information drive you here. We'd like to do as much as
> possible without it, all its really good for is telling us the label on the PCB.
>
> With this approach, you only get numa information by parsing more smbios, which we have to
> try and validate, and fall back to some error path if it smells wrong. We end up needing
> things like a 'fallback memory controller' in the case the firmware fault-time value is
> missing, or nuts.
>
> What bugs me is we already know the numa information from the address. We could expose
> that without the smbios tables at all, and it would be useful to someone playing the
> dimm-bisect game. Not making it depend on smbios means there is a good chance it can
> become common with other edac drivers.

What a ghes driver will never have in common with other edac drivers
is the knowledge of the memory hierarchy. Other drivers know the
underlying hardware and can determine the total number of dimms and
their location mapping indicated by a tuple (card/module, row/channel,
top_layer/mid_layer, etc.) using something like:

index = top_layer * mid_layer_size + mid_layer;

The ghes driver cannot calculate a dimm index in that way since the
size of each layer is unknown. This only leaves you using the
dmi_handle from the error record to do the dimm mapping. I don't see
any other way here.

>
> I don't think we need to know the dimm->node mapping up front. When we get an error,
> pfn_to_nid() on the address tells us which node that memory is attached to. This should be
> the only place nid information comes from, that way we don't need to check it. Trying to
> correlate it with smbios tables is much more code. If the CPER comes with a DIMM handle,
> we know the DIMM too.

The dimm/node mapping is not the issue here and we could also use the
phys addr to select the node's memory controller. But we still need to
be able to somehow select the dimm the error belongs to. The ghes
driver cannot use the location tuple here to get the dimm index in the
mc's array.

> So all we really need is to know at setup time is how many numa-nodes there are, and the
> maximum DIMM per node if we don't want phantom-dimms. Type-17 already has a
> Physical-Memory-Array-Handle, which points at Type-19... but we don't need to read it,
> just count them and find the biggest.
>
> If the type-19 information is missing from smbios, or not linked up properly, or the
> values provided at fault-time don't overlap with the values in the table, or there is no
> fault-time node information: you still get the numa-node information based on the address.
>
> Using the minimum information should give us the least code, and the least exposure to
> misdescribed tables.

As said, we need the firmware here to locate the correct dimm an error
is reported for. I would also like to keep the use of smbios
information to a minimum, but we rely on correct firmware tables here.
Assuming a broken fw and still having a correct driver does not work.
Why not just blame the firmware if something is wrong? I am sure it
will be corrected if edac does not work properly.

>
>
> > A fallback that disables numa is implemented in case the node
> > information is inconsistent.
>
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index 50f4ee36b755..083452a48b42 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -67,14 +67,34 @@ struct memdev_dmi_entry {
> > u16 conf_mem_clk_speed;
> > } __attribute__((__packed__));
> >
> > +/* Memory Array Mapped Address - Type 19 of SMBIOS spec */
> > +struct memarr_dmi_entry {
> > + u8 type;
> > + u8 length;
> > + u16 handle;
> > + u32 start;
> > + u32 end;
> > + u16 phys_mem_array_handle;
> > + u8 partition_width;
> > + u64 ext_start;
> > + u64 ext_end;
> > +} __attribute__((__packed__));
>
> Any chance we could collect the structures from the smbios spec in a header file somewhere?

I could create a new ghes_edac.h file, but the only user would be
ghes_edac.c. That does not make sense to me.

>
> I'd prefer not to read this thing at all if we can help it.

I don't see how else we could identify the dimm other than by the
phys addr range and the smbios handle.

>
> > struct ghes_dimm_info {
> > struct dimm_info dimm_info;
> > int idx;
> > + int numa_node;
>
> (I thought nid was the preferred term)

struct device uses numa_node here, so I chose this one.

>
>
> > + phys_addr_t start;
> > + phys_addr_t end;
>
> I think start and end are deceptive as they overlap with other DIMMs on systems with
> interleaving memory controllers. I'd prefer not to keep these values around.

The (start) address is only used for dimm/node mapping.

>
>
> > + u16 phys_handle;
> > };
> >
> > struct ghes_mem_info {
> > - int num_dimm;
> > + int num_dimm;
> > struct ghes_dimm_info *dimms;
> > + int num_nodes;
>
> > + int num_per_node[MAX_NUMNODES];
>
> Number of what?

dimms_per_node, will change.

>
>
> > + bool enable_numa;
>
> This is used locally in mem_info_setup(), but its not read from here by any of the later
> patches in the series. Is it needed?

No, not really, will remove it.

>
> I don't like the idea that this is behaviour that is turned on/off. Its a property of the
> system. I think it would be better if CONFIG_NUMA causes you to get multiple
> memory-controllers created, but if its not actually a NUMA machine there would only be
> one. This lets us test that code on not-really-numa systems.

There is only one node if CONFIG_NUMA is disabled and only one mc is
created.

We disable per-node memory controllers only if the node id cannot be
determined properly for some reason.

>
>
> > };
> >
> > struct ghes_mem_info mem_info;
> > @@ -97,10 +117,50 @@ static void ghes_dimm_info_init(void)
> >
> > for_each_dimm(dimm) {
> > dimm->idx = idx;
> > + dimm->numa_node = NUMA_NO_NODE;
> > idx++;
> > }
> > }
> >
> > +static void ghes_edac_set_nid(const struct dmi_header *dh, void *arg)
> > +{
> > + struct memarr_dmi_entry *entry = (struct memarr_dmi_entry *)dh;
> > + struct ghes_dimm_info *dimm;
> > + phys_addr_t start, end;
> > + int nid;
> > +
> > + if (dh->type != DMI_ENTRY_MEM_ARRAY_MAPPED_ADDR)
> > + return;
>
> > + /* only support SMBIOS 2.7+ */
> > + if (entry->length < sizeof(*entry))
> > + return;
>
> Lovely. I still hope we can get away without parsing this table.
>
>
> > + if (entry->start == 0xffffffff)
> > + start = entry->ext_start;
> > + else
> > + start = entry->start;
> > + if (entry->end == 0xffffffff)
> > + end = entry->ext_end;
> > + else
> > + end = entry->end;
>
>
> > + if (!pfn_valid(PHYS_PFN(start)))
> > + return;
>
> Eh? Just because there is no struct page doesn't mean firmware won't report errors for it.
> This is going to bite on arm64 if the 'start' page happens to have been reserved by
> firmware, and thus doesn't have a struct page. Bottom-up allocation doesn't sound unlikely.

It looks like the memblock areas have a finer granularity than the
memory ranges in the DMI and SRAT tables. DMI and SRAT have the same
areas on the system that I have used for testing.

SRAT provides the ranges used to set up the pfns. If those match the
DMI table and the memblocks are within that range, I don't see an
issue.

The fallback would be that the node is not detectable and per-node mc
allocation is disabled.

----
# dmidecode | grep -A 5 'Memory Array Mapped Address'
Memory Array Mapped Address
Starting Address: 0x0000000080000000k
Ending Address: 0x00000000FEFFFFFFk
Range Size: 2032 MB
Physical Array Handle: 0x0037
Partition Width: 1
--
Memory Array Mapped Address
Starting Address: 0x0000000880000000k
Ending Address: 0x0000000FFFFFFFFFk
Range Size: 30 GB
Physical Array Handle: 0x0037
Partition Width: 1
--
Memory Array Mapped Address
Starting Address: 0x0000008800000000k
Ending Address: 0x0000009FFCFFFFFFk
Range Size: 98256 MB
Physical Array Handle: 0x0037
Partition Width: 1
--
Memory Array Mapped Address
Starting Address: 0x0000009FFD000000k
Ending Address: 0x000000BFFCFFFFFFk
Range Size: 128 GB
Physical Array Handle: 0x004E
Partition Width: 1
# dmesg | grep SRAT:.*mem
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x80000000-0xfeffffff]
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x880000000-0xfffffffff]
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x8800000000-0x9ffcffffff]
[ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x9ffd000000-0xbffcffffff]
# dmesg
[...]
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00000000802f0000-0x000000008030ffff]
[ 0.000000] node 0: [mem 0x0000000080310000-0x00000000bfffffff]
[ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000c0ccffff]
[ 0.000000] node 0: [mem 0x00000000c0cd0000-0x00000000f95effff]
[ 0.000000] node 0: [mem 0x00000000f95f0000-0x00000000f961ffff]
[ 0.000000] node 0: [mem 0x00000000f9620000-0x00000000fac3ffff]
[ 0.000000] node 0: [mem 0x00000000fac40000-0x00000000faddffff]
[ 0.000000] node 0: [mem 0x00000000fade0000-0x00000000fc8dffff]
[ 0.000000] node 0: [mem 0x00000000fc8e0000-0x00000000fc8effff]
[ 0.000000] node 0: [mem 0x00000000fc8f0000-0x00000000fcaaffff]
[ 0.000000] node 0: [mem 0x00000000fcab0000-0x00000000fcacffff]
[ 0.000000] node 0: [mem 0x00000000fcad0000-0x00000000fcb4ffff]
[ 0.000000] node 0: [mem 0x00000000fcb50000-0x00000000fd1fffff]
[ 0.000000] node 0: [mem 0x00000000fd200000-0x00000000fecfffff]
[ 0.000000] node 0: [mem 0x00000000fed00000-0x00000000fed2ffff]
[ 0.000000] node 0: [mem 0x00000000fed30000-0x00000000fed3ffff]
[ 0.000000] node 0: [mem 0x00000000fed40000-0x00000000fedeffff]
[ 0.000000] node 0: [mem 0x00000000fedf0000-0x00000000feffffff]
[ 0.000000] node 0: [mem 0x0000000880000000-0x0000000fffffffff]
[ 0.000000] node 0: [mem 0x0000008800000000-0x0000009ffcffffff]
[ 0.000000] node 1: [mem 0x0000009ffd000000-0x000000bffcffffff]
[...]
----

>
>
> > + nid = pfn_to_nid(PHYS_PFN(start));
>
> ... Ugh, because pfn_to_nid() goes via struct page.
>
> You can make this robust by scanning start->end looking for a pfn_valid() you can pull the
> nid out of. (no, I don't think its a good idea either!)
>
> I'd like to see if we can get rid of the 'via address' part of this.
>
>
> > + if (nid < 0 || nid >= MAX_NUMNODES || !node_possible(nid))
> > + nid = NUMA_NO_NODE;
>
> Can this happen? Does this indicate the firmware tables are wrong, or mm is about derail?

It's a range check; pfn_to_nid() is implementation-defined, so we just
make sure things are as expected.

>
>
> > + for_each_dimm(dimm) {
> > + if (entry->phys_mem_array_handle == dimm->phys_handle) {
> > + dimm->numa_node = nid;
> > + dimm->start = start;
> > + dimm->end = end;
> > + }
> > + }
> > +}
> > +
> > static int get_dimm_smbios_index(u16 handle)
> > {
> > struct mem_ctl_info *mci = ghes_pvt->mci;
> > @@ -213,8 +273,25 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
> > }
> > }
> >
> > +static void mem_info_disable_numa(void)
> > +{
> > + struct ghes_dimm_info *dimm;
> > +
> > + for_each_dimm(dimm) {
> > + if (dimm->numa_node != NUMA_NO_NODE)
> > + mem_info.num_per_node[dimm->numa_node] = 0;
>
> > + dimm->numa_node = 0;
>
> NUMA_NO_NODE?

No, this is the index of the one and only mem controller that we have
when numa is disabled for edac.

>
> > + }
> > +
> > + mem_info.num_per_node[0] = mem_info.num_dimm;
> > + mem_info.num_nodes = 1;
> > + mem_info.enable_numa = false;
> > +}
> > +
> > static int mem_info_setup(void)
> > {
> > + struct ghes_dimm_info *dimm;
> > + bool enable_numa = true;
> > int idx = 0;
> >
> > memset(&mem_info, 0, sizeof(mem_info));
> > @@ -231,6 +308,29 @@ static int mem_info_setup(void)
> >
> > ghes_dimm_info_init();
> > dmi_walk(ghes_edac_dmidecode, &idx);
> > + dmi_walk(ghes_edac_set_nid, NULL);
> > +
> > + for_each_dimm(dimm) {
> > + if (dimm->numa_node == NUMA_NO_NODE) {
> > + enable_numa = false;
> > + } else {
>
> > + if (!mem_info.num_per_node[dimm->numa_node])
> > + mem_info.num_nodes++;
>
> This is to try and hide empty nodes?

This is consumed nowhere and can be removed.

>
>
> > + mem_info.num_per_node[dimm->numa_node]++;
>
> Could you do these two in your previous for_each_dimm() walk?

This must be called after the ghes_edac_set_nid walker.

>
>
> > + }
> > +
> > + edac_dbg(1, "DIMM%i: Found mem range [%pa-%pa] on node %d\n",
> > + dimm->idx, &dimm->start, &dimm->end, dimm->numa_node);
> > + }
>
>
> > + mem_info.enable_numa = enable_numa;
> > + if (enable_numa)
> > + return 0;
> > +
> > + /* something went wrong, disable numa */
> > + if (num_possible_nodes() > 1)
> > + pr_warn("Can't get numa info, disabling numa\n");
> > + mem_info_disable_numa();
>
> I'd like to find a way of doing this where we don't need this sort of thing!

I fear that might not be possible, and you can't have one without the
other. You need the tables to set up the dimms and you need the smbios
handle to map the error to the dimm.

-Robert

>
>
> Thanks,
>
> James

2019-06-13 22:24:23

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 11/21] EDAC, ghes: Unify trace_mc_event() code with edac_mc driver

On 04.06.19 18:15:50, James Morse wrote:
> On 03/06/2019 14:10, Robert Richter wrote:
> > The +1 looks odd to me also for the edac_mc driver, but I need to take
> > a closer look here as well as some logs suggest the grain is
> > calculated correctly.
>
> My theory on this is that ghes_edac.c is generating a grain like 0x1000, fls() does the
> right thing. Other edac drivers are generating a grain like 0xfff to describe the same
> size, fls() is now off-by-one, hence the addition.
> I don't have a platform where I can trigger any other edac driver to test this though.
>
> The way round this would be to put the grain_bits in struct edac_raw_error_desc so that
> ghes_edac.c can calculate it directly.

I think the grain calculation in edac_mc has been broken from the beginning:

53f2d0289875 RAS: Add a tracepoint for reporting memory controller events

The log we see in the patch desc is:

mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

The grain reported in mc_event there is probably 8 (quad word,
granularity in bytes) and is calculated as follows:

dimm->grain = 8
e->grain = dimm->grain
grain_bits = fls_long(e->grain) + 1 = 4 + 1 = 5
__entry->grain_bits = grain_bits
TP_printk() = 1 << __entry->grain_bits = 1 << 5 = 32

So the reported grain of 32 should actually be 8.

I think the following is correct:

grain_bits = fls_long(e->grain ? e->grain - 1 : 0);

This also handles the case where e->grain is not a power of 2.
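
For illustration, a minimal userspace sketch of the two calculations
(fls_long() is emulated here with __builtin_clzl() just for the
example; only the values from the reasoning above are taken from the
driver):

  #include <stdio.h>

  /* emulate the kernel's fls_long(): 1-based index of the MSB, 0 for 0 */
  static unsigned int fls_long(unsigned long x)
  {
      return x ? (unsigned int)(8 * sizeof(long)) - (unsigned int)__builtin_clzl(x) : 0;
  }

  int main(void)
  {
      unsigned long grain = 8;    /* quad word, granularity in bytes */

      unsigned int old_bits = fls_long(grain) + 1;             /* 4 + 1 = 5 */
      unsigned int new_bits = fls_long(grain ? grain - 1 : 0); /* fls(7) = 3 */

      /* TP_printk() reports 1 << grain_bits */
      printf("old: %lu, new: %lu\n", 1UL << old_bits, 1UL << new_bits);
      /* prints: old: 32, new: 8 */
      return 0;
  }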

Thoughts?

-Robert

2019-06-19 17:23:21

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 12/21] EDAC, ghes: Add support for legacy API counters

Hi Robert,

On 12/06/2019 19:41, Robert Richter wrote:
> On 29.05.19 16:13:02, James Morse wrote:
>> On 29/05/2019 09:44, Robert Richter wrote:
>>> The ghes driver is not yet able to maintain the legacy API counters in sysfs,
>>> e.g.:
>>>
>>> /sys/devices/system/edac/mc/mc0/csrow2/ce_count
>>> /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count
>>> /sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count
>>>
>>> Make counting csrows/channels generic so that the ghes driver can use
>>> it too.
>>
>> What for?
>
> With EDAC_LEGACY_SYSFS enabled those counters are exposed to sysfs,
> but the numbers are wrong (all zero).

Excellent, so it's legacy and broken.


>> Is this for an arm64 system? Surely we don't have any systems that used to work with these
>> legacy counters. Aren't they legacy because we want new software to stop using them!
>
> The option is to support legacy userland. If we want to provide a
> similar "user experience" as for x86 the counters should be correct.

The flip side is that arm64 doesn't have the same baggage. These counters have never worked
with this driver (even on x86).

This ghes driver also probes on HPE server platforms, so the architecture isn't really
relevant. (I was curious why Marvell cares.)


> Of course it is not a real mapping to csrows, but it makes that i/f
> work.

(...which stinks)


In any case, this patch cleans up code, as the old API's counter code is
isolated and moved to common code. Making the counters work for ghes
is actually a side effect here. The cleanup is a prerequisite for
follow-on patches.

I'm all for removing it, or warning that it's broken, when ghes_edac is in use. But the
convincing argument is that Debian ships a 'current' version of edac-utils that predates
199747106934 (which made all this fake csrow stuff deprecated), and Debian's popcon says
~1000 people have it installed.


If you want it fixed, please don't do it as a side effect of cleanup. Fixes need to be a
small separate series that can be backported. (unless we're confident no-one uses it, in
which case, why fix it?)



Thanks,

James

2019-06-20 06:56:25

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 12/21] EDAC, ghes: Add support for legacy API counters

On 19.06.19 18:22:32, James Morse wrote:
> > In any case, this patch cleans up code, as the old API's counter code is
> > isolated and moved to common code. Making the counters work for ghes
> > is actually a side effect here. The cleanup is a prerequisite for
> > follow-on patches.
>
> I'm all for removing it, or warning that it's broken, when ghes_edac is in use. But the
> convincing argument is that Debian ships a 'current' version of edac-utils that predates
> 199747106934 (which made all this fake csrow stuff deprecated), and Debian's popcon says
> ~1000 people have it installed.

All arm64 distribution kernels that I have checked come with:

CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_GHES=y
CONFIG_EDAC_LAYERSCAPE=m
CONFIG_EDAC_THUNDERX=m
CONFIG_EDAC_XGENE=m

> If you want it fixed, please don't do it as a side effect of cleanup. Fixes need to be a
> small separate series that can be backported. (unless we're confident no-one uses it, in
> which case, why fix it?)

It is not that I am keen on fixing legacy edac sysfs. It just happens
while unifying the error handlers in ghes_edac and edac_mc. As you seem
reluctant to just let it go, let's simply disable EDAC_LEGACY_SYSFS for
ARM64. Though I don't agree with that, as there could still be userland
tools that use this interface and would stop working after a transition
from x86 to arm64. I leave that decision up to you. Please just ACK a
patch with the Kconfig change, which I will add to my v2 series.

Thanks,

-Robert

2019-06-26 09:33:52

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 12/21] EDAC, ghes: Add support for legacy API counters

Hi Robert,

On 20/06/2019 07:55, Robert Richter wrote:
> On 19.06.19 18:22:32, James Morse wrote:
>>> In any case, this patch cleans up code, as the old API's counter code is
>>> isolated and moved to common code. Making the counters work for ghes
>>> is actually a side effect here. The cleanup is a prerequisite for
>>> follow-on patches.
>>
>> I'm all for removing it, or warning that it's broken, when ghes_edac is in use. But the
>> convincing argument is that Debian ships a 'current' version of edac-utils that predates
>> 199747106934 (which made all this fake csrow stuff deprecated), and Debian's popcon says
>> ~1000 people have it installed.
>
> All arm64 distribution kernels that I have checked come with:
>
> CONFIG_EDAC_SUPPORT=y
> CONFIG_EDAC=y
> CONFIG_EDAC_LEGACY_SYSFS=y
> # CONFIG_EDAC_DEBUG is not set
> CONFIG_EDAC_GHES=y
> CONFIG_EDAC_LAYERSCAPE=m
> CONFIG_EDAC_THUNDERX=m
> CONFIG_EDAC_XGENE=m

(distros also enable drivers for hardware no-one has!)

Who uses this? edac-utils, on both arm64 and x86.


>> If you want it fixed, please don't do it as a side effect of cleanup. Fixes need to be a
>> small separate series that can be backported. (unless we're confident no-one uses it, in
>> which case, why fix it?)

> It is not that I am keen on fixing legacy edac sysfs. It just happens
> while unifying the error handlers in ghes_edac and edac_mc. As you seem
> reluctant to just let it go, let's simply disable EDAC_LEGACY_SYSFS for
> ARM64.

That would break other drivers where those legacy counters expose valid values.

You're painting me as some kind of stubborn villain here. You're right, my initial
reaction was 'what for?'. Adding new support for legacy counters that have never worked
with ghes_edac looks like the wrong thing to do.

But unfortunately edac-utils is still using this legacy interface.

If we're going to fix it, could we fix it properly? (separate series that can be
backported to stable).


> Though I don't agree with that, as there could still be userland tools
> that use this interface and would stop working after a transition from
> x86 to arm64.

I don't think this is the right thing to do. ghes_edac's behaviour should not change
between architectures.


Where we aren't agreeing is how we fix bugs:

It's either broken and no-one cares, in which case we should remove it.
Or we should fix it, and those fixes should go to stable.

We can't mix fixes and features in a patch series, as the fixes then can't easily be
backported. If it's ever in doubt, the patches should still be done as separate fixes so the
maintainer can decide.


Thanks,

James

2019-06-26 10:28:22

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 12/21] EDAC, ghes: Add support for legacy API counters

On 26.06.19 10:33:28, James Morse wrote:
> On 20/06/2019 07:55, Robert Richter wrote:

> > It is not that I am keen on fixing legacy edac sysfs. It just happens
> > while unifying the error handlers in ghes_edac and edac_mc. As you seem
> > reluctant to just let it go, let's simply disable EDAC_LEGACY_SYSFS for
> > ARM64.
>
> That would break other drivers where those legacy counters expose valid values.
>
> You're painting me as some kind of stubborn villain here. You're right, my initial
> reaction was 'what for?'. Adding new support for legacy counters that have never worked
> with ghes_edac looks like the wrong thing to do.
>
> But unfortunately edac-utils is still using this legacy interface.

I am sorry for misunderstanding you here. I hadn't seen your
motivation for this before, but it is clear to me now.

> If we're going to fix it, could we fix it properly? (separate series that can be
> backported to stable).

I see your point here. This is also the reason why I (try to) put
fixes at the beginning of a series to allow backports to stable (or
distros). Clearly, this must be better separated here.

> > Though I don't agree with that, as there could still be userland tools
> > that use this interface and would stop working after a transition from
> > x86 to arm64.
>
> I don't think this is the right thing to do. ghes_edac's behaviour should not change
> between architectures.
>
>
> Where we aren't agreeing is how we fix bugs:
>
> It's either broken and no-one cares, in which case we should remove it.
> Or we should fix it, and those fixes should go to stable.
>
> We can't mix fixes and features in a patch series, as the fixes then can't easily be
> backported. If it's ever in doubt, the patches should still be done as separate fixes so the
> maintainer can decide.

I will better separate the fix here from the update in the next version (v3).

Thanks and sorry again,

-Robert