2020-04-22 12:00:31

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 00/10] EDAC/mc/ghes: Fixes, cleanup and reworks

This series contains edac fixes and a significant cleanup and rework
of the ghes driver:

* fixes and updates for edac_mc (patches #1, #2),

* removed smbios_handle from struct dimm_info (patch #4),

* fix of DIMM label in error reports (patch #5),

* general ghes_edac cleanup and rework (patches #3, #6-#10).

First 2 patches for edac_mc are individual patches, the remaining
patches do not depend on them.

Tested on a Marvell/Cavium ThunderX2 Sabre (dual socket) system.

v1:
* https://lore.kernel.org/patchwork/cover/1205901/

v2:
* reordered patches to have fixes and struct changes first, code
refactoring patches last,
* dropped v1 patches #9 to #11 (multiple conrollers) to handle them
in a separate series,
* rewrote patch to remove smbios_handle (based on v1 #9): EDAC/ghes:
Move smbios_handle from struct dimm_info to ghes private data,
* added lockdep_assert_held() checkers,
* renamed struct ghes_dimm_fill to struct dimm_fill,
* renamed local variable dimms to dimm_list to avoid conflict with
the global variable,
* removed dimm list for "fake" controller,
* fixed return code check to use (rc < 0),
* added: EDAC/mc: Fix usage of snprintf() and dimm location setup


Robert Richter (10):
EDAC/mc: Fix usage of snprintf() and dimm location setup
EDAC/mc: Use int type for parameters of edac_mc_alloc()
EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to
ghes_mci
EDAC/ghes: Make SMBIOS handle private data to ghes
EDAC/ghes: Setup DIMM label from DMI and use it in error reports
EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()
EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to
ghes_dimm_fill
EDAC/ghes: Carve out MC device handling into separate functions
EDAC/ghes: Have a separate code path for creating the fake MC
EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}()

drivers/edac/edac_mc.c | 27 ++-
drivers/edac/edac_mc.h | 6 +-
drivers/edac/ghes_edac.c | 380 +++++++++++++++++++++++++++------------
include/linux/edac.h | 2 -
4 files changed, 280 insertions(+), 135 deletions(-)

--
2.20.1


2020-04-22 12:00:46

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 03/10] EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_mci

The struct members list and ghes of struct ghes_edac_pvt are unused,
remove them. On that occasion, rename it to struct ghes_mci. This is
shorter and aligns better with the current naming scheme.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index cb3dab56a875..39efce0df881 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -15,9 +15,7 @@
#include "edac_module.h"
#include <ras/ras_event.h>

-struct ghes_edac_pvt {
- struct list_head list;
- struct ghes *ghes;
+struct ghes_mci {
struct mem_ctl_info *mci;

/* Buffers for the error handling routine */
@@ -32,7 +30,7 @@ static refcount_t ghes_refcount = REFCOUNT_INIT(0);
* also provides the necessary (implicit) memory barrier for the SMP
* case to make the pointer visible on another CPU.
*/
-static struct ghes_edac_pvt *ghes_pvt;
+static struct ghes_mci *ghes_pvt;

/* GHES registration mutex */
static DEFINE_MUTEX(ghes_reg_mutex);
@@ -203,7 +201,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
{
struct edac_raw_error_desc *e;
struct mem_ctl_info *mci;
- struct ghes_edac_pvt *pvt;
+ struct ghes_mci *pvt;
unsigned long flags;
char *p;

@@ -457,7 +455,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
bool fake = false;
int rc = 0, num_dimm = 0;
struct mem_ctl_info *mci;
- struct ghes_edac_pvt *pvt;
+ struct ghes_mci *pvt;
struct edac_mc_layer layers[1];
struct ghes_edac_dimm_fill dimm_fill;
unsigned long flags;
@@ -494,7 +492,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
layers[0].size = num_dimm;
layers[0].is_virt_csrow = true;

- mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
+ mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(*pvt));
if (!mci) {
pr_info("Can't allocate memory for EDAC data\n");
rc = -ENOMEM;
@@ -502,7 +500,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
}

pvt = mci->pvt_info;
- pvt->ghes = ghes;
pvt->mci = mci;

mci->pdev = dev;
--
2.20.1

2020-04-22 12:00:54

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 07/10] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill

The struct is used to store temporary data for the dmidecode callback.
Clean this up a bit:

1) Rename member count to index since this is what it is used for.

2) Move code close to ghes_edac_dmidecode() where it is used.

3) While at it, use edac_get_dimm_by_index().

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 038e560fd332..4eadc5b344c8 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -88,12 +88,6 @@ struct memdev_dmi_entry {
u16 conf_mem_clk_speed;
} __attribute__((__packed__));

-struct dimm_fill {
- struct list_head dimms;
- struct mem_ctl_info *mci;
- unsigned int count;
-};
-
static int ghes_dimm_pool_create(int num_dimm)
{
struct ghes_dimm *ghes_dimm;
@@ -182,6 +176,12 @@ static void ghes_dimm_setup_label(struct dimm_info *dimm, u16 handle)
"unknown memory (handle: 0x%.4x)", handle);
}

+struct dimm_fill {
+ struct list_head dimms;
+ struct mem_ctl_info *mci;
+ int index;
+};
+
static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
{
struct dimm_fill *dimm_fill = arg;
@@ -190,11 +190,11 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)

if (dh->type == DMI_ENTRY_MEM_DEVICE) {
struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
- struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 0, 0);
+ struct dimm_info *dimm = edac_get_dimm_by_index(mci, dimm_fill->index);

if (entry->size == 0xffff) {
pr_info("Can't get DIMM%i size\n",
- dimm_fill->count);
+ dimm_fill->index);
dimm->nr_pages = MiB_TO_PAGES(32);/* Unknown */
} else if (entry->size == 0x7fff) {
dimm->nr_pages = MiB_TO_PAGES(entry->extended_size);
@@ -267,7 +267,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)

if (dimm->nr_pages) {
edac_dbg(1, "DIMM%i: %s size = %d MB%s\n",
- dimm_fill->count, edac_mem_types[dimm->mtype],
+ dimm_fill->index, edac_mem_types[dimm->mtype],
PAGES_TO_MiB(dimm->nr_pages),
(dimm->edac_mode != EDAC_NONE) ? "(ECC)" : "");
edac_dbg(2, "\ttype %d, detail 0x%02x, width %d(total %d)\n",
@@ -280,7 +280,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
if (ghes_dimm)
list_add_tail(&ghes_dimm->entry, &dimm_fill->dimms);

- dimm_fill->count++;
+ dimm_fill->index++;
}
}

@@ -614,11 +614,11 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
INIT_LIST_HEAD(&dimm_fill.dimms);

if (!fake) {
- dimm_fill.count = 0;
+ dimm_fill.index = 0;
dimm_fill.mci = mci;
dmi_walk(ghes_edac_dmidecode, &dimm_fill);
} else {
- struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
+ struct dimm_info *dimm = edac_get_dimm_by_index(mci, 0);

dimm->nr_pages = 1;
dimm->grain = 128;
--
2.20.1

2020-04-22 12:01:02

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 08/10] EDAC/ghes: Carve out MC device handling into separate functions

The functions are too long, carve out code that handles MC devices
into the new functions ghes_mc_create(), ghes_mc_add_or_free() and
ghes_mc_free(). Apart from better code readability the functions can
be reused and the implementation of the error paths becomes easier.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 141 +++++++++++++++++++++++----------------
1 file changed, 83 insertions(+), 58 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4eadc5b344c8..af0a769071f4 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -535,16 +535,88 @@ static struct acpi_platform_list plat_list[] = {
{ } /* End */
};

-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
+ int num_dimm)
{
- bool fake = false;
- int rc = 0, num_dimm = 0;
+ struct edac_mc_layer layers[1];
struct mem_ctl_info *mci;
struct ghes_mci *pvt;
- struct edac_mc_layer layers[1];
- struct dimm_fill dimm_fill;
+
+ layers[0].type = EDAC_MC_LAYER_ALL_MEM;
+ layers[0].size = num_dimm;
+ layers[0].is_virt_csrow = true;
+
+ mci = edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, sizeof(*pvt));
+ if (!mci)
+ return NULL;
+
+ pvt = mci->pvt_info;
+ pvt->mci = mci;
+
+ mci->pdev = dev;
+ mci->mtype_cap = MEM_FLAG_EMPTY;
+ mci->edac_ctl_cap = EDAC_FLAG_NONE;
+ mci->edac_cap = EDAC_FLAG_NONE;
+ mci->mod_name = "ghes_edac.c";
+ mci->ctl_name = "ghes_edac";
+ mci->dev_name = "ghes";
+
+ return mci;
+}
+
+static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
+ struct list_head *dimm_list)
+{
unsigned long flags;
- int idx = -1;
+ int rc;
+
+ rc = edac_mc_add_mc(mci);
+ if (rc < 0) {
+ ghes_dimm_release(dimm_list);
+ edac_mc_free(mci);
+ return rc;
+ }
+
+ spin_lock_irqsave(&ghes_lock, flags);
+ ghes_pvt = mci->pvt_info;
+ list_splice_tail(dimm_list, &ghes_dimm_list);
+ spin_unlock_irqrestore(&ghes_lock, flags);
+
+ return 0;
+}
+
+static void ghes_mc_free(void)
+{
+ struct mem_ctl_info *mci;
+ unsigned long flags;
+ LIST_HEAD(dimm_list);
+
+ /*
+ * Wait for the irq handler being finished.
+ */
+ spin_lock_irqsave(&ghes_lock, flags);
+ mci = ghes_pvt ? ghes_pvt->mci : NULL;
+ ghes_pvt = NULL;
+ list_splice_init(&ghes_dimm_list, &dimm_list);
+ spin_unlock_irqrestore(&ghes_lock, flags);
+
+ ghes_dimm_release(&dimm_list);
+
+ if (!mci)
+ return;
+
+ mci = edac_mc_del_mc(mci->pdev);
+ if (mci)
+ edac_mc_free(mci);
+}
+
+int ghes_edac_register(struct ghes *ghes, struct device *dev)
+{
+ struct dimm_fill dimm_fill;
+ int rc = 0, num_dimm = 0;
+ struct mem_ctl_info *mci;
+ bool fake = false;
+ int idx;

if (IS_ENABLED(CONFIG_X86)) {
/* Check if safe to enable on this system */
@@ -577,27 +649,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
num_dimm = 1;
}

- layers[0].type = EDAC_MC_LAYER_ALL_MEM;
- layers[0].size = num_dimm;
- layers[0].is_virt_csrow = true;
-
- mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(*pvt));
+ mci = ghes_mc_create(dev, 0, num_dimm);
if (!mci) {
rc = -ENOMEM;
goto unlock;
}

- pvt = mci->pvt_info;
- pvt->mci = mci;
-
- mci->pdev = dev;
- mci->mtype_cap = MEM_FLAG_EMPTY;
- mci->edac_ctl_cap = EDAC_FLAG_NONE;
- mci->edac_cap = EDAC_FLAG_NONE;
- mci->mod_name = "ghes_edac.c";
- mci->ctl_name = "ghes_edac";
- mci->dev_name = "ghes";
-
if (fake) {
pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
@@ -627,18 +684,9 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
dimm->edac_mode = EDAC_SECDED;
}

- rc = edac_mc_add_mc(mci);
- if (rc < 0) {
- ghes_dimm_release(&dimm_fill.dimms);
- edac_mc_free(mci);
- rc = -ENODEV;
+ rc = ghes_mc_add_or_free(mci, &dimm_fill.dimms);
+ if (rc < 0)
goto unlock;
- }
-
- spin_lock_irqsave(&ghes_lock, flags);
- ghes_pvt = pvt;
- list_splice_tail(&dimm_fill.dimms, &ghes_dimm_list);
- spin_unlock_irqrestore(&ghes_lock, flags);

/* only set on success */
refcount_set(&ghes_refcount, 1);
@@ -656,35 +704,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)

void ghes_edac_unregister(struct ghes *ghes)
{
- struct mem_ctl_info *mci;
- unsigned long flags;
- LIST_HEAD(dimm_list);
-
mutex_lock(&ghes_reg_mutex);

- if (!refcount_dec_and_test(&ghes_refcount))
- goto unlock;
-
- /*
- * Wait for the irq handler being finished.
- */
- spin_lock_irqsave(&ghes_lock, flags);
- mci = ghes_pvt ? ghes_pvt->mci : NULL;
- ghes_pvt = NULL;
- list_splice_init(&ghes_dimm_list, &dimm_list);
- spin_unlock_irqrestore(&ghes_lock, flags);
-
- ghes_dimm_release(&dimm_list);
-
- if (!mci)
- goto unlock;
-
- mci = edac_mc_del_mc(mci->pdev);
- if (mci) {
- edac_mc_free(mci);
+ if (refcount_dec_and_test(&ghes_refcount)) {
+ ghes_mc_free();
ghes_dimm_pool_destroy();
}

-unlock:
mutex_unlock(&ghes_reg_mutex);
}
--
2.20.1

2020-04-22 12:01:04

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 01/10] EDAC/mc: Fix usage of snprintf() and dimm location setup

The setup of the dimm->location may be incomplete in case writing to
dimm->label fails due to small buffer size. Fix this by iterating
through all existing layers.

Also, the return value of snprintf() can be higher than the number of
bytes written to the buffer in case it is to small. Fix usage of
snprintf() by either porting it to scnprintf() or fixing the handling
of the return code.

It is very unlikely the buffer is too small in practice, but fixing it
anyway.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 75ede27bdf6a..107d7c4de933 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -130,11 +130,11 @@ unsigned int edac_dimm_info_location(struct dimm_info *dimm, char *buf,
n = snprintf(p, len, "%s %d ",
edac_layer_name[mci->layers[i].type],
dimm->location[i]);
+ if (len <= n)
+ return count + len - 1;
p += n;
len -= n;
count += n;
- if (!len)
- break;
}

return count;
@@ -397,19 +397,19 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
*/
len = sizeof(dimm->label);
p = dimm->label;
- n = snprintf(p, len, "mc#%u", mci->mc_idx);
+ n = scnprintf(p, len, "mc#%u", mci->mc_idx);
p += n;
len -= n;
+
for (layer = 0; layer < mci->n_layers; layer++) {
- n = snprintf(p, len, "%s#%u",
- edac_layer_name[mci->layers[layer].type],
- pos[layer]);
+ dimm->location[layer] = pos[layer];
+ if (len <= 1)
+ continue;
+ n = scnprintf(p, len, "%s#%u",
+ edac_layer_name[mci->layers[layer].type],
+ pos[layer]);
p += n;
len -= n;
- dimm->location[layer] = pos[layer];
-
- if (len <= 0)
- break;
}

/* Link it to the csrows old API data */
--
2.20.1

2020-04-22 12:01:11

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 10/10] EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}()

Factor out code to register a memory controller including DIMMs. Do
this for standard and fake memory controller in the two functions
ghes_edac_register_one() and ghes_edac_register_fake().

Function ghes_edac_register_one() could be reused to register multiple
*mci structs.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 80 ++++++++++++++++++++++------------------
1 file changed, 44 insertions(+), 36 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index af72da156696..ee1e95e9b59b 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -610,11 +610,50 @@ static void ghes_mc_free(void)
edac_mc_free(mci);
}

-int ghes_edac_register(struct ghes *ghes, struct device *dev)
+static int ghes_edac_register_fake(struct device *dev)
+{
+ struct mem_ctl_info *mci;
+ struct dimm_info *dimm;
+ LIST_HEAD(empty);
+
+ mci = ghes_mc_create(dev, 0, 1);
+ if (!mci)
+ return -ENOMEM;
+
+ dimm = edac_get_dimm_by_index(mci, 0);
+
+ dimm->nr_pages = 1;
+ dimm->grain = 128;
+ dimm->mtype = MEM_UNKNOWN;
+ dimm->dtype = DEV_UNKNOWN;
+ dimm->edac_mode = EDAC_SECDED;
+
+ snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
+
+ return ghes_mc_add_or_free(mci, &empty);
+}
+
+static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
{
struct dimm_fill dimm_fill;
- int rc = 0, num_dimm = 0;
struct mem_ctl_info *mci;
+
+ mci = ghes_mc_create(dev, mc_idx, num_dimm);
+ if (!mci)
+ return -ENOMEM;
+
+ dimm_fill.index = 0;
+ dimm_fill.mci = mci;
+ INIT_LIST_HEAD(&dimm_fill.dimms);
+
+ dmi_walk(ghes_edac_dmidecode, &dimm_fill);
+
+ return ghes_mc_add_or_free(mci, &dimm_fill.dimms);
+}
+
+int ghes_edac_register(struct ghes *ghes, struct device *dev)
+{
+ int rc = 0, num_dimm = 0;
int idx;

if (IS_ENABLED(CONFIG_X86)) {
@@ -648,27 +687,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
* with only one DIMM for the whole address range to
* catch all errros.
*/
- struct dimm_info *dimm;
- LIST_HEAD(empty);
-
- mci = ghes_mc_create(dev, 0, 1);
- if (!mci) {
- rc = -ENOMEM;
- goto unlock;
- }
-
- dimm = edac_get_dimm_by_index(mci, 0);
-
- dimm->nr_pages = 1;
- dimm->grain = 128;
- dimm->mtype = MEM_UNKNOWN;
- dimm->dtype = DEV_UNKNOWN;
- dimm->edac_mode = EDAC_SECDED;
-
- snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
-
- rc = ghes_mc_add_or_free(mci, &empty);
- if (rc)
+ rc = ghes_edac_register_fake(dev);
+ if (rc < 0)
goto unlock;

pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
@@ -687,19 +707,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
pr_info("This system has %d DIMM sockets.\n", num_dimm);
}

- mci = ghes_mc_create(dev, 0, num_dimm);
- if (!mci) {
- rc = -ENOMEM;
- goto unlock;
- }
-
- dimm_fill.index = 0;
- dimm_fill.mci = mci;
- INIT_LIST_HEAD(&dimm_fill.dimms);
-
- dmi_walk(ghes_edac_dmidecode, &dimm_fill);
-
- rc = ghes_mc_add_or_free(mci, &dimm_fill.dimms);
+ rc = ghes_edac_register_one(dev, 0, num_dimm);
if (rc < 0)
goto unlock;

--
2.20.1

2020-04-22 12:01:38

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 04/10] EDAC/ghes: Make SMBIOS handle private data to ghes

To identify a hw error's location, the ghes driver needs to know the
mapping between a DIMM and the used SMBIOS handle. The handle is
stored in struct dimm_info in the smbios_handle field where other
drivers carry it too. Make the SMBIOS handle private and implement an
own SMBIOS handle mapping table that is only used by the ghes
driver. As a consequence, smbios_handle is removed from struct
dimm_info.

The mapping table is implemented as a list. This makes the locking and
the allocation of table entries when adding or removing DIMMs much
easier. Since the list is accessed by the interrupt handler it must be
protected by a spinlock. Thanks to the use of a list, multiple entries
can be prepared in advance without a lock being held. Once ready, the
entries are added to an active list of DIMMs (ghes_dimm_list) by just
a single list operation that needs the locking. The same list entry is
also used for memory allocation, all entries are created at once using
a single array allocation and are put in a pool (ghes_dimm_pool) that
holds unused entries for later use.

While at it, rename struct ghes_edac_dimm_fill to struct dimm_fill.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 117 ++++++++++++++++++++++++++++++++++-----
include/linux/edac.h | 2 -
2 files changed, 104 insertions(+), 15 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 39efce0df881..23adb7674f9b 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -15,6 +15,12 @@
#include "edac_module.h"
#include <ras/ras_event.h>

+struct ghes_dimm {
+ struct list_head entry;
+ struct dimm_info *dimm;
+ u16 handle;
+};
+
struct ghes_mci {
struct mem_ctl_info *mci;

@@ -42,6 +48,16 @@ static DEFINE_MUTEX(ghes_reg_mutex);
*/
static DEFINE_SPINLOCK(ghes_lock);

+/*
+ * Locking:
+ *
+ * dimms, ghes_dimm_pool: ghes_reg_mutex
+ * ghes_dimm_list: ghes_lock
+ */
+static struct ghes_dimm *dimms;
+static LIST_HEAD(ghes_dimm_list);
+static LIST_HEAD(ghes_dimm_pool);
+
/* "ghes_edac.force_load=1" skips the platform check */
static bool __read_mostly force_load;
module_param(force_load, bool, 0);
@@ -72,11 +88,63 @@ struct memdev_dmi_entry {
u16 conf_mem_clk_speed;
} __attribute__((__packed__));

-struct ghes_edac_dimm_fill {
+struct dimm_fill {
+ struct list_head dimms;
struct mem_ctl_info *mci;
unsigned int count;
};

+static int ghes_dimm_pool_create(int num_dimm)
+{
+ struct ghes_dimm *ghes_dimm;
+
+ if (!num_dimm)
+ return 0;
+
+ lockdep_assert_held(ghes_reg_mutex);
+
+ dimms = kcalloc(num_dimm, sizeof(*dimms), GFP_KERNEL);
+ if (!dimms)
+ return -ENOMEM;
+
+ for (ghes_dimm = dimms; ghes_dimm < dimms + num_dimm; ghes_dimm++)
+ list_add(&ghes_dimm->entry, &ghes_dimm_pool);
+
+ return 0;
+}
+
+static void ghes_dimm_pool_destroy(void)
+{
+ lockdep_assert_held(ghes_reg_mutex);
+ INIT_LIST_HEAD(&ghes_dimm_pool);
+ kfree(dimms);
+}
+
+static struct ghes_dimm *ghes_dimm_alloc(struct dimm_info *dimm, u16 handle)
+{
+ struct ghes_dimm *ghes_dimm;
+
+ lockdep_assert_held(ghes_reg_mutex);
+
+ ghes_dimm = list_first_entry_or_null(&ghes_dimm_pool,
+ struct ghes_dimm, entry);
+
+ /* should be always non-zero */
+ if (!WARN_ON_ONCE(!ghes_dimm)) {
+ ghes_dimm->dimm = dimm;
+ ghes_dimm->handle = handle;
+ list_del(&ghes_dimm->entry);
+ }
+
+ return ghes_dimm;
+}
+
+static void ghes_dimm_release(struct list_head *dimms)
+{
+ lockdep_assert_held(ghes_reg_mutex);
+ list_splice(dimms, &ghes_dimm_pool);
+}
+
static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
{
int *num_dimm = arg;
@@ -85,13 +153,15 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
(*num_dimm)++;
}

-static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
+static int get_dimm_smbios_index(u16 handle)
{
- struct dimm_info *dimm;
+ struct ghes_dimm *ghes_dimm;

- mci_for_each_dimm(mci, dimm) {
- if (dimm->smbios_handle == handle)
- return dimm->idx;
+ lockdep_assert_held(&ghes_lock);
+
+ list_for_each_entry(ghes_dimm, &ghes_dimm_list, entry) {
+ if (ghes_dimm->handle == handle)
+ return ghes_dimm->dimm->idx;
}

return -1;
@@ -99,8 +169,9 @@ static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)

static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
{
- struct ghes_edac_dimm_fill *dimm_fill = arg;
+ struct dimm_fill *dimm_fill = arg;
struct mem_ctl_info *mci = dimm_fill->mci;
+ struct ghes_dimm *ghes_dimm;

if (dh->type == DMI_ENTRY_MEM_DEVICE) {
struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
@@ -191,7 +262,10 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
entry->total_width, entry->data_width);
}

- dimm->smbios_handle = entry->handle;
+
+ ghes_dimm = ghes_dimm_alloc(dimm, entry->handle);
+ if (ghes_dimm)
+ list_add_tail(&ghes_dimm->entry, &dimm_fill->dimms);

dimm_fill->count++;
}
@@ -352,7 +426,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
mem_err->mem_dev_handle);

- index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
+ index = get_dimm_smbios_index(mem_err->mem_dev_handle);
if (index >= 0)
e->top_layer = index;
}
@@ -457,7 +531,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
struct mem_ctl_info *mci;
struct ghes_mci *pvt;
struct edac_mc_layer layers[1];
- struct ghes_edac_dimm_fill dimm_fill;
+ struct dimm_fill dimm_fill;
unsigned long flags;
int idx = -1;

@@ -482,6 +556,10 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
/* Get the number of DIMMs */
dmi_walk(ghes_edac_count_dimms, &num_dimm);

+ rc = ghes_dimm_pool_create(num_dimm);
+ if (rc < 0)
+ goto unlock;
+
/* Check if we've got a bogus BIOS */
if (num_dimm == 0) {
fake = true;
@@ -494,7 +572,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)

mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(*pvt));
if (!mci) {
- pr_info("Can't allocate memory for EDAC data\n");
rc = -ENOMEM;
goto unlock;
}
@@ -523,6 +600,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
pr_info("This system has %d DIMM sockets.\n", num_dimm);
}

+ INIT_LIST_HEAD(&dimm_fill.dimms);
+
if (!fake) {
dimm_fill.count = 0;
dimm_fill.mci = mci;
@@ -539,7 +618,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)

rc = edac_mc_add_mc(mci);
if (rc < 0) {
- pr_info("Can't register at EDAC core\n");
+ ghes_dimm_release(&dimm_fill.dimms);
edac_mc_free(mci);
rc = -ENODEV;
goto unlock;
@@ -547,12 +626,18 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)

spin_lock_irqsave(&ghes_lock, flags);
ghes_pvt = pvt;
+ list_splice_tail(&dimm_fill.dimms, &ghes_dimm_list);
spin_unlock_irqrestore(&ghes_lock, flags);

/* only set on success */
refcount_set(&ghes_refcount, 1);

unlock:
+ if (rc < 0) {
+ ghes_dimm_pool_destroy();
+ pr_err("Can't register at EDAC core: %d\n", rc);
+ }
+
mutex_unlock(&ghes_reg_mutex);

return rc;
@@ -562,6 +647,7 @@ void ghes_edac_unregister(struct ghes *ghes)
{
struct mem_ctl_info *mci;
unsigned long flags;
+ LIST_HEAD(dimm_list);

mutex_lock(&ghes_reg_mutex);

@@ -574,14 +660,19 @@ void ghes_edac_unregister(struct ghes *ghes)
spin_lock_irqsave(&ghes_lock, flags);
mci = ghes_pvt ? ghes_pvt->mci : NULL;
ghes_pvt = NULL;
+ list_splice_init(&ghes_dimm_list, &dimm_list);
spin_unlock_irqrestore(&ghes_lock, flags);

+ ghes_dimm_release(&dimm_list);
+
if (!mci)
goto unlock;

mci = edac_mc_del_mc(mci->pdev);
- if (mci)
+ if (mci) {
edac_mc_free(mci);
+ ghes_dimm_pool_destroy();
+ }

unlock:
mutex_unlock(&ghes_reg_mutex);
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 0f20b986b0ab..6b7f594782c0 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -382,8 +382,6 @@ struct dimm_info {

unsigned int csrow, cschannel; /* Points to the old API data */

- u16 smbios_handle; /* Handle for SMBIOS type 17 */
-
u32 ce_count;
u32 ue_count;
};
--
2.20.1

2020-04-22 12:01:46

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 05/10] EDAC/ghes: Setup DIMM label from DMI and use it in error reports

The ghes driver reports errors with 'unknown label' even if the actual
DIMM label is known, e.g.:

EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0
module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0
page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location:
node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM
location:N0 DIMM_A0 status(0x0000000000000400): Storage error in
DRAM memory)

Fix this by using struct dimm_info's label string in error reports:

EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0
rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0
page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location:
node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM
location:N0 DIMM_A0 status(0x0000000000000400): Storage error in
DRAM memory)

The labels are initialized by reading the bank and device strings from
DMI. Now, the label information can also read from sysfs. E.g. a
ThunderX2 system will show the following:

/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
/sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
/sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
/sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
/sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
/sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
/sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
/sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
/sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0
/sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0
/sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0
/sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0
/sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0
/sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0
/sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0
/sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0

Since dimm_labels can be rewritten, that label will be used in a later
error report:

# echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label
# # some error injection here
# dmesg | grep foobar
[ 2119.784489] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0
module:0 rank:0 bank:769 col:1 bit_pos:16 DIMM location:foobar
page:0x94d027 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0
card:0 module:0 rank:0 bank:769 col:1 bit_pos:16 DIMM location:foobar
status(0x0000000000000400): Storage error in DRAM memory)

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 43 +++++++++++++++++++++++++---------------
1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 23adb7674f9b..a5890afa9c71 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -153,7 +153,7 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
(*num_dimm)++;
}

-static int get_dimm_smbios_index(u16 handle)
+static struct dimm_info *find_dimm_by_handle(u16 handle)
{
struct ghes_dimm *ghes_dimm;

@@ -161,10 +161,25 @@ static int get_dimm_smbios_index(u16 handle)

list_for_each_entry(ghes_dimm, &ghes_dimm_list, entry) {
if (ghes_dimm->handle == handle)
- return ghes_dimm->dimm->idx;
+ return ghes_dimm->dimm;
}

- return -1;
+ return NULL;
+}
+
+static void ghes_dimm_setup_label(struct dimm_info *dimm, u16 handle)
+{
+ const char *bank = NULL, *device = NULL;
+
+ dmi_memdev_name(handle, &bank, &device);
+
+ /* both strings must be non-zero */
+ if (bank && *bank && device && *device)
+ snprintf(dimm->label, sizeof(dimm->label),
+ "%s %s", bank, device);
+ else
+ snprintf(dimm->label, sizeof(dimm->label),
+ "unknown memory (handle: 0x%.4x)", handle);
}

static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
@@ -248,9 +263,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
dimm->dtype = DEV_UNKNOWN;
dimm->grain = 128; /* Likely, worse case */

- /*
- * FIXME: It shouldn't be hard to also fill the DIMM labels
- */
+ ghes_dimm_setup_label(dimm, entry->handle);

if (dimm->nr_pages) {
edac_dbg(1, "DIMM%i: %s size = %d MB%s\n",
@@ -416,19 +429,17 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
- const char *bank = NULL, *device = NULL;
- int index = -1;
+ struct dimm_info *dimm;

- dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
- if (bank != NULL && device != NULL)
- p += sprintf(p, "DIMM location:%s %s ", bank, device);
- else
+ dimm = find_dimm_by_handle(mem_err->mem_dev_handle);
+ if (dimm) {
+ e->top_layer = dimm->idx;
+ strcpy(e->label, dimm->label);
+ p += sprintf(p, "DIMM location:%s ", dimm->label);
+ } else {
p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
mem_err->mem_dev_handle);
-
- index = get_dimm_smbios_index(mem_err->mem_dev_handle);
- if (index >= 0)
- e->top_layer = index;
+ }
}
if (p > e->location)
*(p - 1) = '\0';
--
2.20.1

2020-04-22 12:03:02

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 09/10] EDAC/ghes: Have a separate code path for creating the fake MC

The code in ghes_edac_register() switches back and forth between
standard and fake controller creation. Do one thing only and separate
the code path that creates the fake MC.

Note: For better review the code is not yet carved out in separate
functions.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 68 +++++++++++++++++++++++++---------------
1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index af0a769071f4..af72da156696 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -615,7 +615,6 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
struct dimm_fill dimm_fill;
int rc = 0, num_dimm = 0;
struct mem_ctl_info *mci;
- bool fake = false;
int idx;

if (IS_ENABLED(CONFIG_X86)) {
@@ -643,23 +642,43 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
if (rc < 0)
goto unlock;

- /* Check if we've got a bogus BIOS */
- if (num_dimm == 0) {
- fake = true;
- num_dimm = 1;
- }
+ if (!num_dimm) {
+ /*
+ * Bogus BIOS: Ignore DMI topology and use a single MC
+ * with only one DIMM for the whole address range to
+ * catch all errros.
+ */
+ struct dimm_info *dimm;
+ LIST_HEAD(empty);

- mci = ghes_mc_create(dev, 0, num_dimm);
- if (!mci) {
- rc = -ENOMEM;
- goto unlock;
- }
+ mci = ghes_mc_create(dev, 0, 1);
+ if (!mci) {
+ rc = -ENOMEM;
+ goto unlock;
+ }
+
+ dimm = edac_get_dimm_by_index(mci, 0);
+
+ dimm->nr_pages = 1;
+ dimm->grain = 128;
+ dimm->mtype = MEM_UNKNOWN;
+ dimm->dtype = DEV_UNKNOWN;
+ dimm->edac_mode = EDAC_SECDED;
+
+ snprintf(dimm->label, sizeof(dimm->label), "unknown memory");
+
+ rc = ghes_mc_add_or_free(mci, &empty);
+ if (rc)
+ goto unlock;

- if (fake) {
pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
pr_info("work on such system. Use this driver with caution\n");
- } else if (idx < 0) {
+
+ goto out;
+ }
+
+ if (idx < 0) {
pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
@@ -668,26 +687,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
pr_info("This system has %d DIMM sockets.\n", num_dimm);
}

- INIT_LIST_HEAD(&dimm_fill.dimms);
+ mci = ghes_mc_create(dev, 0, num_dimm);
+ if (!mci) {
+ rc = -ENOMEM;
+ goto unlock;
+ }

- if (!fake) {
- dimm_fill.index = 0;
- dimm_fill.mci = mci;
- dmi_walk(ghes_edac_dmidecode, &dimm_fill);
- } else {
- struct dimm_info *dimm = edac_get_dimm_by_index(mci, 0);
+ dimm_fill.index = 0;
+ dimm_fill.mci = mci;
+ INIT_LIST_HEAD(&dimm_fill.dimms);

- dimm->nr_pages = 1;
- dimm->grain = 128;
- dimm->mtype = MEM_UNKNOWN;
- dimm->dtype = DEV_UNKNOWN;
- dimm->edac_mode = EDAC_SECDED;
- }
+ dmi_walk(ghes_edac_dmidecode, &dimm_fill);

rc = ghes_mc_add_or_free(mci, &dimm_fill.dimms);
if (rc < 0)
goto unlock;

+out:
/* only set on success */
refcount_set(&ghes_refcount, 1);

--
2.20.1

2020-04-22 12:05:19

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 02/10] EDAC/mc: Use int type for parameters of edac_mc_alloc()

Most iterators use int type as index. mci->mc_idx is also type int. So
use int type for parameters of edac_mc_alloc(). Extend the range check
to check for negative values. There is a type cast now when assigning
variable n_layers to mci->n_layer, it is safe due to the range check.

While at it, rename the users of edac_mc_alloc() to mc_idx as this
fits better here.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 7 +++----
drivers/edac/edac_mc.h | 6 +++---
2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 107d7c4de933..57d1d356d69c 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -444,8 +444,7 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
return 0;
}

-struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
- unsigned int n_layers,
+struct mem_ctl_info *edac_mc_alloc(int mc_idx, int n_layers,
struct edac_mc_layer *layers,
unsigned int sz_pvt)
{
@@ -456,7 +455,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
void *pvt, *ptr = NULL;
bool per_rank = false;

- if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
+ if (WARN_ON(mc_idx < 0 || n_layers < 1 || n_layers > EDAC_MAX_LAYERS))
return NULL;

/*
@@ -505,7 +504,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;

/* setup index and various internal pointers */
- mci->mc_idx = mc_num;
+ mci->mc_idx = mc_idx;
mci->tot_dimms = tot_dimms;
mci->pvt_info = pvt;
mci->n_layers = n_layers;
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 881b00eadf7a..4815f50afea0 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -98,7 +98,7 @@ do { \
/**
* edac_mc_alloc() - Allocate and partially fill a struct &mem_ctl_info.
*
- * @mc_num: Memory controller number
+ * @mc_idx: Memory controller number
* @n_layers: Number of MC hierarchy layers
* @layers: Describes each layer as seen by the Memory Controller
* @sz_pvt: size of private storage needed
@@ -122,8 +122,8 @@ do { \
* On success, return a pointer to struct mem_ctl_info pointer;
* %NULL otherwise
*/
-struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
- unsigned int n_layers,
+struct mem_ctl_info *edac_mc_alloc(int mc_idx,
+ int n_layers,
struct edac_mc_layer *layers,
unsigned int sz_pvt);

--
2.20.1

2020-04-22 12:24:49

by Robert Richter

[permalink] [raw]
Subject: [PATCH v2 06/10] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()

The local variable rdr_mask serves as a static constant here. It hides
what the code is doing. Remove it and replace it with the actual logic
that checks some bits.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/ghes_edac.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index a5890afa9c71..038e560fd332 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -191,7 +191,6 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
if (dh->type == DMI_ENTRY_MEM_DEVICE) {
struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 0, 0);
- u16 rdr_mask = BIT(7) | BIT(13);

if (entry->size == 0xffff) {
pr_info("Can't get DIMM%i size\n",
@@ -241,7 +240,8 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
default:
if (entry->type_detail & BIT(6))
dimm->mtype = MEM_RMBS;
- else if ((entry->type_detail & rdr_mask) == rdr_mask)
+ else if ((entry->type_detail & BIT(7)) &&
+ (entry->type_detail & BIT(13)))
dimm->mtype = MEM_RDR;
else if (entry->type_detail & BIT(7))
dimm->mtype = MEM_SDR;
--
2.20.1

2020-04-22 20:54:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 01/10] EDAC/mc: Fix usage of snprintf() and dimm location setup

On Wed, Apr 22, 2020 at 01:58:05PM +0200, Robert Richter wrote:
> The setup of the dimm->location may be incomplete in case writing to
> dimm->label fails due to small buffer size. Fix this by iterating
> through all existing layers.
>
> Also, the return value of snprintf() can be higher than the number of
> bytes written to the buffer in case it is to small. Fix usage of
> snprintf() by either porting it to scnprintf() or fixing the handling
> of the return code.
>
> It is very unlikely the buffer is too small in practice, but fixing it
> anyway.
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/edac/edac_mc.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 75ede27bdf6a..107d7c4de933 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -130,11 +130,11 @@ unsigned int edac_dimm_info_location(struct dimm_info *dimm, char *buf,
> n = snprintf(p, len, "%s %d ",
> edac_layer_name[mci->layers[i].type],
> dimm->location[i]);
> + if (len <= n)
> + return count + len - 1;
> p += n;
> len -= n;
> count += n;
> - if (!len)
> - break;
> }
>
> return count;
> @@ -397,19 +397,19 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
> */
> len = sizeof(dimm->label);
> p = dimm->label;
> - n = snprintf(p, len, "mc#%u", mci->mc_idx);
> + n = scnprintf(p, len, "mc#%u", mci->mc_idx);
> p += n;
> len -= n;
> +
> for (layer = 0; layer < mci->n_layers; layer++) {
> - n = snprintf(p, len, "%s#%u",
> - edac_layer_name[mci->layers[layer].type],
> - pos[layer]);

The edac_layer_name[]'s are single words of a couple of letters and the
pos is a number. The buffer we pass in is at least 80 chars and in one
place even a PAGE_SIZE.

But in general, this is just silly with the buffers on stack and
printing into them.

It would be much better to opencode that loop in
edac_dimm_info_location() and simply dump those layer names at the call
sites. And then kill that silly edac_dimm_info_location() function. See
below for example.

And then since two call sites do edac_dbg(), you can put that in a
function edac_dbg_dump_dimm_location() or so and call it and not care
about any buffer lengths and s*printf's and so on.

Right?

---
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 422120793a6b..7c04ef0c3536 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -91,16 +91,23 @@ static void edac_mc_dump_channel(struct rank_info *chan)

static void edac_mc_dump_dimm(struct dimm_info *dimm)
{
- char location[80];
+ struct mem_ctl_info *mci = dimm->mci;
+ int i;

if (!dimm->nr_pages)
return;

- edac_dimm_info_location(dimm, location, sizeof(location));
+ edac_dbg(4, "%s%i: ", dimm->mci->csbased ? "rank" : "dimm", dimm->idx);
+
+ for (i = 0; i < mci->n_layers; i++)
+ edac_dbg(4, "%s %d ",
+ edac_layer_name[mci->layers[i].type],
+ dimm->location[i]);
+
+ edac_dbg(4, "mapped as virtual row %d, chan %d\n",
+ dimm->csrow, dimm->cschannel);

- edac_dbg(4, "%s%i: %smapped as virtual row %d, chan %d\n",
- dimm->mci->csbased ? "rank" : "dimm",
- dimm->idx, location, dimm->csrow, dimm->cschannel);
edac_dbg(4, " dimm = %p\n", dimm);
edac_dbg(4, " dimm->label = '%s'\n", dimm->label);
edac_dbg(4, " dimm->nr_pages = 0x%x\n", dimm->nr_pages);

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-23 17:51:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 02/10] EDAC/mc: Use int type for parameters of edac_mc_alloc()

On Wed, Apr 22, 2020 at 01:58:06PM +0200, Robert Richter wrote:
> Most iterators use int type as index. mci->mc_idx is also type int. So
> use int type for parameters of edac_mc_alloc(). Extend the range check
> to check for negative values. There is a type cast now when assigning
> variable n_layers to mci->n_layer, it is safe due to the range check.
>
> While at it, rename the users of edac_mc_alloc() to mc_idx as this
> fits better here.
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/edac/edac_mc.c | 7 +++----
> drivers/edac/edac_mc.h | 6 +++---
> 2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 107d7c4de933..57d1d356d69c 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -444,8 +444,7 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
> return 0;
> }
>
> -struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> - unsigned int n_layers,
> +struct mem_ctl_info *edac_mc_alloc(int mc_idx, int n_layers,
> struct edac_mc_layer *layers,
> unsigned int sz_pvt)
> {
> @@ -456,7 +455,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> void *pvt, *ptr = NULL;
> bool per_rank = false;
>
> - if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
> + if (WARN_ON(mc_idx < 0 || n_layers < 1 || n_layers > EDAC_MAX_LAYERS))
> return NULL;

Yeah, no, this doesn't make sense to me. The memory controller number
and the number of layers can never ever be negative and thus signed.

And some drivers supply unsigned types and some signed. So if anything,
this should be fixing all the callers to supply unsigned quantities.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-23 17:57:07

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 03/10] EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_mci

On Wed, Apr 22, 2020 at 01:58:07PM +0200, Robert Richter wrote:
> The struct members list and ghes of struct ghes_edac_pvt are unused,
> remove them. On that occasion, rename it to struct ghes_mci. This is
> shorter and aligns better with the current naming scheme.
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/edac/ghes_edac.c | 13 +++++--------
> 1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index cb3dab56a875..39efce0df881 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -15,9 +15,7 @@
> #include "edac_module.h"
> #include <ras/ras_event.h>
>
> -struct ghes_edac_pvt {
> - struct list_head list;
> - struct ghes *ghes;
> +struct ghes_mci {

No, that should be "ghes_pvt" because it *is* ghes_edac's private
structure and there's also an mci pointer in it.

> struct mem_ctl_info *mci;
>
> /* Buffers for the error handling routine */
> @@ -32,7 +30,7 @@ static refcount_t ghes_refcount = REFCOUNT_INIT(0);
> * also provides the necessary (implicit) memory barrier for the SMP
> * case to make the pointer visible on another CPU.
> */
> -static struct ghes_edac_pvt *ghes_pvt;
> +static struct ghes_mci *ghes_pvt;
>
> /* GHES registration mutex */
> static DEFINE_MUTEX(ghes_reg_mutex);
> @@ -203,7 +201,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> {
> struct edac_raw_error_desc *e;
> struct mem_ctl_info *mci;
> - struct ghes_edac_pvt *pvt;
> + struct ghes_mci *pvt;
> unsigned long flags;
> char *p;
>
> @@ -457,7 +455,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> bool fake = false;
> int rc = 0, num_dimm = 0;
> struct mem_ctl_info *mci;
> - struct ghes_edac_pvt *pvt;
> + struct ghes_mci *pvt;
> struct edac_mc_layer layers[1];
> struct ghes_edac_dimm_fill dimm_fill;
> unsigned long flags;
> @@ -494,7 +492,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> layers[0].size = num_dimm;
> layers[0].is_virt_csrow = true;
>
> - mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
> + mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(*pvt));

The sizeof() change doesn't make it better because now I have to go look
up what pvt is.

sizeof(struct ghes_pvt) tells you what size you're getting here.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-24 12:17:27

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 04/10] EDAC/ghes: Make SMBIOS handle private data to ghes

Hi Robert,

I love your patch! Yet something to improve:

[auto build test ERROR on ras/edac-for-next]
[also build test ERROR on linus/master v5.7-rc2 next-20200423]
[cannot apply to bp/for-next linux/master]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Robert-Richter/EDAC-mc-ghes-Fixes-cleanup-and-reworks/20200424-042828
base: https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>

All error/warnings (new ones prefixed by >>):

In file included from arch/x86/include/asm/bug.h:83:0,
from include/linux/bug.h:5,
from include/linux/debug_locks.h:7,
from include/linux/lockdep.h:44,
from include/linux/spinlock_types.h:18,
from include/linux/mutex.h:16,
from include/linux/kernfs.h:12,
from include/linux/sysfs.h:16,
from include/linux/kobject.h:20,
from include/linux/of.h:17,
from include/linux/irqdomain.h:35,
from include/linux/acpi.h:13,
from include/acpi/apei.h:9,
from include/acpi/ghes.h:5,
from drivers/edac/ghes_edac.c:12:
drivers/edac/ghes_edac.c: In function 'ghes_dimm_pool_create':
>> include/linux/lockdep.h:409:52: error: invalid type argument of '->' (have 'struct mutex')
#define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
^
include/asm-generic/bug.h:113:25: note: in definition of macro 'WARN_ON'
int __ret_warn_on = !!(condition); \
^~~~~~~~~
include/linux/lockdep.h:435:27: note: in expansion of macro 'lockdep_is_held'
WARN_ON(debug_locks && !lockdep_is_held(l)); \
^~~~~~~~~~~~~~~
>> drivers/edac/ghes_edac.c:104:2: note: in expansion of macro 'lockdep_assert_held'
lockdep_assert_held(ghes_reg_mutex);
^~~~~~~~~~~~~~~~~~~
drivers/edac/ghes_edac.c: In function 'ghes_dimm_pool_destroy':
>> include/linux/lockdep.h:409:52: error: invalid type argument of '->' (have 'struct mutex')
#define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
^
include/asm-generic/bug.h:113:25: note: in definition of macro 'WARN_ON'
int __ret_warn_on = !!(condition); \
^~~~~~~~~
include/linux/lockdep.h:435:27: note: in expansion of macro 'lockdep_is_held'
WARN_ON(debug_locks && !lockdep_is_held(l)); \
^~~~~~~~~~~~~~~
drivers/edac/ghes_edac.c:118:2: note: in expansion of macro 'lockdep_assert_held'
lockdep_assert_held(ghes_reg_mutex);
^~~~~~~~~~~~~~~~~~~
drivers/edac/ghes_edac.c: In function 'ghes_dimm_alloc':
>> include/linux/lockdep.h:409:52: error: invalid type argument of '->' (have 'struct mutex')
#define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
^
include/asm-generic/bug.h:113:25: note: in definition of macro 'WARN_ON'
int __ret_warn_on = !!(condition); \
^~~~~~~~~
include/linux/lockdep.h:435:27: note: in expansion of macro 'lockdep_is_held'
WARN_ON(debug_locks && !lockdep_is_held(l)); \
^~~~~~~~~~~~~~~
drivers/edac/ghes_edac.c:127:2: note: in expansion of macro 'lockdep_assert_held'
lockdep_assert_held(ghes_reg_mutex);
^~~~~~~~~~~~~~~~~~~
drivers/edac/ghes_edac.c: In function 'ghes_dimm_release':
>> include/linux/lockdep.h:409:52: error: invalid type argument of '->' (have 'struct mutex')
#define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
^
include/asm-generic/bug.h:113:25: note: in definition of macro 'WARN_ON'
int __ret_warn_on = !!(condition); \
^~~~~~~~~
include/linux/lockdep.h:435:27: note: in expansion of macro 'lockdep_is_held'
WARN_ON(debug_locks && !lockdep_is_held(l)); \
^~~~~~~~~~~~~~~
drivers/edac/ghes_edac.c:144:2: note: in expansion of macro 'lockdep_assert_held'
lockdep_assert_held(ghes_reg_mutex);
^~~~~~~~~~~~~~~~~~~

vim +/lockdep_assert_held +104 drivers/edac/ghes_edac.c

96
97 static int ghes_dimm_pool_create(int num_dimm)
98 {
99 struct ghes_dimm *ghes_dimm;
100
101 if (!num_dimm)
102 return 0;
103
> 104 lockdep_assert_held(ghes_reg_mutex);
105
106 dimms = kcalloc(num_dimm, sizeof(*dimms), GFP_KERNEL);
107 if (!dimms)
108 return -ENOMEM;
109
110 for (ghes_dimm = dimms; ghes_dimm < dimms + num_dimm; ghes_dimm++)
111 list_add(&ghes_dimm->entry, &ghes_dimm_pool);
112
113 return 0;
114 }
115

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (5.53 kB)
.config.gz (70.57 kB)
Download all attachments

2020-04-24 16:28:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 04/10] EDAC/ghes: Make SMBIOS handle private data to ghes

On Wed, Apr 22, 2020 at 01:58:08PM +0200, Robert Richter wrote:
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 39efce0df881..23adb7674f9b 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -15,6 +15,12 @@
> #include "edac_module.h"
> #include <ras/ras_event.h>
>
> +struct ghes_dimm {

Simply struct dimm

> + struct list_head entry;
> + struct dimm_info *dimm;
> + u16 handle;
> +};
> +
> struct ghes_mci {
> struct mem_ctl_info *mci;
>
> @@ -42,6 +48,16 @@ static DEFINE_MUTEX(ghes_reg_mutex);
> */
> static DEFINE_SPINLOCK(ghes_lock);
>
> +/*
> + * Locking:
> + *
> + * dimms, ghes_dimm_pool: ghes_reg_mutex
> + * ghes_dimm_list: ghes_lock
> + */
> +static struct ghes_dimm *dimms;
> +static LIST_HEAD(ghes_dimm_list);
> +static LIST_HEAD(ghes_dimm_pool);

Those are static lists, no need to prefix them with "ghes_". There's too
much "ghes" in that code. :)

> +
> /* "ghes_edac.force_load=1" skips the platform check */
> static bool __read_mostly force_load;
> module_param(force_load, bool, 0);
> @@ -72,11 +88,63 @@ struct memdev_dmi_entry {
> u16 conf_mem_clk_speed;
> } __attribute__((__packed__));
>
> -struct ghes_edac_dimm_fill {
> +struct dimm_fill {
> + struct list_head dimms;
> struct mem_ctl_info *mci;
> unsigned int count;
> };
>
> +static int ghes_dimm_pool_create(int num_dimm)

Yeah, drop "ghes_" here too. I'm not going to comment on this in the
rest of the patchset but for the next version, please drop the "ghes_"
prefix from static functions and members - it unnecessarily gets in the
way when reading the code.

> +{
> + struct ghes_dimm *ghes_dimm;
> +
> + if (!num_dimm)
> + return 0;
> +
> + lockdep_assert_held(ghes_reg_mutex);
> +
> + dimms = kcalloc(num_dimm, sizeof(*dimms), GFP_KERNEL);
> + if (!dimms)
> + return -ENOMEM;
> +
> + for (ghes_dimm = dimms; ghes_dimm < dimms + num_dimm; ghes_dimm++)

And with the above shortening of names, this loop becomes:

for (d = dimms; d < dimms + num_dimms; d++)
list_add(&d->entry, &dimm_pool);

Simple.

> +
> + return 0;
> +}
> +
> +static void ghes_dimm_pool_destroy(void)
> +{
> + lockdep_assert_held(ghes_reg_mutex);
> + INIT_LIST_HEAD(&ghes_dimm_pool);
> + kfree(dimms);
> +}
> +
> +static struct ghes_dimm *ghes_dimm_alloc(struct dimm_info *dimm, u16 handle)
> +{
> + struct ghes_dimm *ghes_dimm;
> +
> + lockdep_assert_held(ghes_reg_mutex);

The 0day bot caught it already - this needs to be a ptr. Please test
with PROVE_LOCKING enabled before sending next time.

> +
> + ghes_dimm = list_first_entry_or_null(&ghes_dimm_pool,
> + struct ghes_dimm, entry);

Let that line stick out.

> +
> + /* should be always non-zero */
> + if (!WARN_ON_ONCE(!ghes_dimm)) {
> + ghes_dimm->dimm = dimm;
> + ghes_dimm->handle = handle;
> + list_del(&ghes_dimm->entry);
> + }
> +
> + return ghes_dimm;
> +}
> +
> +static void ghes_dimm_release(struct list_head *dimms)
> +{
> + lockdep_assert_held(ghes_reg_mutex);
> + list_splice(dimms, &ghes_dimm_pool);
> +}
> +
> static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
> {
> int *num_dimm = arg;

...

> @@ -547,12 +626,18 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>
> spin_lock_irqsave(&ghes_lock, flags);
> ghes_pvt = pvt;
> + list_splice_tail(&dimm_fill.dimms, &ghes_dimm_list);
> spin_unlock_irqrestore(&ghes_lock, flags);
>
> /* only set on success */
> refcount_set(&ghes_refcount, 1);
>
> unlock:
> + if (rc < 0) {
> + ghes_dimm_pool_destroy();
> + pr_err("Can't register at EDAC core: %d\n", rc);

with the EDAC core:

> + }
> +
> mutex_unlock(&ghes_reg_mutex);
>
> return rc;
> @@ -562,6 +647,7 @@ void ghes_edac_unregister(struct ghes *ghes)
> {
> struct mem_ctl_info *mci;
> unsigned long flags;
> + LIST_HEAD(dimm_list);
>
> mutex_lock(&ghes_reg_mutex);
>
> @@ -574,14 +660,19 @@ void ghes_edac_unregister(struct ghes *ghes)
> spin_lock_irqsave(&ghes_lock, flags);
> mci = ghes_pvt ? ghes_pvt->mci : NULL;
> ghes_pvt = NULL;
> + list_splice_init(&ghes_dimm_list, &dimm_list);

Why do you need to do this?

Can't you simply do:

ghes_dimm_release(&ghes_dimm_list);

here?

Btw, please add an explanation above ghes_dimm_list and ghes_dimm_pool
what those are and what the rules are: stuff gets added on register to
what list and freed on unreg from what list, etc. So that it is clear
upon a quick glance.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-27 07:09:52

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 06/10] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()

On Wed, Apr 22, 2020 at 01:58:10PM +0200, Robert Richter wrote:
> The local variable rdr_mask serves as a static constant here. It hides
> what the code is doing. Remove it and replace it with the actual logic
> that checks some bits.
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/edac/ghes_edac.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index a5890afa9c71..038e560fd332 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -191,7 +191,6 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
> if (dh->type == DMI_ENTRY_MEM_DEVICE) {
> struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
> struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 0, 0);
> - u16 rdr_mask = BIT(7) | BIT(13);
>
> if (entry->size == 0xffff) {
> pr_info("Can't get DIMM%i size\n",
> @@ -241,7 +240,8 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
> default:
> if (entry->type_detail & BIT(6))
> dimm->mtype = MEM_RMBS;
> - else if ((entry->type_detail & rdr_mask) == rdr_mask)
> + else if ((entry->type_detail & BIT(7)) &&
> + (entry->type_detail & BIT(13)))

Well, "checks some bits" doesn't make it more telling than checking a
descriptive name like "rdr_mask" but ok, since we're assigning MEM_RDR
here, it is still clear what the check does.

Btw, please write it like this:

else if (entry->type_detail & (BIT(7) | BIT(13)))

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-27 14:03:44

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 07/10] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill

On Wed, Apr 22, 2020 at 01:58:11PM +0200, Robert Richter wrote:
> The struct is used to store temporary data for the dmidecode callback.
> Clean this up a bit:
>
> 1) Rename member count to index since this is what it is used for.
>
> 2) Move code close to ghes_edac_dmidecode() where it is used.
>
> 3) While at it, use edac_get_dimm_by_index().
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/edac/ghes_edac.c | 24 ++++++++++++------------
> 1 file changed, 12 insertions(+), 12 deletions(-)

Ok except the commit title is wrong. And yes, pls keep it "dimm_fill" -
short and sweet and without yet another "ghes" in the name. :)

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-27 16:43:35

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 08/10] EDAC/ghes: Carve out MC device handling into separate functions

On Wed, Apr 22, 2020 at 01:58:12PM +0200, Robert Richter wrote:
> The functions are too long, carve out code that handles MC devices
> into the new functions ghes_mc_create(), ghes_mc_add_or_free() and
> ghes_mc_free(). Apart from better code readability the functions can
> be reused and the implementation of the error paths becomes easier.
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/edac/ghes_edac.c | 141 +++++++++++++++++++++++----------------
> 1 file changed, 83 insertions(+), 58 deletions(-)
>
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 4eadc5b344c8..af0a769071f4 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -535,16 +535,88 @@ static struct acpi_platform_list plat_list[] = {
> { } /* End */
> };
>
> -int ghes_edac_register(struct ghes *ghes, struct device *dev)
> +static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
> + int num_dimm)

Align arguments on the opening brace. The other functions need that too.

> {
> - bool fake = false;
> - int rc = 0, num_dimm = 0;
> + struct edac_mc_layer layers[1];
> struct mem_ctl_info *mci;
> struct ghes_mci *pvt;
> - struct edac_mc_layer layers[1];
> - struct dimm_fill dimm_fill;
> +
> + layers[0].type = EDAC_MC_LAYER_ALL_MEM;
> + layers[0].size = num_dimm;
> + layers[0].is_virt_csrow = true;
> +
> + mci = edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, sizeof(*pvt));
> + if (!mci)
> + return NULL;
> +
> + pvt = mci->pvt_info;
> + pvt->mci = mci;
> +
> + mci->pdev = dev;
> + mci->mtype_cap = MEM_FLAG_EMPTY;
> + mci->edac_ctl_cap = EDAC_FLAG_NONE;
> + mci->edac_cap = EDAC_FLAG_NONE;
> + mci->mod_name = "ghes_edac.c";
> + mci->ctl_name = "ghes_edac";
> + mci->dev_name = "ghes";
> +
> + return mci;
> +}
> +
> +static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
> + struct list_head *dimm_list)

No, I think we talked about this already. This function should be
called:

ghes_mc_add()

and should do one thing and one thing only in good old unix tradition:
add the MC.

> +{
> unsigned long flags;
> - int idx = -1;
> + int rc;
> +
> + rc = edac_mc_add_mc(mci);
> + if (rc < 0) {

> + ghes_dimm_release(dimm_list);
> + edac_mc_free(mci);
> + return rc;

Those last three lines should be called by the *caller* of
ghes_mc_add(), when latter returns an error value.

> + }
> +
> + spin_lock_irqsave(&ghes_lock, flags);
> + ghes_pvt = mci->pvt_info;
> + list_splice_tail(dimm_list, &ghes_dimm_list);
> + spin_unlock_irqrestore(&ghes_lock, flags);
> +
> + return 0;
> +}
> +
> +static void ghes_mc_free(void)
> +{
> + struct mem_ctl_info *mci;
> + unsigned long flags;
> + LIST_HEAD(dimm_list);
> +
> + /*
> + * Wait for the irq handler being finished.
> + */
> + spin_lock_irqsave(&ghes_lock, flags);
> + mci = ghes_pvt ? ghes_pvt->mci : NULL;
> + ghes_pvt = NULL;
> + list_splice_init(&ghes_dimm_list, &dimm_list);
> + spin_unlock_irqrestore(&ghes_lock, flags);
> +
> + ghes_dimm_release(&dimm_list);
> +
> + if (!mci)
> + return;
> +
> + mci = edac_mc_del_mc(mci->pdev);
> + if (mci)
> + edac_mc_free(mci);
> +}

This function needs to do only freeing of the mc. The list splicing and
dimm releasing needs to be done by its caller, before calling it.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-27 17:28:16

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v2 06/10] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()

On Mon, Apr 27, 2020 at 09:08:02AM +0200, Borislav Petkov wrote:
> > if (entry->type_detail & BIT(6))
> > dimm->mtype = MEM_RMBS;
> > - else if ((entry->type_detail & rdr_mask) == rdr_mask)
> > + else if ((entry->type_detail & BIT(7)) &&
> > + (entry->type_detail & BIT(13)))
>
> Well, "checks some bits" doesn't make it more telling than checking a
> descriptive name like "rdr_mask" but ok, since we're assigning MEM_RDR
> here, it is still clear what the check does.
>
> Btw, please write it like this:
>
> else if (entry->type_detail & (BIT(7) | BIT(13)))

That isn't the same. The previous version checked that BOTH bits
7 and 13 were set. Your version checks for either bit.

Looks like the original with the local variable was checking for both
bits set.

-Tony

2020-04-27 17:37:37

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 06/10] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()

On Mon, Apr 27, 2020 at 10:24:08AM -0700, Luck, Tony wrote:
> That isn't the same. The previous version checked that BOTH bits
> 7 and 13 were set. Your version checks for either bit.

Whoops, I'm confused again. ;-\

> Looks like the original with the local variable was checking for both
> bits set.

Yeah, let's leave it as it is. I prefer the rdr_mask thing.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-05-05 07:55:02

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 03/10] EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to ghes_mci

On 23.04.20 19:55:17, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:07PM +0200, Robert Richter wrote:

> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index cb3dab56a875..39efce0df881 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -15,9 +15,7 @@
> > #include "edac_module.h"
> > #include <ras/ras_event.h>
> >
> > -struct ghes_edac_pvt {
> > - struct list_head list;
> > - struct ghes *ghes;
> > +struct ghes_mci {
>
> No, that should be "ghes_pvt" because it *is* ghes_edac's private
> structure and there's also an mci pointer in it.

The ghes driver will use private data for both structs, mci and
dimm_info. Thus I named it ghes_mci and ghes_dimm (see next patch) as
they are counterparts. I could name it "ghes_pvt", but the meaning
would be less obvious. Same for your suggestion in the next patch,
struct dimm is too general and could cause namespace conflicts with
other code (think of cscope etc.).

-Robert

2020-05-05 12:50:48

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 04/10] EDAC/ghes: Make SMBIOS handle private data to ghes

On 24.04.20 18:21:57, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:08PM +0200, Robert Richter wrote:

> > @@ -562,6 +647,7 @@ void ghes_edac_unregister(struct ghes *ghes)
> > {
> > struct mem_ctl_info *mci;
> > unsigned long flags;
> > + LIST_HEAD(dimm_list);
> >
> > mutex_lock(&ghes_reg_mutex);
> >
> > @@ -574,14 +660,19 @@ void ghes_edac_unregister(struct ghes *ghes)
> > spin_lock_irqsave(&ghes_lock, flags);
> > mci = ghes_pvt ? ghes_pvt->mci : NULL;
> > ghes_pvt = NULL;
> > + list_splice_init(&ghes_dimm_list, &dimm_list);
>
> Why do you need to do this?
>
> Can't you simply do:
>
> ghes_dimm_release(&ghes_dimm_list);
>
> here?

This decouples the locking. Otherwise ghes_dimm_release() would be
called with ghes_lock held which I want to avoid to keep the locking
simple.

-Robert

2020-05-06 08:50:14

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 08/10] EDAC/ghes: Carve out MC device handling into separate functions

On 27.04.20 18:38:56, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:12PM +0200, Robert Richter wrote:

> > +static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
> > + struct list_head *dimm_list)
>
> No, I think we talked about this already. This function should be
> called:
>
> ghes_mc_add()
>
> and should do one thing and one thing only in good old unix tradition:
> add the MC.
>
> > +{
> > unsigned long flags;
> > - int idx = -1;
> > + int rc;
> > +
> > + rc = edac_mc_add_mc(mci);
> > + if (rc < 0) {
>
> > + ghes_dimm_release(dimm_list);
> > + edac_mc_free(mci);
> > + return rc;
>
> Those last three lines should be called by the *caller* of
> ghes_mc_add(), when latter returns an error value.

These direct operations are nothing a caller should deal with.

The caller does now:

mci = ghes_mc_create(...);
... /* prepare dimms */
return ghes_mc_add_or_free(...);

To shut it down we just use:

ghes_mc_free();

Pretty simple.

Now, lets look at your suggestion to put it out of the function. A
caller always needs to free the mci and dimms, so we will get:

int rc;
mci = ghes_mc_create(...);
... /* prepare dimms */
rc = ghes_mc_add(...);
if (rc < 0) {
/* free mci */
/* free dimms */
...
}
return rc;

We loose the tail call and simplicity here. Note this duplicates code
as there are 2 users of ghes_mc_add().

Now, the caller does not know the implementation details, so we need
to provide another release function (let's call it *_release() here):

mci = ghes_mc_create(...);
... /* prepare dimms */
rc = ghes_mc_add(...);
if (rc < 0) {
ghes_mc_release(mci);
ghes_dimm_release(dimm_list);
}
return rc;

Ok, now there is another function needed to release everything.

This design also impacts ghes_mc_free(). So the shutdown
implementation would turn to:

struct mem_ctl_info *mci;
...
mci = ghes_mc_del();
ghes_mc_release(mci);
...

I don't see any benefit. See also below for the delta of an
implementation of the suggested changes.

>
> > + }
> > +
> > + spin_lock_irqsave(&ghes_lock, flags);
> > + ghes_pvt = mci->pvt_info;
> > + list_splice_tail(dimm_list, &ghes_dimm_list);
> > + spin_unlock_irqrestore(&ghes_lock, flags);
> > +
> > + return 0;
> > +}
> > +
> > +static void ghes_mc_free(void)
> > +{
> > + struct mem_ctl_info *mci;
> > + unsigned long flags;
> > + LIST_HEAD(dimm_list);
> > +
> > + /*
> > + * Wait for the irq handler being finished.
> > + */
> > + spin_lock_irqsave(&ghes_lock, flags);
> > + mci = ghes_pvt ? ghes_pvt->mci : NULL;
> > + ghes_pvt = NULL;
> > + list_splice_init(&ghes_dimm_list, &dimm_list);
> > + spin_unlock_irqrestore(&ghes_lock, flags);
> > +
> > + ghes_dimm_release(&dimm_list);
> > +
> > + if (!mci)
> > + return;
> > +
> > + mci = edac_mc_del_mc(mci->pdev);
> > + if (mci)
> > + edac_mc_free(mci);
> > +}
>
> This function needs to do only freeing of the mc. The list splicing and
> dimm releasing needs to be done by its caller, before calling it.

ghes_mc_free() is the counterpart to ghes_mc_add() and thus needs to
also handle the dimm_list here. This cannot be left to the caller.

Considering all the above, I don't see how your suggestions to the
interface could improve the code. Hmm...

Below an implementation that illustrates the changes.

Thanks,

-Robert

---
drivers/edac/ghes_edac.c | 41 +++++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 7f39346d895b..896d7b488fc2 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -576,18 +576,14 @@ static struct mem_ctl_info *ghes_mc_create(struct device *dev, int mc_idx,
return mci;
}

-static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
- struct list_head *dimms)
+static int ghes_mc_add(struct mem_ctl_info *mci, struct list_head *dimms)
{
unsigned long flags;
int rc;

rc = edac_mc_add_mc(mci);
- if (rc < 0) {
- dimm_release(dimms);
- edac_mc_free(mci);
+ if (rc < 0)
return rc;
- }

spin_lock_irqsave(&ghes_lock, flags);
ghes_pvt = mci->pvt_info;
@@ -597,7 +593,7 @@ static int ghes_mc_add_or_free(struct mem_ctl_info *mci,
return 0;
}

-static void ghes_mc_free(void)
+static struct mem_ctl_info *ghes_mc_del(void)
{
struct mem_ctl_info *mci;
unsigned long flags;
@@ -614,10 +610,14 @@ static void ghes_mc_free(void)

dimm_release(&dimms);

- if (!mci)
- return;
+ if (mci)
+ mci = edac_mc_del_mc(mci->pdev);

- mci = edac_mc_del_mc(mci->pdev);
+ return mci;
+}
+
+static void ghes_mc_release(struct mem_ctl_info *mci)
+{
if (mci)
edac_mc_free(mci);
}
@@ -627,6 +627,7 @@ static int ghes_edac_register_fake(struct device *dev)
struct mem_ctl_info *mci;
struct dimm_info *dimm;
LIST_HEAD(empty);
+ int rc;

mci = ghes_mc_create(dev, 0, 1);
if (!mci)
@@ -642,13 +643,18 @@ static int ghes_edac_register_fake(struct device *dev)

snprintf(dimm->label, sizeof(dimm->label), "unknown memory");

- return ghes_mc_add_or_free(mci, &empty);
+ rc = ghes_mc_add(mci, &empty);
+ if (rc < 0)
+ ghes_mc_free(mci);
+
+ return rc;
}

static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)
{
struct dimm_fill dimm_fill;
struct mem_ctl_info *mci;
+ int rc;

mci = ghes_mc_create(dev, mc_idx, num_dimm);
if (!mci)
@@ -660,7 +666,13 @@ static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm)

dmi_walk(ghes_edac_dmidecode, &dimm_fill);

- return ghes_mc_add_or_free(mci, &dimm_fill.dimms);
+ rc = ghes_mc_add(mci, &dimm_fill.dimms);
+ if (rc < 0) {
+ dimm_release(&dimm_fill.dimms);
+ ghes_mc_release(mci);
+ }
+
+ return rc;
}

int ghes_edac_register(struct ghes *ghes, struct device *dev)
@@ -740,10 +752,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)

void ghes_edac_unregister(struct ghes *ghes)
{
+ struct mem_ctl_info *mci;
+
mutex_lock(&ghes_reg_mutex);

if (refcount_dec_and_test(&ghes_refcount)) {
- ghes_mc_free();
+ mci = ghes_mc_del();
+ ghes_mc_release(mci);
dimm_pool_destroy();
}

--
2.20.1

2020-05-06 08:55:21

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 00/10] EDAC/mc/ghes: Fixes, cleanup and reworks

Boris,

On 22.04.20 13:58:04, Robert Richter wrote:
> This series contains edac fixes and a significant cleanup and rework
> of the ghes driver:
>
> * fixes and updates for edac_mc (patches #1, #2),
>
> * removed smbios_handle from struct dimm_info (patch #4),
>
> * fix of DIMM label in error reports (patch #5),
>
> * general ghes_edac cleanup and rework (patches #3, #6-#10).
>
> First 2 patches for edac_mc are individual patches, the remaining
> patches do not depend on them.
>
> Tested on a Marvell/Cavium ThunderX2 Sabre (dual socket) system.
>
> v1:
> * https://lore.kernel.org/patchwork/cover/1205901/
>
> v2:
> * reordered patches to have fixes and struct changes first, code
> refactoring patches last,
> * dropped v1 patches #9 to #11 (multiple conrollers) to handle them
> in a separate series,
> * rewrote patch to remove smbios_handle (based on v1 #9): EDAC/ghes:
> Move smbios_handle from struct dimm_info to ghes private data,
> * added lockdep_assert_held() checkers,
> * renamed struct ghes_dimm_fill to struct dimm_fill,
> * renamed local variable dimms to dimm_list to avoid conflict with
> the global variable,
> * removed dimm list for "fake" controller,
> * fixed return code check to use (rc < 0),
> * added: EDAC/mc: Fix usage of snprintf() and dimm location setup

thanks for review.

I have addressed all of your review comments if not otherwise noted.
Please take a look at my replies to you. I am a bit unsure on how to
proceed with 08/10. I have sent you a detailed explanation and hope we
can find a solution soon. I could send a v3 then.

Thanks,

-Robert

2020-05-11 13:36:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 08/10] EDAC/ghes: Carve out MC device handling into separate functions

Hi,

see below. It probably doesn't work but this is what it should do -
straightforward and simple.

And now that I've looked at this in more detail, this whole DIMM
counting is still not doing what it should do.

So lemme try again:

1. Counting the DIMMs.

You should add a function which is called

ghes_count_dimms()

or so, which does:

/* Get the number of DIMMs */
dmi_walk(ghes_edac_count_dimms, &num_dimm);

That function should run once at entry of ghes_edac_register().

When that function is done, it should spit out a data structure which
has this mapping:

memory controller DIMMs
0 0-x
1 x+1 - y
2 y+1 - ...

and so on.

This basically will contain the DIMM layout of the system along with the
mapping which memory controller has which DIMMs.

So that you don't have to do edac_get_dimm_by_index() and rely on having
called edac_mc_alloc() *prior* to calling edac_get_dimm_by_index() so
that mci->dimms[] is properly populated and former can give you a proper
dimm_info pointer.

You can also merge the functionality of ghes_edac_dmidecode() together
and so on so that you scan all the DIMM information needed at *once* so
that you can allocate an mci properly.

2. Memory controller allocation. Once the parsing is done, you do

edac_mc_alloc()

for the respective memory controller using the mapping above which you
have parsed at init.

3. When the allocation is done, you set the proper DIMM handles of
the dimms of this mci so that find_dimm_by_handle() can find the DIMM
correctly.

Now, here's the important part: if ghes_edac_report_mem_error() can get
the memory controller which is in error - struct cper_sec_mem_err has a
node member but that all depends on whether your BIOS even populates it
properly - then find_dimm_by_handle() can even take an mci and search
only through the DIMMs of that memory controller by the handle.

If the BIOS does not populate them properly, then you can simply search
that mapping list of *all* DIMMs which ghes_count_dimms() has created
before.

4. You lastly add edac_mc_add_mc() and all good.

The benefits of this whole approach is that you will have a single data
structure representating the DIMMs in the system and you can traverse
it back'n'forth.

The code will be simple and straight-forward and not with those DMI
callbacks here and there, picking out pieces of needed info instead of
doing a whole system scan *once* at *init* and then working with the
parsed info.

I sincerely hope that this makes sense.

Thx.

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4eadc5b344c8..396945634a2a 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -535,15 +535,83 @@ static struct acpi_platform_list plat_list[] = {
{ } /* End */
};

+static struct mem_ctl_info *ghes_mc_alloc(struct device *dev, int mc_idx,
+ int num_dimm)
+{
+ struct edac_mc_layer layers[1];
+ struct mem_ctl_info *mci;
+ struct ghes_mci *pvt;
+
+ layers[0].type = EDAC_MC_LAYER_ALL_MEM;
+ layers[0].size = num_dimm;
+ layers[0].is_virt_csrow = true;
+
+ mci = edac_mc_alloc(mc_idx, ARRAY_SIZE(layers), layers, sizeof(*pvt));
+ if (!mci)
+ return NULL;
+
+ pvt = mci->pvt_info;
+ pvt->mci = mci;
+
+ mci->pdev = dev;
+ mci->mtype_cap = MEM_FLAG_EMPTY;
+ mci->edac_ctl_cap = EDAC_FLAG_NONE;
+ mci->edac_cap = EDAC_FLAG_NONE;
+ mci->mod_name = "ghes_edac.c";
+ mci->ctl_name = "ghes_edac";
+ mci->dev_name = "ghes";
+
+ return mci;
+}
+
+static int ghes_mc_add(struct mem_ctl_info *mci, struct list_head *dimm_list)
+{
+ unsigned long flags;
+ int rc;
+
+ rc = edac_mc_add_mc(mci);
+ if (rc < 0)
+ return rc;
+
+ spin_lock_irqsave(&ghes_lock, flags);
+ ghes_pvt = mci->pvt_info;
+ list_splice_tail(dimm_list, &ghes_dimm_list);
+ spin_unlock_irqrestore(&ghes_lock, flags);
+
+ return 0;
+}
+
+static void ghes_mc_free(void)
+{
+ struct mem_ctl_info *mci;
+ unsigned long flags;
+ LIST_HEAD(dimm_list);
+
+ /*
+ * Wait for the irq handler being finished.
+ */
+ spin_lock_irqsave(&ghes_lock, flags);
+ mci = ghes_pvt ? ghes_pvt->mci : NULL;
+ ghes_pvt = NULL;
+ list_splice_init(&ghes_dimm_list, &dimm_list);
+ spin_unlock_irqrestore(&ghes_lock, flags);
+
+ ghes_dimm_release(&dimm_list);
+
+ if (!mci)
+ return;
+
+ mci = edac_mc_del_mc(mci->pdev);
+ if (mci)
+ edac_mc_free(mci);
+}
+
int ghes_edac_register(struct ghes *ghes, struct device *dev)
{
bool fake = false;
int rc = 0, num_dimm = 0;
struct mem_ctl_info *mci;
- struct ghes_mci *pvt;
- struct edac_mc_layer layers[1];
struct dimm_fill dimm_fill;
- unsigned long flags;
int idx = -1;

if (IS_ENABLED(CONFIG_X86)) {
@@ -577,27 +645,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
num_dimm = 1;
}

- layers[0].type = EDAC_MC_LAYER_ALL_MEM;
- layers[0].size = num_dimm;
- layers[0].is_virt_csrow = true;
-
- mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(*pvt));
+ mci = ghes_mc_alloc(dev, 0, num_dimm);
if (!mci) {
rc = -ENOMEM;
- goto unlock;
+ goto release;
}

- pvt = mci->pvt_info;
- pvt->mci = mci;
-
- mci->pdev = dev;
- mci->mtype_cap = MEM_FLAG_EMPTY;
- mci->edac_ctl_cap = EDAC_FLAG_NONE;
- mci->edac_cap = EDAC_FLAG_NONE;
- mci->mod_name = "ghes_edac.c";
- mci->ctl_name = "ghes_edac";
- mci->dev_name = "ghes";
-
if (fake) {
pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
@@ -627,21 +680,21 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
dimm->edac_mode = EDAC_SECDED;
}

- rc = edac_mc_add_mc(mci);
+ rc = ghes_mc_add(mci, &dimm_fill.dimms);
if (rc < 0) {
- ghes_dimm_release(&dimm_fill.dimms);
- edac_mc_free(mci);
rc = -ENODEV;
- goto unlock;
+ goto mc_free;
}

- spin_lock_irqsave(&ghes_lock, flags);
- ghes_pvt = pvt;
- list_splice_tail(&dimm_fill.dimms, &ghes_dimm_list);
- spin_unlock_irqrestore(&ghes_lock, flags);
-
/* only set on success */
refcount_set(&ghes_refcount, 1);
+ goto unlock;
+
+mc_free:
+ edac_mc_free(mci);
+
+release:
+ ghes_dimm_release(&dimm_fill.dimms);

unlock:
if (rc < 0) {
@@ -656,35 +709,12 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)

void ghes_edac_unregister(struct ghes *ghes)
{
- struct mem_ctl_info *mci;
- unsigned long flags;
- LIST_HEAD(dimm_list);
-
mutex_lock(&ghes_reg_mutex);

- if (!refcount_dec_and_test(&ghes_refcount))
- goto unlock;
-
- /*
- * Wait for the irq handler being finished.
- */
- spin_lock_irqsave(&ghes_lock, flags);
- mci = ghes_pvt ? ghes_pvt->mci : NULL;
- ghes_pvt = NULL;
- list_splice_init(&ghes_dimm_list, &dimm_list);
- spin_unlock_irqrestore(&ghes_lock, flags);
-
- ghes_dimm_release(&dimm_list);
-
- if (!mci)
- goto unlock;
-
- mci = edac_mc_del_mc(mci->pdev);
- if (mci) {
- edac_mc_free(mci);
+ if (refcount_dec_and_test(&ghes_refcount)) {
+ ghes_mc_free();
ghes_dimm_pool_destroy();
}

-unlock:
mutex_unlock(&ghes_reg_mutex);
}

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-05-19 09:30:10

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 01/10] EDAC/mc: Fix usage of snprintf() and dimm location setup

On 22.04.20 22:52:43, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:05PM +0200, Robert Richter wrote:
> > The setup of the dimm->location may be incomplete in case writing to
> > dimm->label fails due to small buffer size. Fix this by iterating
> > through all existing layers.
> >
> > Also, the return value of snprintf() can be higher than the number of
> > bytes written to the buffer in case it is to small. Fix usage of
> > snprintf() by either porting it to scnprintf() or fixing the handling
> > of the return code.
> >
> > It is very unlikely the buffer is too small in practice, but fixing it
> > anyway.
> >
> > Signed-off-by: Robert Richter <[email protected]>
> > ---
> > drivers/edac/edac_mc.c | 20 ++++++++++----------
> > 1 file changed, 10 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> > index 75ede27bdf6a..107d7c4de933 100644
> > --- a/drivers/edac/edac_mc.c
> > +++ b/drivers/edac/edac_mc.c
> > @@ -130,11 +130,11 @@ unsigned int edac_dimm_info_location(struct dimm_info *dimm, char *buf,
> > n = snprintf(p, len, "%s %d ",
> > edac_layer_name[mci->layers[i].type],
> > dimm->location[i]);
> > + if (len <= n)
> > + return count + len - 1;
> > p += n;
> > len -= n;
> > count += n;
> > - if (!len)
> > - break;
> > }
> >
> > return count;
> > @@ -397,19 +397,19 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
> > */
> > len = sizeof(dimm->label);
> > p = dimm->label;
> > - n = snprintf(p, len, "mc#%u", mci->mc_idx);
> > + n = scnprintf(p, len, "mc#%u", mci->mc_idx);
> > p += n;
> > len -= n;
> > +
> > for (layer = 0; layer < mci->n_layers; layer++) {
> > - n = snprintf(p, len, "%s#%u",
> > - edac_layer_name[mci->layers[layer].type],
> > - pos[layer]);
>
> The edac_layer_name[]'s are single words of a couple of letters and the
> pos is a number. The buffer we pass in is at least 80 chars and in one
> place even a PAGE_SIZE.
>
> But in general, this is just silly with the buffers on stack and
> printing into them.
>
> It would be much better to opencode that loop in
> edac_dimm_info_location() and simply dump those layer names at the call
> sites. And then kill that silly edac_dimm_info_location() function. See
> below for example.
>
> And then since two call sites do edac_dbg(), you can put that in a
> function edac_dbg_dump_dimm_location() or so and call it and not care
> about any buffer lengths and s*printf's and so on.
>
> Right?

The aim of this patch is just to fix snprintf() users. Anything else
would involve a larger cleanup. It is not only about edac_dbg(), there
are other users of edac_layer_name[] which implement similar things
that need to be looked at.

So I am dropping this patch from the series.

Thanks,

-Robert

2020-05-19 09:35:35

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 02/10] EDAC/mc: Use int type for parameters of edac_mc_alloc()

On 23.04.20 19:49:34, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:06PM +0200, Robert Richter wrote:
> > Most iterators use int type as index. mci->mc_idx is also type int. So
> > use int type for parameters of edac_mc_alloc(). Extend the range check
> > to check for negative values. There is a type cast now when assigning
> > variable n_layers to mci->n_layer, it is safe due to the range check.
> >
> > While at it, rename the users of edac_mc_alloc() to mc_idx as this
> > fits better here.
> >
> > Signed-off-by: Robert Richter <[email protected]>
> > ---
> > drivers/edac/edac_mc.c | 7 +++----
> > drivers/edac/edac_mc.h | 6 +++---
> > 2 files changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> > index 107d7c4de933..57d1d356d69c 100644
> > --- a/drivers/edac/edac_mc.c
> > +++ b/drivers/edac/edac_mc.c
> > @@ -444,8 +444,7 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
> > return 0;
> > }
> >
> > -struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> > - unsigned int n_layers,
> > +struct mem_ctl_info *edac_mc_alloc(int mc_idx, int n_layers,
> > struct edac_mc_layer *layers,
> > unsigned int sz_pvt)
> > {
> > @@ -456,7 +455,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> > void *pvt, *ptr = NULL;
> > bool per_rank = false;
> >
> > - if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
> > + if (WARN_ON(mc_idx < 0 || n_layers < 1 || n_layers > EDAC_MAX_LAYERS))
> > return NULL;
>
> Yeah, no, this doesn't make sense to me. The memory controller number
> and the number of layers can never ever be negative and thus signed.
>
> And some drivers supply unsigned types and some signed. So if anything,
> this should be fixing all the callers to supply unsigned quantities.

mci->mc_idx is of type int and there is a cast here that should be
fixed. IMO that should be a signed int as some interfaces (esp. if you
search for an index) that require a negative value to report errors or
something could not be found.

So let's take this patch out of this series if you want have it
different.

Thanks,

-Robert

2020-05-19 09:36:51

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 06/10] EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()

On 27.04.20 19:34:02, Borislav Petkov wrote:
> On Mon, Apr 27, 2020 at 10:24:08AM -0700, Luck, Tony wrote:
> > That isn't the same. The previous version checked that BOTH bits
> > 7 and 13 were set. Your version checks for either bit.
>
> Whoops, I'm confused again. ;-\
>
> > Looks like the original with the local variable was checking for both
> > bits set.
>
> Yeah, let's leave it as it is. I prefer the rdr_mask thing.

I am dropping this patch from the series.

Thanks,

-Robert

2020-05-19 09:37:27

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 07/10] EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to ghes_dimm_fill

On 27.04.20 16:00:43, Borislav Petkov wrote:
> On Wed, Apr 22, 2020 at 01:58:11PM +0200, Robert Richter wrote:
> > The struct is used to store temporary data for the dmidecode callback.
> > Clean this up a bit:
> >
> > 1) Rename member count to index since this is what it is used for.
> >
> > 2) Move code close to ghes_edac_dmidecode() where it is used.
> >
> > 3) While at it, use edac_get_dimm_by_index().
> >
> > Signed-off-by: Robert Richter <[email protected]>
> > ---
> > drivers/edac/ghes_edac.c | 24 ++++++++++++------------
> > 1 file changed, 12 insertions(+), 12 deletions(-)
>
> Ok except the commit title is wrong. And yes, pls keep it "dimm_fill" -
> short and sweet and without yet another "ghes" in the name. :)

Fixed the title.

Thanks,

-Robert

2020-05-19 10:03:54

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH v2 08/10] EDAC/ghes: Carve out MC device handling into separate functions

On 11.05.20 15:32:03, Borislav Petkov wrote:
> see below. It probably doesn't work but this is what it should do -
> straightforward and simple.
>
> And now that I've looked at this in more detail, this whole DIMM
> counting is still not doing what it should do.
>
> So lemme try again:

As you have major concerns with my code that deals with ghes private
dimm data, let's just keep smbios_handle in struct dimm_info. ATM I
do not see any alternative solution suggested how this could be
implemented. So I am going to drop patch '[PATCH v2 04/10] EDAC/ghes:
Make SMBIOS handle private data to ghes' from this series.

Thanks,

-Robert