2020-03-06 15:15:52

by Robert Richter

[permalink] [raw]
Subject: [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting

This series contains a significant cleanup and rework of the ghes
driver and improves the memory reporting as follows:

* fix of DIMM label in error reports (patch #2),

* creation of multiple memory controllers to group DIMMs depending on
the physical memory array (patches #9-#11). This should reflect the
memory topology of a system in sysfs. Esp. multi-node systems show
up with one memory controller per node now.

The changes base on the remaining patches that are a general cleanup
and rework:

* small change to edac_mc, not really dependent on the rest of the
series (patch #1),

* general cleanup and rework of the ghes driver (patches #3-#8).

The implementation of multiple memory controllers bases on the
suggestion from James (see patch #11), thank you James for your
valuable input here. The patches are created newly from scratch and
obsolete the GHES part of my previous postings a while ago that have
not been accepted upstream:

https://lore.kernel.org/patchwork/cover/1093488/

Tested on a Marvell/Cavium ThunderX2 Sabre (dual socket) system.


Robert Richter (11):
EDAC/mc: Use int type for parameters of edac_mc_alloc()
EDAC/ghes: Setup DIMM label from DMI and use it in error reports
EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()
EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to
ghes_mci
EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to
ghes_dimm_fill
EDAC/ghes: Carve out MC device handling into separate functions
EDAC/ghes: Have a separate code path for creating the fake MC
EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}()
EDAC/ghes: Implement DIMM mapping table for SMBIOS handles
EDAC/ghes: Create an own device for each mci
EDAC/ghes: Create one memory controller per physical memory array

drivers/edac/edac_mc.c | 7 +-
drivers/edac/edac_mc.h | 6 +-
drivers/edac/ghes_edac.c | 514 +++++++++++++++++++++++++++++----------
include/linux/edac.h | 2 -
4 files changed, 390 insertions(+), 139 deletions(-)

--
2.20.1


2020-03-06 15:17:02

by Robert Richter

[permalink] [raw]
Subject: [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc()

Most iterators use int type as index. mci->mc_idx is also type int. So
use int type for parameters of edac_mc_alloc(). Extend the range check
to check for negative values. There is a type cast now when assigning
variable n_layers to mci->n_layer, it is safe due to the range check.

While at it, rename the users of edac_mc_alloc() to mc_idx as this
fits better here.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/edac/edac_mc.c | 7 +++----
drivers/edac/edac_mc.h | 6 +++---
2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 75ede27bdf6a..8bd3d00b4385 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -444,8 +444,7 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
return 0;
}

-struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
- unsigned int n_layers,
+struct mem_ctl_info *edac_mc_alloc(int mc_idx, int n_layers,
struct edac_mc_layer *layers,
unsigned int sz_pvt)
{
@@ -456,7 +455,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
void *pvt, *ptr = NULL;
bool per_rank = false;

- if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
+ if (WARN_ON(mc_idx < 0 || n_layers < 1 || n_layers > EDAC_MAX_LAYERS))
return NULL;

/*
@@ -505,7 +504,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;

/* setup index and various internal pointers */
- mci->mc_idx = mc_num;
+ mci->mc_idx = mc_idx;
mci->tot_dimms = tot_dimms;
mci->pvt_info = pvt;
mci->n_layers = n_layers;
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 881b00eadf7a..4815f50afea0 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -98,7 +98,7 @@ do { \
/**
* edac_mc_alloc() - Allocate and partially fill a struct &mem_ctl_info.
*
- * @mc_num: Memory controller number
+ * @mc_idx: Memory controller number
* @n_layers: Number of MC hierarchy layers
* @layers: Describes each layer as seen by the Memory Controller
* @sz_pvt: size of private storage needed
@@ -122,8 +122,8 @@ do { \
* On success, return a pointer to struct mem_ctl_info pointer;
* %NULL otherwise
*/
-struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
- unsigned int n_layers,
+struct mem_ctl_info *edac_mc_alloc(int mc_idx,
+ int n_layers,
struct edac_mc_layer *layers,
unsigned int sz_pvt);

--
2.20.1

2020-03-10 20:20:41

by Aristeu Rozanski

[permalink] [raw]
Subject: Re: [PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting

On Fri, Mar 06, 2020 at 04:13:07PM +0100, Robert Richter wrote:
> This series contains a significant cleanup and rework of the ghes
> driver and improves the memory reporting as follows:
>
> * fix of DIMM label in error reports (patch #2),
>
> * creation of multiple memory controllers to group DIMMs depending on
> the physical memory array (patches #9-#11). This should reflect the
> memory topology of a system in sysfs. Esp. multi-node systems show
> up with one memory controller per node now.
>
> The changes base on the remaining patches that are a general cleanup
> and rework:
>
> * small change to edac_mc, not really dependent on the rest of the
> series (patch #1),
>
> * general cleanup and rework of the ghes driver (patches #3-#8).
>
> The implementation of multiple memory controllers bases on the
> suggestion from James (see patch #11), thank you James for your
> valuable input here. The patches are created newly from scratch and
> obsolete the GHES part of my previous postings a while ago that have
> not been accepted upstream:
>
> https://lore.kernel.org/patchwork/cover/1093488/
>
> Tested on a Marvell/Cavium ThunderX2 Sabre (dual socket) system.

Acked-by: Aristeu Rozanski <[email protected]>

--
Aristeu

2020-04-06 11:39:25

by Matthias Brugger

[permalink] [raw]
Subject: Re: [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc()

On 06/03/2020 16:13, Robert Richter wrote:
> Most iterators use int type as index. mci->mc_idx is also type int. So
> use int type for parameters of edac_mc_alloc(). Extend the range check
> to check for negative values. There is a type cast now when assigning
> variable n_layers to mci->n_layer, it is safe due to the range check.
>
> While at it, rename the users of edac_mc_alloc() to mc_idx as this
> fits better here.
>
> Signed-off-by: Robert Richter <[email protected]>

Reviewed-by: Matthias Brugger <[email protected]>

> ---
> drivers/edac/edac_mc.c | 7 +++----
> drivers/edac/edac_mc.h | 6 +++---
> 2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 75ede27bdf6a..8bd3d00b4385 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -444,8 +444,7 @@ static int edac_mc_alloc_dimms(struct mem_ctl_info *mci)
> return 0;
> }
>
> -struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> - unsigned int n_layers,
> +struct mem_ctl_info *edac_mc_alloc(int mc_idx, int n_layers,
> struct edac_mc_layer *layers,
> unsigned int sz_pvt)
> {
> @@ -456,7 +455,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> void *pvt, *ptr = NULL;
> bool per_rank = false;
>
> - if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
> + if (WARN_ON(mc_idx < 0 || n_layers < 1 || n_layers > EDAC_MAX_LAYERS))
> return NULL;
>
> /*
> @@ -505,7 +504,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> pvt = sz_pvt ? (((char *)mci) + ((unsigned long)pvt)) : NULL;
>
> /* setup index and various internal pointers */
> - mci->mc_idx = mc_num;
> + mci->mc_idx = mc_idx;
> mci->tot_dimms = tot_dimms;
> mci->pvt_info = pvt;
> mci->n_layers = n_layers;
> diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
> index 881b00eadf7a..4815f50afea0 100644
> --- a/drivers/edac/edac_mc.h
> +++ b/drivers/edac/edac_mc.h
> @@ -98,7 +98,7 @@ do { \
> /**
> * edac_mc_alloc() - Allocate and partially fill a struct &mem_ctl_info.
> *
> - * @mc_num: Memory controller number
> + * @mc_idx: Memory controller number
> * @n_layers: Number of MC hierarchy layers
> * @layers: Describes each layer as seen by the Memory Controller
> * @sz_pvt: size of private storage needed
> @@ -122,8 +122,8 @@ do { \
> * On success, return a pointer to struct mem_ctl_info pointer;
> * %NULL otherwise
> */
> -struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
> - unsigned int n_layers,
> +struct mem_ctl_info *edac_mc_alloc(int mc_idx,
> + int n_layers,
> struct edac_mc_layer *layers,
> unsigned int sz_pvt);
>
>