Subject: [PATCH v7 03/12] EDAC/mce_amd: Extract node id from MCA_IPID

On SMCA banks of the GPU nodes, the node id information is
available in register MCA_IPID[47:44](InstanceIdHi).

Convert the hardware node ID to a value used by Linux
where GPU nodes are sequentially after the CPU nodes.

Co-developed-by: Muralidhara M K <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v6->v7:
* None

v5->v6:
* Called amd_get_gpu_node_id function to get node_id

v4->v5:
* None

v3->v4:
* Add reviewed by Yazen

v2->v3:
* Use APIs from amd_nb to identify the gpu_node_start_id and cpu_node_count,
  which are required to map the hardware node id to the node id enumerated by Linux.

v1->v2:
* Modified subject and commit message
* Added Reviewed by Yazen Ghannam

v0->v1:
* Modified the commit message
* Rearranged the conditions before calling decode_dram_ecc()


drivers/edac/mce_amd.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index cc5c63feb26a..865a925ccef0 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -2,6 +2,7 @@
#include <linux/module.h>
#include <linux/slab.h>

+#include <asm/amd_nb.h>
#include <asm/cpu.h>

#include "mce_amd.h"
@@ -1186,8 +1187,26 @@ static void decode_smca_error(struct mce *m)
 	if (xec < smca_mce_descs[bank_type].num_descs)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);

-	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(topology_die_id(m->extcpu), m);
+	if (xec == 0 && decode_dram_ecc) {
+		int node_id = 0;
+
+		if (bank_type == SMCA_UMC) {
+			node_id = topology_die_id(m->extcpu);
+		} else if (bank_type == SMCA_UMC_V2) {
+			/*
+			 * SMCA_UMC_V2 exists on GPU nodes, extract the node id
+			 * from register MCA_IPID[47:44](InstanceIdHi).
+			 * The InstanceIdHi field represents the instance ID of the GPU.
+			 * Which needs to be mapped to a value used by Linux,
+			 * where GPU nodes are simply numerically after the CPU nodes.
+			 */
+			node_id = amd_get_gpu_node_system_id(m->ipid);
+		} else {
+			return;
+		}
+
+		decode_dram_ecc(node_id, m);
+	}
 }

static inline void amd_decode_err_code(u16 ec)
--
2.25.1
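
For readers following the thread, the mapping described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the body of amd_get_gpu_node_system_id(), which is defined in another patch of the series and is not shown in this message. The helper names ipid_instance_id_hi() and gpu_node_system_id() are hypothetical. The parameters gpu_node_start_id and cpu_node_count are borrowed from the v2->v3 changelog, and the arithmetic assumes that hardware GPU node IDs start at gpu_node_start_id and that Linux numbers GPU nodes directly after the CPU nodes.

```c
#include <stdint.h>

/* MCA_IPID[47:44] holds InstanceIdHi, per the commit message. */
#define IPID_INSTANCE_ID_HI_SHIFT	44
#define IPID_INSTANCE_ID_HI_MASK	0xFULL

/* Hypothetical helper: extract the 4-bit InstanceIdHi field from MCA_IPID. */
static unsigned int ipid_instance_id_hi(uint64_t ipid)
{
	return (unsigned int)((ipid >> IPID_INSTANCE_ID_HI_SHIFT) &
			      IPID_INSTANCE_ID_HI_MASK);
}

/*
 * Hypothetical mapping (an assumption, not the kernel implementation):
 * hardware GPU node IDs begin at gpu_node_start_id, and Linux places
 * GPU nodes immediately after the cpu_node_count CPU nodes.
 * Returns -1 if the hardware ID is below the GPU range.
 */
static int gpu_node_system_id(uint64_t ipid, unsigned int gpu_node_start_id,
			      unsigned int cpu_node_count)
{
	unsigned int hw_id = ipid_instance_id_hi(ipid);

	if (hw_id < gpu_node_start_id)
		return -1;

	return (int)(hw_id - gpu_node_start_id + cpu_node_count);
}
```

Under these assumptions, a system with two CPU nodes whose GPU hardware node IDs start at 4 would map an InstanceIdHi of 5 to Linux node 3.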


2022-02-09 23:54:18

by Yazen Ghannam

Subject: Re: [PATCH v7 03/12] EDAC/mce_amd: Extract node id from MCA_IPID

On Thu, Feb 03, 2022 at 11:49:33AM -0600, Naveen Krishna Chatradhi wrote:
> On SMCA banks of the GPU nodes, the node id information is
> available in register MCA_IPID[47:44](InstanceIdHi).
>
> Convert the hardware node ID to a value used by Linux
> where GPU nodes are sequentially after the CPU nodes.
>

Terminology should be consistent. I see "node id" and "node ID" here.

...

> + } else if (bank_type == SMCA_UMC_V2) {
> + /*
> + * SMCA_UMC_V2 exists on GPU nodes, extract the node id
> + * from register MCA_IPID[47:44](InstanceIdHi).
> + * The InstanceIdHi field represents the instance ID of the GPU.
> + * Which needs to be mapped to a value used by Linux,
> + * where GPU nodes are simply numerically after the CPU nodes.
> + */
> + node_id = amd_get_gpu_node_system_id(m->ipid);

As mentioned for the previous patch, why not define this function in EDAC?

Thanks,
Yazen

Subject: Re: [PATCH v7 03/12] EDAC/mce_amd: Extract node id from MCA_IPID

Hi Yazen

On 2/10/2022 5:01 AM, Yazen Ghannam wrote:
> On Thu, Feb 03, 2022 at 11:49:33AM -0600, Naveen Krishna Chatradhi wrote:
>> On SMCA banks of the GPU nodes, the node id information is
>> available in register MCA_IPID[47:44](InstanceIdHi).
>>
>> Convert the hardware node ID to a value used by Linux
>> where GPU nodes are sequentially after the CPU nodes.
>>
> Terminology should be consistent. I see "node id" and "node ID" here.
Will keep it consistent.
>
> ...
>
>> + } else if (bank_type == SMCA_UMC_V2) {
>> + /*
>> + * SMCA_UMC_V2 exists on GPU nodes, extract the node id
>> + * from register MCA_IPID[47:44](InstanceIdHi).
>> + * The InstanceIdHi field represents the instance ID of the GPU.
>> + * Which needs to be mapped to a value used by Linux,
>> + * where GPU nodes are simply numerically after the CPU nodes.
>> + */
>> + node_id = amd_get_gpu_node_system_id(m->ipid);
> As mentioned for the previous patch, why not define this function in EDAC?

Sure, with the recent changes we can move this function to EDAC. I will wait
for comments on the other patches in the series and submit the next version
with the feedback addressed.

Regards,

Naveenk

>
> Thanks,
> Yazen