2023-07-25 10:00:30

by Qiuxu Zhuo

[permalink] [raw]
Subject: [PATCH 1/1] EDAC/igen6: Fix the issue of no error events

Current igen6_edac checks for pending errors before the registration
of the error handler. However, there is a possibility that the error
occurs during the registration process, leading to unhandled pending
errors and no future error events. This issue can be reproduced by
repeatedly injecting errors during the loading of the igen6_edac.

Fix this issue by moving the pending error handler after the registration
of the error handler, ensuring that no pending errors are left unhandled.

Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: Ee Wey Lim <[email protected]>
Tested-by: Ee Wey Lim <[email protected]>
Signed-off-by: Qiuxu Zhuo <[email protected]>
---
drivers/edac/igen6_edac.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 544dd19072ea..1a18693294db 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -27,7 +27,7 @@
#include "edac_mc.h"
#include "edac_module.h"

-#define IGEN6_REVISION "v2.5"
+#define IGEN6_REVISION "v2.5.1"

#define EDAC_MOD_STR "igen6_edac"
#define IGEN6_NMI_NAME "igen6_ibecc"
@@ -1216,9 +1216,6 @@ static int igen6_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
INIT_WORK(&ecclog_work, ecclog_work_cb);
init_irq_work(&ecclog_irq_work, ecclog_irq_work_cb);

- /* Check if any pending errors before registering the NMI handler */
- ecclog_handler();
-
rc = register_err_handler();
if (rc)
goto fail3;
@@ -1230,6 +1227,9 @@ static int igen6_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto fail4;
}

+ /* Check if any pending errors before/during the registration of the error handler */
+ ecclog_handler();
+
igen6_debug_setup();
return 0;
fail4:
--
2.17.1



2023-07-25 16:20:06

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH 1/1] EDAC/igen6: Fix the issue of no error events

> Fix this issue by moving the pending error handler after the registration
> of the error handler, ensuring that no pending errors are left unhandled.

Do you think drivers/edac/e7xxx_edac.c has the same issue?

491
492 /* clear any pending errors, or initial state bits */
493 e7xxx_get_error_info(mci, &discard);
494
495 /* Here we assume that we will never see multiple instances of this
496 * type of memory controller. The ID is therefore hardcoded to 0.
497 */
498 if (edac_mc_add_mc(mci)) {
499 edac_dbg(3, "failed edac_mc_add_mc()\n");
500 goto fail1;
501 }
502
503 /* allocating generic PCI control info */
504 e7xxx_pci = edac_pci_create_generic_ctl(&pdev->dev, EDAC_MOD_STR);

Though it might be hard to find such an old system to test.

-Tony


2023-07-26 03:36:21

by Qiuxu Zhuo

[permalink] [raw]
Subject: RE: [PATCH 1/1] EDAC/igen6: Fix the issue of no error events

> From: Luck, Tony <[email protected]>
> ...
> Subject: RE: [PATCH 1/1] EDAC/igen6: Fix the issue of no error events
>
> > Fix this issue by moving the pending error handler after the
> > registration of the error handler, ensuring that no pending errors are left
> unhandled.
>
> Do you think drivers/edac/e7xxx_edac.c has the same issue?

Hi Tony,

Based on the code [1], the e7xxx_edac works in polling mode and the pending
errors can be handled within one period after the error handler registration.

So, I don't think the e7xxx_edac has the same issue.

[1] e7xxx_probe1()-> mci->edac_check = e7xxx_check;
-> edac_mc_add_mc() -> if (mci->edac_check) { mci->op_state = OP_RUNNING_POLL; ... }

> 491
> 492 /* clear any pending errors, or initial state bits */
> 493 e7xxx_get_error_info(mci, &discard);

This function is also invoked periodically in polling mode:
mci->edac_check() -> e7xxx_check() -> e7xxx_get_error_info()

> 494
> 495 /* Here we assume that we will never see multiple instances of this
> 496 * type of memory controller. The ID is therefore hardcoded to 0.
> 497 */
> 498 if (edac_mc_add_mc(mci)) {
> 499 edac_dbg(3, "failed edac_mc_add_mc()\n");
> 500 goto fail1;
> 501 }
> 502
> 503 /* allocating generic PCI control info */
> 504 e7xxx_pci = edac_pci_create_generic_ctl(&pdev->dev,
> EDAC_MOD_STR);
>
> Though it might be hard to find such an old system to test.
>
> -Tony


2023-08-02 20:48:27

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH 1/1] EDAC/igen6: Fix the issue of no error events

> Current igen6_edac checks for pending errors before the registration
> of the error handler. However, there is a possibility that the error
> occurs during the registration process, leading to unhandled pending
> errors and no future error events. This issue can be reproduced by
> repeatedly injecting errors during the loading of the igen6_edac.
>
> Fix this issue by moving the pending error handler after the registration
> of the error handler, ensuring that no pending errors are left unhandled.
>
> Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
> Reported-by: Ee Wey Lim <[email protected]>
> Tested-by: Ee Wey Lim <[email protected]>
> Signed-off-by: Qiuxu Zhuo <[email protected]>

Applied to RAS tree for next merge window.

Thanks

-Tony