2023-12-06 22:42:49

by Bjorn Helgaas

Subject: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

From: Bjorn Helgaas <[email protected]>

The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
Previously we printed these as "Corrected" or "Uncorrected". To avoid
confusion, use the same terms as the spec.

One confusing situation is when one agent detects an error, but another
agent is responsible for recovery, e.g., by re-attempting the operation.
The first agent may log a "correctable" error but it has not yet been
corrected. The recovery agent must report an uncorrectable error if it is
unable to recover. If we print the first agent's error as "Corrected", it
gives the false impression that it has already been resolved.

Sample message change:

- pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
+ pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5

Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/pcie/aer.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 42a3bd35a3e1..20db80018b5d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
  * AER error strings
  */
 static const char *aer_error_severity_string[] = {
-	"Uncorrected (Non-Fatal)",
-	"Uncorrected (Fatal)",
-	"Corrected"
+	"Uncorrectable (Non-Fatal)",
+	"Uncorrectable (Fatal)",
+	"Correctable"
 };
 
 static const char *aer_error_layer[] = {
--
2.34.1
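
For readers who want to see how a string table like the one in the diff ends up in a log line, here is a minimal userspace sketch. It is not the code in drivers/pci/pcie/aer.c: the sketch_* names, the enum values, and the "Unknown" fallback are assumptions made for illustration. Only the three strings and the message shape are taken from the patch and its sample message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical severity codes; their order must match the table below. */
enum sketch_severity {
	SKETCH_NONFATAL = 0,
	SKETCH_FATAL = 1,
	SKETCH_CORRECTABLE = 2,
};

/* Same strings, in the same order, as the patched table in the diff. */
static const char *const sketch_severity_string[] = {
	"Uncorrectable (Non-Fatal)",
	"Uncorrectable (Fatal)",
	"Correctable",
};

#define SKETCH_NR_SEVERITIES \
	(sizeof(sketch_severity_string) / sizeof(sketch_severity_string[0]))

/* Map a severity code to its spec term; "Unknown" is an assumed fallback. */
static const char *sketch_severity_name(unsigned int sev)
{
	return sev < SKETCH_NR_SEVERITIES ? sketch_severity_string[sev]
					  : "Unknown";
}

/* Build a line shaped like the sample message in the commit log. */
static int sketch_format(char *buf, size_t len, const char *dev,
			 unsigned int sev)
{
	assert(buf != NULL && dev != NULL);
	return snprintf(buf, len, "pcieport %s: AER: %s error received: %s",
			dev, sketch_severity_name(sev), dev);
}
```

Calling sketch_format(buf, sizeof(buf), "0000:00:1c.5", SKETCH_CORRECTABLE) fills buf with a line in the same shape as the "+" line of the sample message change. Because the strings live in one table indexed by severity, renaming the terms is a three-line change, as in the patch.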


2023-12-12 15:00:57

by Terry Bowman

Subject: Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

Hi Bjorn,

Will help prevent confusion. LGTM.

On 12/6/23 16:42, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <[email protected]>
>
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected". To avoid
> confusion, use the same terms as the spec.
>
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected. The recovery agent must report an uncorrectable error if it is
> unable to recover. If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
>
> Sample message change:
>
> - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
> + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> drivers/pci/pcie/aer.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
> * AER error strings
> */
> static const char *aer_error_severity_string[] = {
> - "Uncorrected (Non-Fatal)",
> - "Uncorrected (Fatal)",
> - "Corrected"
> + "Uncorrectable (Non-Fatal)",
> + "Uncorrectable (Fatal)",
> + "Correctable"
> };
>
> static const char *aer_error_layer[] = {

2023-12-12 21:23:35

by Bjorn Helgaas

Subject: Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

On Tue, Dec 12, 2023 at 09:00:24AM -0600, Terry Bowman wrote:
> Hi Bjorn,
>
> Will help prevent confusion. LGTM.

Thanks a lot for taking a look at these! I'd like to give you credit
in the log, e.g., "Reviewed-by: Terry Bowman <[email protected]>",
but I'm OCD enough that I don't want to translate "LGTM" into that all
by myself.

If you want that credit (and, I guess, the privilege of being cc'd
when we find that these patches break something :)), just reply again
with that actual "Reviewed-by:" text in it.

Bjorn

Subject: Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors



On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <[email protected]>
>
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected". To avoid
> confusion, use the same terms as the spec.
>
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected. The recovery agent must report an uncorrectable error if it is
> unable to recover. If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
>
> Sample message change:
>
> - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
> + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

> drivers/pci/pcie/aer.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
> * AER error strings
> */
> static const char *aer_error_severity_string[] = {
> - "Uncorrected (Non-Fatal)",
> - "Uncorrected (Fatal)",
> - "Corrected"
> + "Uncorrectable (Non-Fatal)",
> + "Uncorrectable (Fatal)",
> + "Correctable"
> };
>
> static const char *aer_error_layer[] = {

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer