2023-12-06 22:42:52

by Bjorn Helgaas

[permalink] [raw]
Subject: [PATCH 0/3] PCI/AER: Clean up logging

From: Bjorn Helgaas <[email protected]>

Clean up some minor AER logging issues:

- Log as "Correctable errors", not "Corrected errors"

- Decode the Requester ID when we couldn't find detail error info

Bjorn Helgaas (3):
PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
PCI/AER: Decode Requester ID when no error info found
PCI/AER: Use explicit register sizes for struct members

drivers/pci/pcie/aer.c | 19 ++++++++++++-------
include/linux/aer.h | 8 ++++----
2 files changed, 16 insertions(+), 11 deletions(-)

--
2.34.1


2023-12-06 22:42:57

by Bjorn Helgaas

[permalink] [raw]
Subject: [PATCH 3/3] PCI/AER: Use explicit register sizes for struct members

From: Bjorn Helgaas <[email protected]>

aer_irq() reads the AER Root Error Status and Error Source Identification
(PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
struct aer_err_source. Both registers are 32 bits, so declare the members
explicitly as "u32" instead of "unsigned int".

Similarly, aer_get_device_error_info() reads the AER Header Log
(PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
aer_header_log_regs. Declare those members as "u32" as well.

No functional changes intended.

Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/pcie/aer.c | 4 ++--
include/linux/aer.h | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 2ff6bac9979f..60f84414ec2a 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -41,8 +41,8 @@
#define AER_MAX_TYPEOF_UNCOR_ERRS 27 /* as per PCI_ERR_UNCOR_STATUS*/

struct aer_err_source {
- unsigned int status;
- unsigned int id;
+ u32 status; /* PCI_ERR_ROOT_STATUS */
+ u32 id; /* PCI_ERR_ROOT_ERR_SRC */
};

struct aer_rpc {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index f6ea2f57d808..ae0fae70d4bd 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -19,10 +19,10 @@
struct pci_dev;

struct aer_header_log_regs {
- unsigned int dw0;
- unsigned int dw1;
- unsigned int dw2;
- unsigned int dw3;
+ u32 dw0;
+ u32 dw1;
+ u32 dw2;
+ u32 dw3;
};

struct aer_capability_regs {
--
2.34.1

2023-12-06 22:43:00

by Bjorn Helgaas

[permalink] [raw]
Subject: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found

From: Bjorn Helgaas <[email protected]>

When a device with AER detects an error, it logs error information in its
own AER Error Status registers. It may send an Error Message to the Root
Port (RCEC in the case of an RCiEP), which logs the fact that an Error
Message was received (Root Error Status) and the Requester ID of the
message source (Error Source Identification).

aer_print_port_info() prints the Requester ID from the Root Port Error
Source in the usual Linux "bb:dd.f" format, but when find_source_device()
finds no error details in the hierarchy below the Root Port, it printed the
raw Requester ID without decoding it.

Decode the Requester ID in the usual Linux format so it matches other
messages.

Sample message changes:

- pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
- pcieport 0000:00:1c.5: AER: can't find device of ID00e5
+ pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
+ pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5

Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/pcie/aer.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 20db80018b5d..2ff6bac9979f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
u8 bus = info->id >> 8;
u8 devfn = info->id & 0xff;

- pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
+ pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
info->multi_error_valid ? "Multiple " : "",
aer_error_severity_string[info->severity],
pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
@@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
pci_walk_bus(parent->subordinate, find_device_iter, e_info);

if (!e_info->error_dev_num) {
- pci_info(parent, "can't find device of ID%04x\n", e_info->id);
+ u8 bus = e_info->id >> 8;
+ u8 devfn = e_info->id & 0xff;
+
+ pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
+ pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
+ PCI_FUNC(devfn));
return false;
}
return true;
--
2.34.1

2023-12-08 16:54:17

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 0/3] PCI/AER: Clean up logging

[+cc Jonathan]

On Wed, Dec 06, 2023 at 04:42:28PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <[email protected]>
>
> Clean up some minor AER logging issues:
>
> - Log as "Correctable errors", not "Corrected errors"
>
> - Decode the Requester ID when we couldn't find detail error info
>
> Bjorn Helgaas (3):
> PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
> PCI/AER: Decode Requester ID when no error info found
> PCI/AER: Use explicit register sizes for struct members
>
> drivers/pci/pcie/aer.c | 19 ++++++++++++-------
> include/linux/aer.h | 8 ++++----
> 2 files changed, 16 insertions(+), 11 deletions(-)

Applied to pci/aer for v6.8. Thanks, Jonathan, for your time in
taking a look.

2023-12-12 15:01:18

by Terry Bowman

[permalink] [raw]
Subject: Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found

LGTM

On 12/6/23 16:42, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <[email protected]>
>
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers. It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
>
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
>
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
>
> Sample message changes:
>
> - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
> + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> drivers/pci/pcie/aer.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 20db80018b5d..2ff6bac9979f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
> u8 bus = info->id >> 8;
> u8 devfn = info->id & 0xff;
>
> - pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> + pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
> info->multi_error_valid ? "Multiple " : "",
> aer_error_severity_string[info->severity],
> pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
> pci_walk_bus(parent->subordinate, find_device_iter, e_info);
>
> if (!e_info->error_dev_num) {
> - pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> + u8 bus = e_info->id >> 8;
> + u8 devfn = e_info->id & 0xff;
> +
> + pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> + pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> + PCI_FUNC(devfn));
> return false;
> }
> return true;

Subject: Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found



On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <[email protected]>
>
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers. It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
>
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
>
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
>
> Sample message changes:
>
> - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
> + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
>
> Signed-off-by: Bjorn Helgaas <[email protected]>

Except for the suggestion given below, it looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

> ---
> drivers/pci/pcie/aer.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 20db80018b5d..2ff6bac9979f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
> u8 bus = info->id >> 8;
> u8 devfn = info->id & 0xff;
>
> - pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> + pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
> info->multi_error_valid ? "Multiple " : "",
> aer_error_severity_string[info->severity],
> pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
> pci_walk_bus(parent->subordinate, find_device_iter, e_info);
>
> if (!e_info->error_dev_num) {
> - pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> + u8 bus = e_info->id >> 8;
> + u8 devfn = e_info->id & 0xff;

You can use PCI_BUS_NUM(e_info->id) for getting bus number. Since you are
extracting this info in more than one place, maybe you can also define a
macro PCI_DEVFN(id) (following PCI_BUS_NUM()).

> +
> + pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> + pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> + PCI_FUNC(devfn));
> return false;
> }
> return true;

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2024-01-02 22:53:39

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found

On Tue, Jan 02, 2024 at 11:22:53AM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas <[email protected]>
> >
> > When a device with AER detects an error, it logs error information in its
> > own AER Error Status registers. It may send an Error Message to the Root
> > Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> > Message was received (Root Error Status) and the Requester ID of the
> > message source (Error Source Identification).
> >
> > aer_print_port_info() prints the Requester ID from the Root Port Error
> > Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> > finds no error details in the hierarchy below the Root Port, it printed the
> > raw Requester ID without decoding it.
> >
> > Decode the Requester ID in the usual Linux format so it matches other
> > messages.
> >
> > Sample message changes:
> >
> > - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> > - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> > + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
> > + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
> >
> > Signed-off-by: Bjorn Helgaas <[email protected]>
>
> Except for the suggestion given below, it looks good to me.
>
> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

Thanks for taking a look!

> > @@ -740,7 +740,7 @@ static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
> > u8 bus = info->id >> 8;
> > u8 devfn = info->id & 0xff;
> >
> > - pci_info(dev, "%s%s error received: %04x:%02x:%02x.%d\n",
> > + pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
> > info->multi_error_valid ? "Multiple " : "",
> > aer_error_severity_string[info->severity],
> > pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
> > @@ -929,7 +929,12 @@ static bool find_source_device(struct pci_dev *parent,
> > pci_walk_bus(parent->subordinate, find_device_iter, e_info);
> >
> > if (!e_info->error_dev_num) {
> > - pci_info(parent, "can't find device of ID%04x\n", e_info->id);
> > + u8 bus = e_info->id >> 8;
> > + u8 devfn = e_info->id & 0xff;
>
> You can use PCI_BUS_NUM(e_info->id) for getting bus number. Since
> you are extracting this info in more than one place, maybe you can
> also define a macro PCI_DEVFN(id) (following PCI_BUS_NUM()).

Thanks, both good ideas.

We already have a PCI_DEVFN() that *combines* slot + func into devfn,
so we'd have to come up with a different name.

I'll add a patch to use PCI_BUS_NUM() in the two places here and in
pme.c.

I think I'll wait with these until after the v6.7 release.

> > + pci_info(parent, "found no error details for %04x:%02x:%02x.%d\n",
> > + pci_domain_nr(parent->bus), bus, PCI_SLOT(devfn),
> > + PCI_FUNC(devfn));
> > return false;
> > }
> > return true;
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer