2009-10-09 05:56:09

by Huang, Ying

Subject: Re: Fwd: [PATCH] PCIe AER: honor ACPI HEST FIRMWARE FIRST mode

Hi, Matt,

Thanks for your work.

Matt Domsch wrote:
> For review and comment.
>
> Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
> to every PCIe root port for which BIOS reports it should, via ACPI
> _OSC.
>
> However, _OSC alone is insufficient for newer BIOSes. Part of ACPI
> 4.0 is the new Platform Environment Control Interface (PECI), which is

I cannot find Platform Environment Control Interface in ACPI 4.0. There
is something about it here:
http://en.wikipedia.org/wiki/Platform_Environment_Control_Interface, but
it seems to have nothing to do with the OS/BIOS interface.

Can you tell me where I can find more about PECI? Or do you mean APEI (ACPI
Platform Error Interfaces)?

We are working on APEI support now too, mainly on the general part.
We will release the code after it passes our internal testing.

> a way for OS and BIOS to handshake over which errors for which
> components each will handle. One table in ACPI 4.0 is the Hardware
> Error Source Table (HEST), where BIOS can define that errors for
> certain PCIe devices (or all devices) should be handled by BIOS
> ("Firmware First mode"), rather than be handled by the OS.
>
> Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
> that it may manage such errors, log them to the System Event Log, and
> possibly take other actions. The AER driver should honor this and
> not attach itself to devices noted as such.
>
>
> Signed-off-by: Matt Domsch <[email protected]>
>
> --
> Matt Domsch
> Technology Strategist, Dell Office of the CTO
> linux.dell.com & http://www.dell.com/linux
>
>
> ---
> drivers/pci/pcie/aer/aerdrv.h | 4 +-
> drivers/pci/pcie/aer/aerdrv_acpi.c | 106 +++++++++++++++++++++++++++++++++++-
> drivers/pci/pcie/aer/aerdrv_core.c | 2 +-
> include/acpi/actbl1.h | 8 ++-
> 4 files changed, 112 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index bbd7428..2e00a22 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -128,9 +128,9 @@ extern void aer_print_error(struct pci_dev *dev,
> struct aer_err_info *info);
> extern irqreturn_t aer_irq(int irq, void *context);
>
> #ifdef CONFIG_ACPI
> -extern int aer_osc_setup(struct pcie_device *pciedev);
> +extern int aer_osc_setup(struct pcie_device *pciedev, int forceload);
> #else
> -static inline int aer_osc_setup(struct pcie_device *pciedev)
> +static inline int aer_osc_setup(struct pcie_device *pciedev, int forceload)
> {
> return 0;
> }
> diff --git a/drivers/pci/pcie/aer/aerdrv_acpi.c b/drivers/pci/pcie/aer/aerdrv_acpi.c
> index 8edb2f3..10bd83c 100644
> --- a/drivers/pci/pcie/aer/aerdrv_acpi.c
> +++ b/drivers/pci/pcie/aer/aerdrv_acpi.c
> @@ -18,20 +18,112 @@
> #include <linux/delay.h>
> #include "aerdrv.h"
>
> +static unsigned long parse_aer_hest_xpf_machine_check(struct acpi_hest_xpf_machine_check *p)
> +{
> + return sizeof(*p) +
> + (sizeof(struct acpi_hest_xpf_error_bank) * p->num_hardware_banks);
> +}
> +
> +static unsigned long parse_aer_hest_xpf_corrected_machine_check(struct acpi_table_hest_xpf_corrected *p)
> +{
> + return sizeof(*p) +
> + (sizeof(struct acpi_hest_xpf_error_bank) * p->num_hardware_banks);
> +}
> +
> +static unsigned long parse_aer_hest_xpf_nmi(struct acpi_hest_xpf_nmi *p)
> +{
> + return sizeof(*p);
> +}
> +
> +static unsigned long parse_hest_generic(struct acpi_hest_generic *p)
> +{
> + return sizeof(*p);
> +}
> +
> +static unsigned long parse_hest_aer(void *hdr, int type, struct pcie_device *pciedev, int *firmware_first)
> +{
> + struct acpi_hest_aer_common *p = hdr + sizeof(struct acpi_hest_header);
> + unsigned long rc=0;
> + switch (type) {
> + case ACPI_HEST_TYPE_AER_ROOT_PORT:
> + rc = sizeof(struct acpi_hest_aer_root);
> + break;
> + case ACPI_HEST_TYPE_AER_ENDPOINT:
> + rc = sizeof(struct acpi_hest_aer);
> + break;
> + case ACPI_HEST_TYPE_AER_BRIDGE:
> + rc = sizeof(struct acpi_hest_aer_bridge);
> + break;
> + }
> +
> + if (p->flags & ACPI_HEST_AER_FIRMWARE_FIRST &&
> + (p->flags & ACPI_HEST_AER_GLOBAL ||
> + (p->bus == pciedev->port->bus->number &&
> + p->device == PCI_SLOT(pciedev->port->devfn) &&
> + p->function == PCI_FUNC(pciedev->port->devfn))))
> + *firmware_first = 1;
> + return rc;
> +}
> +
> +static int aer_hest_firmware_first(struct acpi_table_header *stdheader, struct pcie_device *pciedev)
> +{
> + struct acpi_table_hest *hest = (struct acpi_table_hest *)stdheader;
> + void *p = (void *)hest + sizeof(*hest); /* defined by the ACPI 4.0 spec */
> + struct acpi_hest_header *hdr = p;
> +
> + int i;
> + int firmware_first = 0;
> +
> + for (i=0, hdr=p; p < (((void *)hest) + hest->header.length) && i < hest->error_source_count; i++) {
> + switch (hdr->type) {
> + case ACPI_HEST_TYPE_XPF_MACHINE_CHECK:
> + p += parse_aer_hest_xpf_machine_check(p);
> + break;
> + case ACPI_HEST_TYPE_XPF_CORRECTED_MACHINE_CHECK:
> + p += parse_aer_hest_xpf_corrected_machine_check(p);
> + break;
> + case ACPI_HEST_TYPE_XPF_NON_MASKABLE_INTERRUPT:
> + p += parse_aer_hest_xpf_nmi(p);
> + break;
> + /* These three should never appear */
> + case ACPI_HEST_TYPE_XPF_UNUSED:
> + case ACPI_HEST_TYPE_IPF_CORRECTED_MACHINE_CHECK:
> + case ACPI_HEST_TYPE_IPF_CORRECTED_PLATFORM_ERROR:
> + break;
> + case ACPI_HEST_TYPE_AER_ROOT_PORT:
> + case ACPI_HEST_TYPE_AER_ENDPOINT:
> + case ACPI_HEST_TYPE_AER_BRIDGE:
> + p += parse_hest_aer(p, hdr->type, pciedev, &firmware_first);
> + break;
> + case ACPI_HEST_TYPE_GENERIC_HARDWARE_ERROR_SOURCE:
> + p += parse_hest_generic(p);
> + break;
> + /* These should never appear either */
> + case ACPI_HEST_TYPE_RESERVED:
> + default:
> + break;
> + }
> + }
> + return firmware_first;
> +}
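The Firmware First test inside parse_hest_aer() reduces to one predicate: the entry must have its Firmware First flag set, and it must either carry the Global flag or name the root port's bus/device/function. A standalone sketch of that predicate follows. The flag values (bit 0 for Firmware First, bit 1 for Global) and the struct hest_aer_fields type are assumptions for illustration, since the actbl1.h hunk of the patch is not quoted here; PCI_SLOT()/PCI_FUNC() are redefined with the same bit layout as the kernel macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC() macros. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* ASSUMED flag values; check the real actbl1.h definitions. */
#define HEST_AER_FIRMWARE_FIRST 0x01
#define HEST_AER_GLOBAL         0x02

/* Hypothetical subset of the fields parse_hest_aer() reads. */
struct hest_aer_fields {
	uint8_t flags;
	uint32_t bus;
	uint16_t device;
	uint16_t function;
};

/* True when this AER entry puts the given root port in Firmware First mode. */
static bool aer_entry_is_firmware_first(const struct hest_aer_fields *e,
					unsigned int port_bus,
					unsigned int port_devfn)
{
	if (!(e->flags & HEST_AER_FIRMWARE_FIRST))
		return false;
	if (e->flags & HEST_AER_GLOBAL)
		return true;	/* applies to every device of this type */
	return e->bus == port_bus &&
	       (unsigned int)e->device == PCI_SLOT(port_devfn) &&
	       (unsigned int)e->function == PCI_FUNC(port_devfn);
}
```

For example, an entry with only Firmware First set for device 3, function 0 matches a port at devfn (3 << 3) | 0 on the same bus, but not one at function 1; an entry with Global set but not Firmware First matches nothing.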

As H. Seto said, the HEST table parsing code should go into the general
APEI support code. We have some HEST table parsing code; I hope it can be
used by your code too.
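A minimal sketch of the split suggested above: a generic walker owns the table iteration and bounds checks, and the AER code only supplies a callback. Everything here is hypothetical (the entry type values, the fixed per-type sizes, walk_entries(), count_aer()) and the entry layout is simplified; it is not the APEI code mentioned above. One detail worth noting: the walker re-reads the entry header at every step. In the quoted aer_hest_firmware_first(), hdr is assigned only in the for-initializer, so the switch keeps dispatching on the first entry's type as p advances.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative type values and sizes only; real HEST entries have
 * spec-defined layouts, and some are variable-length. */
enum { ENTRY_NMI = 2, ENTRY_AER_ROOT = 6, ENTRY_AER_ENDPOINT = 7 };

/* Simplified common header at the start of every entry. */
struct entry_hdr {
	uint16_t type;
	uint16_t source_id;
};

/* Entries carry no length field, so size is derived from the type.
 * Returns 0 for unknown types so the walk can stop safely. */
static size_t entry_size(uint16_t type)
{
	switch (type) {
	case ENTRY_NMI:          return 8;
	case ENTRY_AER_ROOT:     return 16;
	case ENTRY_AER_ENDPOINT: return 12;
	default:                 return 0;
	}
}

typedef int (*entry_fn)(const struct entry_hdr *hdr, const uint8_t *entry,
			void *data);

/* Calls fn on each of up to count entries. Stops early if fn returns
 * nonzero (returned as-is), or returns -1 on an unknown type or an
 * entry that would overrun the table. */
static int walk_entries(const uint8_t *table, size_t len, unsigned int count,
			entry_fn fn, void *data)
{
	size_t off = 0;

	for (unsigned int i = 0; i < count; i++) {
		struct entry_hdr hdr;
		size_t size;
		int rc;

		if (len - off < sizeof(hdr))
			return -1;
		/* Re-read the header for THIS entry on every iteration. */
		memcpy(&hdr, table + off, sizeof(hdr));
		size = entry_size(hdr.type);
		if (size == 0 || len - off < size)
			return -1;
		rc = fn(&hdr, table + off, data);
		if (rc)
			return rc;
		off += size;
	}
	return 0;
}

/* Example AER-side callback: counts AER entries. */
static int count_aer(const struct entry_hdr *hdr, const uint8_t *entry,
		     void *data)
{
	unsigned int *n = data;

	(void)entry;	/* a real callback would read flags/BDF from here */
	if (hdr->type == ENTRY_AER_ROOT || hdr->type == ENTRY_AER_ENDPOINT)
		(*n)++;
	return 0;
}
```

A Firmware First callback would follow the same shape, reading the flags and bus/device/function from the entry body and returning nonzero to stop the walk once it finds a match.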

Best Regards,
Huang Ying


2009-10-09 14:32:13

by Matt Domsch

Subject: Re: Fwd: [PATCH] PCIe AER: honor ACPI HEST FIRMWARE FIRST mode

On Fri, Oct 09, 2009 at 01:55:27PM +0800, Huang Ying wrote:
> Can you tell me where I can find more about PECI? Or do you mean APEI (ACPI
> Platform Error Interfaces)?

Yes, APEI. My mistake. Too many acronyms for my little mind.

> We are working on APEI support now too, mainly on the general part.
> We will release the code after it passes our internal testing.

Excellent. If you could post even the untested code, that would be
helpful. Do you have a timeline for publication?

My patch is fundamentally in response to the fact that your APEI code
is not present yet, and there is a problem seen at customer sites now
(particularly SLES 11, as that's the only "Enterprise"-class distro
version with the AER driver, but also any distro or kernel build that
includes the AER driver). But without knowing when the rest of the
APEI code will land in mainline, I feel it would be safe to do the
minimum amount of HEST parsing, just enough to know if AER should be
disabled or not. My patch can be considered "throw-away" code - to be
dropped when your APEI code lands in mainline and the distros.
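The "just enough" decision described above can be written as a small predicate. This is a sketch under assumptions: struct aer_probe_info and aer_should_attach() are hypothetical, and treating forceload (the new parameter to aer_osc_setup() in the patch) as an override that lets the driver attach regardless is an assumption, since the code that consumes it is not quoted in this thread.

```c
#include <stdbool.h>

/* Hypothetical facts gathered while probing one root port. */
struct aer_probe_info {
	bool osc_granted;	/* BIOS granted AER control via _OSC */
	bool firmware_first;	/* a HEST AER entry marks this port Firmware First */
};

/* Returns true if the OS AER driver should attach to the port.
 * ASSUMPTION: forceload is an administrator override. */
static bool aer_should_attach(const struct aer_probe_info *info, bool forceload)
{
	if (forceload)
		return true;
	/* Honor both _OSC and HEST: BIOS must grant control AND must
	 * not have claimed the port for Firmware First handling. */
	return info->osc_granted && !info->firmware_first;
}
```

Keeping the HEST result as a single boolean input is what makes this easy to discard later: once a general APEI HEST parser exists, only the code that fills in firmware_first needs to change.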

> As H. Seto said, the HEST table parsing code should go into the general
> APEI support code. We have some HEST table parsing code; I hope it can be
> used by your code too.

I agree completely. But I need a short-term solution, both for mainline
and for backporting, until your code is available.

I'm reworking my patch based on Seto-san's comments, and will repost
as soon as linux-next stops crashing on me. :-)

Thanks,
Matt
