2017-03-28 20:14:41

by Tyler Baicar

Subject: [PATCH] acpi: apei: check for pending errors when probing HED type GHES entries

If a HED type error occurs prior to GHES probing, the kernel will
never report the error. The HED driver will see that no notifiers
are registers, and clear the interrupt.

This becomes a more serious problem with firmware that supports
GHESv2 acknowledgements from the kernel. The firmware will populate
the error and wait for the kernel ack. But since the kernel will
never process the error we get into a state that the firmware will
not send any more errors and the kernel will never see or ack the
original error.

Check for pending errors when probing HED type GHES entries to
avoid the above situation.

This patch is based on Shiju's patch that adds support for GSIV
and GPIO notification types:
https://patchwork.kernel.org/patch/9628817/

Signed-off-by: Tyler Baicar <[email protected]>
---
drivers/acpi/apei/ghes.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fd39929..cf5e938 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1035,6 +1035,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
 			register_acpi_hed_notifier(&ghes_notifier_hed);
 		list_add_rcu(&ghes->list, &ghes_hed);
 		mutex_unlock(&ghes_list_mutex);
+		ghes_proc(ghes);
 		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


2017-03-28 22:00:47

by Al Stone

Subject: Re: [PATCH] acpi: apei: check for pending errors when probing HED type GHES entries

On 03/28/2017 02:14 PM, Tyler Baicar wrote:
> If a HED type error occurs prior to GHES probing, the kernel will
> never report the error. The HED driver will see that no notifiers
> are registers, and clear the interrupt.

..."registers" or "registered"?

> This becomes a more serious problem with firmware that supports
> GHESv2 acknowledgements from the kernel. The firmware will populate
> the error and wait for the kernel ack. But since the kernel will
> never process the error we get into a state that the firmware will
> not send any more errors and the kernel will never see or ack the
> original error.
>
> Check for pending errors when probing HED type GHES entries to
> avoid the above situation.
>
> This patch is based on Shiju's patch that adds support for GSIV
> and GPIO notification types:
> https://patchwork.kernel.org/patch/9628817/
>
> Signed-off-by: Tyler Baicar <[email protected]>
> ---
> drivers/acpi/apei/ghes.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index fd39929..cf5e938 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -1035,6 +1035,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
> register_acpi_hed_notifier(&ghes_notifier_hed);
> list_add_rcu(&ghes->list, &ghes_hed);
> mutex_unlock(&ghes_list_mutex);
> + ghes_proc(ghes);
> break;
> case ACPI_HEST_NOTIFY_NMI:
> ghes_nmi_add(ghes);
>


--
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Red Hat, Inc.
[email protected]
-----------------------------------

2017-03-28 22:02:45

by Tyler Baicar

Subject: Re: [PATCH] acpi: apei: check for pending errors when probing HED type GHES entries

On 3/28/2017 4:00 PM, Al Stone wrote:
> On 03/28/2017 02:14 PM, Tyler Baicar wrote:
>> If a HED type error occurs prior to GHES probing, the kernel will
>> never report the error. The HED driver will see that no notifiers
>> are registers, and clear the interrupt.
> ..."registers" or "registered"?
Oops :) I'll fix that to say registered.

Thanks,
Tyler
>
>> This becomes a more serious problem with firmware that supports
>> GHESv2 acknowledgements from the kernel. The firmware will populate
>> the error and wait for the kernel ack. But since the kernel will
>> never process the error we get into a state that the firmware will
>> not send any more errors and the kernel will never see or ack the
>> original error.
>>
>> Check for pending errors when probing HED type GHES entries to
>> avoid the above situation.
>>
>> This patch is based on Shiju's patch that adds support for GSIV
>> and GPIO notification types:
>> https://patchwork.kernel.org/patch/9628817/
>>
>> Signed-off-by: Tyler Baicar <[email protected]>
>> ---
>> drivers/acpi/apei/ghes.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index fd39929..cf5e938 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -1035,6 +1035,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
>> register_acpi_hed_notifier(&ghes_notifier_hed);
>> list_add_rcu(&ghes->list, &ghes_hed);
>> mutex_unlock(&ghes_list_mutex);
>> + ghes_proc(ghes);
>> break;
>> case ACPI_HEST_NOTIFY_NMI:
>> ghes_nmi_add(ghes);
>>
>

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.