2011-05-25 06:05:45

by Huang, Ying

[permalink] [raw]
Subject: [PATCH] ACPI, APEI, Add APEI _OSC support

In APEI firmware first mode, hardware error is reported by hardware to
firmware firstly, then firmware reports the error to Linux in a GHES
error record via POLL/SCI/IRQ/NMI etc.

This may result in some issues if OS has no full APEI support. So
some firmware implementation will work in a back-compatible mode by
default. Where firmware will only notify OS in old-fashion, without
GHES record. For example, for a fatal hardware error, only NMI is
signaled, no GHES record.

To gain full APEI power on these machines, a special APEI _OSC needs
to be evaluated to tell firmware that Linux has full APEI support.
This patch add the APEI _OSC support.

Signed-off-by: Huang Ying <[email protected]>
---
drivers/acpi/apei/apei-base.c | 42 ++++++++++++++++++++++++++++++++++++++
drivers/acpi/apei/apei-internal.h | 2 +
drivers/acpi/apei/ghes.c | 8 +++++++
3 files changed, 52 insertions(+)

--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -603,3 +603,45 @@ struct dentry *apei_get_debugfs_dir(void
return dapei;
}
EXPORT_SYMBOL_GPL(apei_get_debugfs_dir);
+
+enum {
+ APEI_OSC_SETUP_UNKNOWN,
+ APEI_OSC_SETUP_FAILED,
+ APEI_OSC_SETUP_SUCCEEDED,
+};
+
+int apei_osc_setup(void)
+{
+ /* Prevent _OSC to be evaluated simultaneously */
+ static DEFINE_MUTEX(mutex);
+ static int status = APEI_OSC_SETUP_UNKNOWN;
+ static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c";
+ acpi_handle handle;
+ u32 capbuf[3];
+ struct acpi_osc_context context = {
+ .uuid_str = apei_uuid_str,
+ .rev = 1,
+ .cap.length = sizeof(capbuf),
+ .cap.pointer = capbuf,
+ };
+
+ mutex_lock(&mutex);
+ if (status == APEI_OSC_SETUP_UNKNOWN) {
+ capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE;
+ capbuf[OSC_SUPPORT_TYPE] = 0;
+ capbuf[OSC_CONTROL_TYPE] = 0;
+
+ if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle))
+ || ACPI_FAILURE(acpi_run_osc(handle, &context))) {
+ pr_err(APEI_PFX "APEI _OSC failed!\n");
+ status = APEI_OSC_SETUP_FAILED;
+ } else {
+ kfree(context.ret.pointer);
+ status = APEI_OSC_SETUP_SUCCEEDED;
+ }
+ }
+ mutex_unlock(&mutex);
+
+ return status == APEI_OSC_SETUP_SUCCEEDED ? 0 : -EIO;
+}
+EXPORT_SYMBOL_GPL(apei_osc_setup);
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -113,4 +113,6 @@ void apei_estatus_print(const char *pfx,
const struct acpi_hest_generic_status *estatus);
int apei_estatus_check_header(const struct acpi_hest_generic_status *estatus);
int apei_estatus_check(const struct acpi_hest_generic_status *estatus);
+
+int apei_osc_setup(void);
#endif
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -687,6 +687,8 @@ static unsigned long ghes_esource_preall
return prealloc_size;
}

+static int ghes_remove(struct platform_device *ghes_dev);
+
static int __devinit ghes_probe(struct platform_device *ghes_dev)
{
struct acpi_hest_generic *generic;
@@ -770,6 +772,12 @@ static int __devinit ghes_probe(struct p
}
platform_set_drvdata(ghes_dev, ghes);

+ rc = apei_osc_setup();
+ if (rc) {
+ ghes_remove(ghes_dev);
+ return rc;
+ }
+
return 0;
err:
if (ghes) {


2011-06-13 14:50:21

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote:
> In APEI firmware first mode, hardware error is reported by hardware to
> firmware firstly, then firmware reports the error to Linux in a GHES
> error record via POLL/SCI/IRQ/NMI etc.
>
> This may result in some issues if OS has no full APEI support. So
> some firmware implementation will work in a back-compatible mode by
> default. Where firmware will only notify OS in old-fashion, without
> GHES record. For example, for a fatal hardware error, only NMI is
> signaled, no GHES record.
>
> To gain full APEI power on these machines, a special APEI _OSC needs
> to be evaluated to tell firmware that Linux has full APEI support.
> This patch add the APEI _OSC support.

Using an Intel box I have over at RedHat, I was able to use this patch to
get error injection (EINJ) to provide me a GHES record. Prior to this
patch I would just get unknown NMIs.

Talking with Matthew Garret, I guess it seems that uuids like this are
typical for ACPI.

I can't speak for all the ACPI parts, but the patch looks simple and
corret from my perspective.

Reviewed-and-Tested-by: Don Zickus <[email protected]>

2011-06-14 06:33:23

by Chen, Gong

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

于 6/13/2011 10:50 PM, Don Zickus 写道:
> On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote:
>> In APEI firmware first mode, hardware error is reported by hardware to
>> firmware firstly, then firmware reports the error to Linux in a GHES
>> error record via POLL/SCI/IRQ/NMI etc.
>>
>> This may result in some issues if OS has no full APEI support. So
>> some firmware implementation will work in a back-compatible mode by
>> default. Where firmware will only notify OS in old-fashion, without
>> GHES record. For example, for a fatal hardware error, only NMI is
>> signaled, no GHES record.
>>
>> To gain full APEI power on these machines, a special APEI _OSC needs
>> to be evaluated to tell firmware that Linux has full APEI support.
>> This patch add the APEI _OSC support.
>
> Using an Intel box I have over at RedHat, I was able to use this patch to
> get error injection (EINJ) to provide me a GHES record. Prior to this
> patch I would just get unknown NMIs.
>
> Talking with Matthew Garret, I guess it seems that uuids like this are
> typical for ACPI.
>
> I can't speak for all the ACPI parts, but the patch looks simple and
> corret from my perspective.
>
Interesting, but I can't reproduce it on our machines by now. Would you
please share your test information such as machine type, BIOS
version, and test method (such as what type error you inject) ?

2011-06-14 12:11:14

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On Tue, Jun 14, 2011 at 02:33:19PM +0800, Chen Gong wrote:
> >Using an Intel box I have over at RedHat, I was able to use this patch to
> >get error injection (EINJ) to provide me a GHES record. Prior to this
> >patch I would just get unknown NMIs.
> >
> >Talking with Matthew Garret, I guess it seems that uuids like this are
> >typical for ACPI.
> >
> >I can't speak for all the ACPI parts, but the patch looks simple and
> >corret from my perspective.
> >
> Interesting, but I can't reproduce it on our machines by now. Would you
> please share your test information such as machine type, BIOS
> version, and test method (such as what type error you inject) ?

Sure,

#cat /proc/cpuinfo

processor : 23
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
stepping : 2
cpu MHz : 1596.000

#dmidecode

Handle 0x0002, DMI type 1, 27 bytes
System Information
Manufacturer: Intel Corporation
Product Name: S5520UR
Version: ....................
Serial Number: ............
UUID: 57890342-6DA6-11DF-BD83-001517EE9AE8
Wake-up Type: AC Power Restored
SKU Number: Not Specified
Family: Not Specified

Handle 0x0003, DMI type 2, 16 bytes
Base Board Information
Manufacturer: Intel Corporation
Product Name: S5520UR
Version: E81084-751
Serial Number: BZUB02211743
Asset Tag: ....................
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: Not Specified
Chassis Handle: 0x0004
Type: Motherboard
Contained Object Handles: 0

Handle 0x0005, DMI type 0, 24 bytes
BIOS Information
Vendor: Intel Corp.
Version: S5500.86B.01.00.0055.112620101923
Release Date: 11/26/2010
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 8192 kB
Characteristics:
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
:
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
LS-120 boot is supported
ATAPI Zip drive boot is supported
Function key-initiated network boot is supported
Targeted content distribution is supported
BIOS Revision: 17.18
Firmware Revision: 0.0

#instructions
mount -t debugfs debugfs /sys/kernel/debug
cd /sys/kernel/debug/apei/einj
echo 0x20 > error_type
echo 1 > error_inject
#should fail within one second

Let me know if you need more info.
By the way this is the only machine where this works. Other Intel
machines stopped producing NMIs when their BIOS was updated (before I got
the chance to validate this patch on them).

Cheers,
Don

2011-06-14 14:53:00

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote:

> To gain full APEI power on these machines, a special APEI _OSC needs
> to be evaluated to tell firmware that Linux has full APEI support.
> This patch add the APEI _OSC support.

(snip)

> + static DEFINE_MUTEX(mutex);
> + static int status = APEI_OSC_SETUP_UNKNOWN;
> + static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c";

This is the WHEA UUID, right?

> + u32 capbuf[3];
> + struct acpi_osc_context context = {
> + .uuid_str = apei_uuid_str,
> + .rev = 1,
> + .cap.length = sizeof(capbuf),
> + .cap.pointer = capbuf,
> + };
> +
> + mutex_lock(&mutex);
> + if (status == APEI_OSC_SETUP_UNKNOWN) {
> + capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE;
> + capbuf[OSC_SUPPORT_TYPE] = 0;
> + capbuf[OSC_CONTROL_TYPE] = 0;
> +
> + if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle))
> + || ACPI_FAILURE(acpi_run_osc(handle, &context))) {
> + pr_err(APEI_PFX "APEI _OSC failed!\n");
> + status = APEI_OSC_SETUP_FAILED;
> + } else {
> + kfree(context.ret.pointer);
> + status = APEI_OSC_SETUP_SUCCEEDED;
> + }
> + }
> + mutex_unlock(&mutex);
> +
> + return status == APEI_OSC_SETUP_SUCCEEDED ? 0 : -EIO;

So we fail if the platform doesn't implement WHEA...

> + rc = apei_osc_setup();
> + if (rc) {
> + ghes_remove(ghes_dev);
> + return rc;
> + }
> +

And then tear down GHES. This seems wrong. A platform could predicate
APEI functionality on the ACPI spec APEI indication (which we currently
don't pass) without implementing WHEA, but with this patch we'd refuse
to enable GHES support? We should probably try both the standard method
and the WHEA method and only disable GHES if both fail.

(Also, are there any other sideeffects of indicating that we support
WHEA?)

--
Matthew Garrett | [email protected]

2011-06-15 03:53:48

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

Hi, Matthew,

On 06/14/2011 10:52 PM, Matthew Garrett wrote:
> On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote:
>
>> To gain full APEI power on these machines, a special APEI _OSC needs
>> to be evaluated to tell firmware that Linux has full APEI support.
>> This patch add the APEI _OSC support.
>
> (snip)
>
>> + static DEFINE_MUTEX(mutex);
>> + static int status = APEI_OSC_SETUP_UNKNOWN;
>> + static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c";
>
> This is the WHEA UUID, right?

Yes.

>> + u32 capbuf[3];
>> + struct acpi_osc_context context = {
>> + .uuid_str = apei_uuid_str,
>> + .rev = 1,
>> + .cap.length = sizeof(capbuf),
>> + .cap.pointer = capbuf,
>> + };
>> +
>> + mutex_lock(&mutex);
>> + if (status == APEI_OSC_SETUP_UNKNOWN) {
>> + capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE;
>> + capbuf[OSC_SUPPORT_TYPE] = 0;
>> + capbuf[OSC_CONTROL_TYPE] = 0;
>> +
>> + if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle))
>> + || ACPI_FAILURE(acpi_run_osc(handle, &context))) {
>> + pr_err(APEI_PFX "APEI _OSC failed!\n");
>> + status = APEI_OSC_SETUP_FAILED;
>> + } else {
>> + kfree(context.ret.pointer);
>> + status = APEI_OSC_SETUP_SUCCEEDED;
>> + }
>> + }
>> + mutex_unlock(&mutex);
>> +
>> + return status == APEI_OSC_SETUP_SUCCEEDED ? 0 : -EIO;
>
> So we fail if the platform doesn't implement WHEA...
>
>> + rc = apei_osc_setup();
>> + if (rc) {
>> + ghes_remove(ghes_dev);
>> + return rc;
>> + }
>> +
>
> And then tear down GHES. This seems wrong. A platform could predicate
> APEI functionality on the ACPI spec APEI indication (which we currently
> don't pass) without implementing WHEA, but with this patch we'd refuse
> to enable GHES support? We should probably try both the standard method
> and the WHEA method and only disable GHES if both fail.

You means the "APEI Support" bit for standard UUID? Do you know which
machine uses this bit? I can write the code, but I have no machine to
test it.

BTW, it is better for us to enable APEI firmware first mode (that is,
what is enabled by evaluating the WHEA UUID) after GHES reporting is
ready (that is, after GHES module is successfully loaded). That is
later than current ACPI _OSC evaluation with standard UUID. Is it
possible to evaluate _OSC with standard UUID twice? So that we can
enable APEI firmware first mode later.

> (Also, are there any other sideeffects of indicating that we support
> WHEA?)

After evaluating _OSC with this UUID, firmware will produce error record
to OS, otherwise only unknown NMI.

Best Regards,
Huang Ying

2011-06-15 12:17:19

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On Wed, Jun 15, 2011 at 11:53:32AM +0800, Huang Ying wrote:
> Hi, Matthew,
> On 06/14/2011 10:52 PM, Matthew Garrett wrote:
> > And then tear down GHES. This seems wrong. A platform could predicate
> > APEI functionality on the ACPI spec APEI indication (which we currently
> > don't pass) without implementing WHEA, but with this patch we'd refuse
> > to enable GHES support? We should probably try both the standard method
> > and the WHEA method and only disable GHES if both fail.
>
> You means the "APEI Support" bit for standard UUID? Do you know which
> machine uses this bit? I can write the code, but I have no machine to
> test it.

I have access to a Dell system that uses this.

> BTW, it is better for us to enable APEI firmware first mode (that is,
> what is enabled by evaluating the WHEA UUID) after GHES reporting is
> ready (that is, after GHES module is successfully loaded). That is
> later than current ACPI _OSC evaluation with standard UUID. Is it
> possible to evaluate _OSC with standard UUID twice? So that we can
> enable APEI firmware first mode later.

Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is
made, and then clears a flag if any other _OSC call is made. In that
specific case it doesn't seem to matter (the flag never actually gets
checked in any of the other codepaths), but it seems that the intention
is for the generic call to be made and the WHEA one to be made after
that.

> > (Also, are there any other sideeffects of indicating that we support
> > WHEA?)
>
> After evaluating _OSC with this UUID, firmware will produce error record
> to OS, otherwise only unknown NMI.

Ok, that sounds fine.

--
Matthew Garrett | [email protected]

2011-06-16 00:40:18

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On 06/15/2011 08:17 PM, Matthew Garrett wrote:
> On Wed, Jun 15, 2011 at 11:53:32AM +0800, Huang Ying wrote:
>> Hi, Matthew,
>> On 06/14/2011 10:52 PM, Matthew Garrett wrote:
>>> And then tear down GHES. This seems wrong. A platform could predicate
>>> APEI functionality on the ACPI spec APEI indication (which we currently
>>> don't pass) without implementing WHEA, but with this patch we'd refuse
>>> to enable GHES support? We should probably try both the standard method
>>> and the WHEA method and only disable GHES if both fail.
>>
>> You means the "APEI Support" bit for standard UUID? Do you know which
>> machine uses this bit? I can write the code, but I have no machine to
>> test it.
>
> I have access to a Dell system that uses this.

Great! Can you help us to test the code?

>> BTW, it is better for us to enable APEI firmware first mode (that is,
>> what is enabled by evaluating the WHEA UUID) after GHES reporting is
>> ready (that is, after GHES module is successfully loaded). That is
>> later than current ACPI _OSC evaluation with standard UUID. Is it
>> possible to evaluate _OSC with standard UUID twice? So that we can
>> enable APEI firmware first mode later.
>
> Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is
> made, and then clears a flag if any other _OSC call is made. In that
> specific case it doesn't seem to matter (the flag never actually gets
> checked in any of the other codepaths), but it seems that the intention
> is for the generic call to be made and the WHEA one to be made after
> that.

Yes. The WHEA call should be made after the generic one. Another
situation is as follow:

- Generic _OSC call without "APEI Support" bit is called (in
acpi_bus_osc_support).

- After some time, when we think it is good to turn on firmware first
mode fully, usually after we checking HEST and initializing
corresponding module, we make generic _OSC call with "APEI Support" bit
to turn on firmware first mode fully in standard way.

Is it a good idea to make generic _OSC call twice, one without "APEI
Support" bit, the other with "APEI Support" bit?

Best Regards,
Huang Ying

2011-06-16 01:38:27

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On Thu, Jun 16, 2011 at 08:40:11AM +0800, Huang Ying wrote:
> On 06/15/2011 08:17 PM, Matthew Garrett wrote:
> >> You means the "APEI Support" bit for standard UUID? Do you know which
> >> machine uses this bit? I can write the code, but I have no machine to
> >> test it.
> >
> > I have access to a Dell system that uses this.
>
> Great! Can you help us to test the code?

Yup, no problem.

> > Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is
> > made, and then clears a flag if any other _OSC call is made. In that
> > specific case it doesn't seem to matter (the flag never actually gets
> > checked in any of the other codepaths), but it seems that the intention
> > is for the generic call to be made and the WHEA one to be made after
> > that.
>
> Yes. The WHEA call should be made after the generic one. Another
> situation is as follow:
>
> - Generic _OSC call without "APEI Support" bit is called (in
> acpi_bus_osc_support).
>
> - After some time, when we think it is good to turn on firmware first
> mode fully, usually after we checking HEST and initializing
> corresponding module, we make generic _OSC call with "APEI Support" bit
> to turn on firmware first mode fully in standard way.
>
> Is it a good idea to make generic _OSC call twice, one without "APEI
> Support" bit, the other with "APEI Support" bit?

I think we probably need to make the HEST decision early, and use that
to decide how to make the generic call. Our experience has been that
many firmware vendors only expect _OSC to be called once with any given
UUID - multiple calls can result in unexpected behaviour.

--
Matthew Garrett | [email protected]

2011-06-16 01:56:03

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On 06/16/2011 09:38 AM, Matthew Garrett wrote:
> On Thu, Jun 16, 2011 at 08:40:11AM +0800, Huang Ying wrote:
>> On 06/15/2011 08:17 PM, Matthew Garrett wrote:
>>>> You means the "APEI Support" bit for standard UUID? Do you know which
>>>> machine uses this bit? I can write the code, but I have no machine to
>>>> test it.
>>>
>>> I have access to a Dell system that uses this.
>>
>> Great! Can you help us to test the code?
>
> Yup, no problem.
>
>>> Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is
>>> made, and then clears a flag if any other _OSC call is made. In that
>>> specific case it doesn't seem to matter (the flag never actually gets
>>> checked in any of the other codepaths), but it seems that the intention
>>> is for the generic call to be made and the WHEA one to be made after
>>> that.
>>
>> Yes. The WHEA call should be made after the generic one. Another
>> situation is as follow:
>>
>> - Generic _OSC call without "APEI Support" bit is called (in
>> acpi_bus_osc_support).
>>
>> - After some time, when we think it is good to turn on firmware first
>> mode fully, usually after we checking HEST and initializing
>> corresponding module, we make generic _OSC call with "APEI Support" bit
>> to turn on firmware first mode fully in standard way.
>>
>> Is it a good idea to make generic _OSC call twice, one without "APEI
>> Support" bit, the other with "APEI Support" bit?
>
> I think we probably need to make the HEST decision early, and use that
> to decide how to make the generic call. Our experience has been that
> many firmware vendors only expect _OSC to be called once with any given
> UUID - multiple calls can result in unexpected behaviour.

acpi_bus_osc_support is called via subsys_initcall. It is a little hard
to do all checking before that. Is it possible to call
acpi_bus_osc_support later?

Best Regards,
Huang Ying

2011-06-16 01:57:51

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On Thu, Jun 16, 2011 at 09:55:58AM +0800, Huang Ying wrote:
> On 06/16/2011 09:38 AM, Matthew Garrett wrote:
> > I think we probably need to make the HEST decision early, and use that
> > to decide how to make the generic call. Our experience has been that
> > many firmware vendors only expect _OSC to be called once with any given
> > UUID - multiple calls can result in unexpected behaviour.
>
> acpi_bus_osc_support is called via subsys_initcall. It is a little hard
> to do all checking before that. Is it possible to call
> acpi_bus_osc_support later?

Yeah, this is going to be a problem. We have the HEST available at this
point so we ought to be able to parse it, though. I'll take a look
tomorrow.

--
Matthew Garrett | [email protected]

2011-06-16 09:00:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On Thursday, June 16, 2011, Matthew Garrett wrote:
> On Thu, Jun 16, 2011 at 08:40:11AM +0800, Huang Ying wrote:
> > On 06/15/2011 08:17 PM, Matthew Garrett wrote:
> > >> You means the "APEI Support" bit for standard UUID? Do you know which
> > >> machine uses this bit? I can write the code, but I have no machine to
> > >> test it.
> > >
> > > I have access to a Dell system that uses this.
> >
> > Great! Can you help us to test the code?
>
> Yup, no problem.
>
> > > Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is
> > > made, and then clears a flag if any other _OSC call is made. In that
> > > specific case it doesn't seem to matter (the flag never actually gets
> > > checked in any of the other codepaths), but it seems that the intention
> > > is for the generic call to be made and the WHEA one to be made after
> > > that.
> >
> > Yes. The WHEA call should be made after the generic one. Another
> > situation is as follow:
> >
> > - Generic _OSC call without "APEI Support" bit is called (in
> > acpi_bus_osc_support).
> >
> > - After some time, when we think it is good to turn on firmware first
> > mode fully, usually after we checking HEST and initializing
> > corresponding module, we make generic _OSC call with "APEI Support" bit
> > to turn on firmware first mode fully in standard way.
> >
> > Is it a good idea to make generic _OSC call twice, one without "APEI
> > Support" bit, the other with "APEI Support" bit?
>
> I think we probably need to make the HEST decision early, and use that
> to decide how to make the generic call. Our experience has been that
> many firmware vendors only expect _OSC to be called once with any given
> UUID - multiple calls can result in unexpected behaviour.

Not really, as long as the "query enable" bit is set.

Thanks,
Rafael

2011-06-17 00:57:15

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On 06/16/2011 09:57 AM, Matthew Garrett wrote:
> On Thu, Jun 16, 2011 at 09:55:58AM +0800, Huang Ying wrote:
>> On 06/16/2011 09:38 AM, Matthew Garrett wrote:
>>> I think we probably need to make the HEST decision early, and use that
>>> to decide how to make the generic call. Our experience has been that
>>> many firmware vendors only expect _OSC to be called once with any given
>>> UUID - multiple calls can result in unexpected behaviour.
>>
>> acpi_bus_osc_support is called via subsys_initcall. It is a little hard
>> to do all checking before that. Is it possible to call
>> acpi_bus_osc_support later?
>
> Yeah, this is going to be a problem. We have the HEST available at this
> point so we ought to be able to parse it, though. I'll take a look
> tomorrow.

We can check the HEST table before _OSC evaluating. But it is much
harder to check software part, because we have implemented GHES support
(Generic Hardware Error Source, the handler of firmware first mode
hardware error notification) as device driver and module.

So I think we can do that in 2 steps. At first, we just enable WHEA
UUID, because that is easier to do. Then we find a way to implement
"APEI bit" in generic _OSC call. Do you think that is a good idea?

Best Regards,
Huang Ying

2011-06-17 01:35:02

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On Fri, Jun 17, 2011 at 08:57:09AM +0800, Huang Ying wrote:
> On 06/16/2011 09:57 AM, Matthew Garrett wrote:
> > Yeah, this is going to be a problem. We have the HEST available at this
> > point so we ought to be able to parse it, though. I'll take a look
> > tomorrow.
>
> We can check the HEST table before _OSC evaluating. But it is much
> harder to check software part, because we have implemented GHES support
> (Generic Hardware Error Source, the handler of firmware first mode
> hardware error notification) as device driver and module.

If the kernel has been configured with support for the feature then I
think we ought to be able to assume that the kernel will support it at
runtime.

> So I think we can do that in 2 steps. At first, we just enable WHEA
> UUID, because that is easier to do. Then we find a way to implement
> "APEI bit" in generic _OSC call. Do you think that is a good idea?

I'm fine with that, providing that GHES isn't disabled purely because
the WHEA UUID call wasn't successful.

--
Matthew Garrett | [email protected]

2011-06-17 01:40:25

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On 06/17/2011 09:34 AM, Matthew Garrett wrote:
> On Fri, Jun 17, 2011 at 08:57:09AM +0800, Huang Ying wrote:
>> On 06/16/2011 09:57 AM, Matthew Garrett wrote:
>>> Yeah, this is going to be a problem. We have the HEST available at this
>>> point so we ought to be able to parse it, though. I'll take a look
>>> tomorrow.
>>
>> We can check the HEST table before _OSC evaluating. But it is much
>> harder to check software part, because we have implemented GHES support
>> (Generic Hardware Error Source, the handler of firmware first mode
>> hardware error notification) as device driver and module.
>
> If the kernel has been configured with support for the feature then I
> think we ought to be able to assume that the kernel will support it at
> runtime.

There may be error during driver initialization. That is what I am
concerned.

>> So I think we can do that in 2 steps. At first, we just enable WHEA
>> UUID, because that is easier to do. Then we find a way to implement
>> "APEI bit" in generic _OSC call. Do you think that is a good idea?
>
> I'm fine with that, providing that GHES isn't disabled purely because
> the WHEA UUID call wasn't successful.

Because we have not added the code to make generic _OSC call with "APEI
bit" now, so if WHEA UUID call failed, we have no firmware first mode
enabled. So I think it is safe to disable GHES if WHEA UUID call
failed. But in another hand, keeping GHES has no harm too. So I am OK
to keep GHES if WHEA UUID call failed.

Best Regards,
Huang Ying

2011-06-17 01:42:26

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On Fri, Jun 17, 2011 at 09:40:17AM +0800, Huang Ying wrote:
> On 06/17/2011 09:34 AM, Matthew Garrett wrote:
> > If the kernel has been configured with support for the feature then I
> > think we ought to be able to assume that the kernel will support it at
> > runtime.
>
> There may be error during driver initialization. That is what I am
> concerned.

That's true of any _OSC functionality.

> >> So I think we can do that in 2 steps. At first, we just enable WHEA
> >> UUID, because that is easier to do. Then we find a way to implement
> >> "APEI bit" in generic _OSC call. Do you think that is a good idea?
> >
> > I'm fine with that, providing that GHES isn't disabled purely because
> > the WHEA UUID call wasn't successful.
>
> Because we have not added the code to make generic _OSC call with "APEI
> bit" now, so if WHEA UUID call failed, we have no firmware first mode
> enabled. So I think it is safe to disable GHES if WHEA UUID call
> failed. But in another hand, keeping GHES has no harm too. So I am OK
> to keep GHES if WHEA UUID call failed.

I see your point. But this does need to be fixed in the long run.

--
Matthew Garrett | [email protected]

2011-06-17 01:53:17

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH] ACPI, APEI, Add APEI _OSC support

On 06/17/2011 09:42 AM, Matthew Garrett wrote:
> On Fri, Jun 17, 2011 at 09:40:17AM +0800, Huang Ying wrote:
>> On 06/17/2011 09:34 AM, Matthew Garrett wrote:
>>> If the kernel has been configured with support for the feature then I
>>> think we ought to be able to assume that the kernel will support it at
>>> runtime.
>>
>> There may be error during driver initialization. That is what I am
>> concerned.
>
> That's true of any _OSC functionality.

I see. This should be fixed in the long run. So I will add "APEI bit"
support if GHES is configured.

Best Regards,
Huang Ying