[PATCH 1/4] ACPI, APEI, GHES, Prevent GHES to be built as module
[PATCH 2/4] ACPI, APEI, GHES, Support disable GHES at boot time
[PATCH 3/4] ACPI, APEI, Add APEI bit support in generic _OSC call
[PATCH 4/4] ACPI, APEI, Add APEI _OSC support
GHES (Generic Hardware Error Source) is used to process hardware error
notification in firmware first mode. But because firmware first mode
can be turned on but can not be turned off, it is unreasonable to
unload the GHES module with firmware first mode turned on. To avoid
confusion, this patch makes GHES can be enable/disable in
configuration time, but not built as module and unload at run time.
Signed-off-by: Huang Ying <[email protected]>
---
drivers/acpi/apei/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -10,7 +10,7 @@ config ACPI_APEI
error injection.
config ACPI_APEI_GHES
- tristate "APEI Generic Hardware Error Source"
+ bool "APEI Generic Hardware Error Source"
depends on ACPI_APEI && X86
select ACPI_HED
help
Some machine may have broken firmware so that GHES and firmware first
mode should be disabled. This patch adds support to that.
Signed-off-by: Huang Ying <[email protected]>
---
drivers/acpi/apei/ghes.c | 8 ++++++++
drivers/acpi/apei/hest.c | 17 +++++++++--------
include/acpi/apei.h | 1 +
3 files changed, 18 insertions(+), 8 deletions(-)
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -77,6 +77,9 @@ struct ghes {
};
};
+int ghes_disable;
+module_param_named(disable, ghes_disable, bool, 0);
+
static int ghes_panic_timeout __read_mostly = 30;
/*
@@ -662,6 +665,11 @@ static int __init ghes_init(void)
return -EINVAL;
}
+ if (ghes_disable) {
+ pr_info(GHES_PFX "GHES is not enabled!\n");
+ return -EINVAL;
+ }
+
rc = ghes_ioremap_init();
if (rc)
goto err;
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -230,16 +230,17 @@ void __init acpi_hest_init(void)
goto err;
}
- rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count);
- if (rc)
- goto err;
-
- rc = hest_ghes_dev_register(ghes_count);
- if (!rc) {
- pr_info(HEST_PFX "Table parsing has been initialized.\n");
- return;
+ if (!ghes_disable) {
+ rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count);
+ if (rc)
+ goto err;
+ rc = hest_ghes_dev_register(ghes_count);
+ if (rc)
+ goto err;
}
+ pr_info(HEST_PFX "Table parsing has been initialized.\n");
+ return;
err:
hest_disable = 1;
}
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -18,6 +18,7 @@
extern int hest_disable;
extern int erst_disable;
+extern int ghes_disable;
#ifdef CONFIG_ACPI_APEI
void __init acpi_hest_init(void);
In APEI firmware first mode, hardware error is reported by hardware to
firmware firstly, then firmware reports the error to Linux in a GHES
error record via POLL/SCI/IRQ/NMI etc.
This may result in some issues if OS has no full APEI support. So
some firmware implementation will work in a back-compatible mode by
default. Where firmware will only notify OS in old-fashion, without
GHES record. For example, for a fatal hardware error, only NMI is
signaled, no GHES record.
To gain full APEI power on these machines, APEI bit in generic _OSC
call should be specified to tell firmware that Linux has full APEI
support. This patch adds the APEI bit support in generic _OSC call.
Signed-off-by: Huang Ying <[email protected]>
---
drivers/acpi/bus.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -39,6 +39,7 @@
#include <linux/pci.h>
#include <acpi/acpi_bus.h>
#include <acpi/acpi_drivers.h>
+#include <acpi/apei.h>
#include <linux/dmi.h>
#include <linux/suspend.h>
@@ -541,6 +542,11 @@ static void acpi_bus_osc_support(void)
#if defined(CONFIG_ACPI_PROCESSOR) || defined(CONFIG_ACPI_PROCESSOR_MODULE)
capbuf[OSC_SUPPORT_TYPE] |= OSC_SB_PPC_OST_SUPPORT;
#endif
+
+#ifdef CONFIG_ACPI_APEI_GHES
+ if (!ghes_disable)
+ capbuf[OSC_SUPPORT_TYPE] |= OSC_SB_APEI_SUPPORT;
+#endif
if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
return;
if (ACPI_SUCCESS(acpi_run_osc(handle, &context)))
APEI firmware first mode must be turned on explicitly on some
machines, otherwise they may be no GHES hardware error record for
hardware error notification. APEI bit in generic _OSC call can be
used to do that, but on some machine, a special APEI _OSC call must
be used. This patch adds the support to that APEI _OSC call.
Signed-off-by: Huang Ying <[email protected]>
---
drivers/acpi/apei/apei-base.c | 26 ++++++++++++++++++++++++++
drivers/acpi/apei/apei-internal.h | 2 ++
drivers/acpi/apei/ghes.c | 4 ++++
3 files changed, 32 insertions(+)
--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -604,3 +604,29 @@ struct dentry *apei_get_debugfs_dir(void
return dapei;
}
EXPORT_SYMBOL_GPL(apei_get_debugfs_dir);
+
+int apei_osc_setup(void)
+{
+ static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c";
+ acpi_handle handle;
+ u32 capbuf[3];
+ struct acpi_osc_context context = {
+ .uuid_str = apei_uuid_str,
+ .rev = 1,
+ .cap.length = sizeof(capbuf),
+ .cap.pointer = capbuf,
+ };
+
+ capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE;
+ capbuf[OSC_SUPPORT_TYPE] = 0;
+ capbuf[OSC_CONTROL_TYPE] = 0;
+
+ if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle))
+ || ACPI_FAILURE(acpi_run_osc(handle, &context)))
+ return -EIO;
+ else {
+ kfree(context.ret.pointer);
+ return 0;
+ }
+}
+EXPORT_SYMBOL_GPL(apei_osc_setup);
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -124,4 +124,6 @@ void apei_estatus_print(const char *pfx,
const struct acpi_hest_generic_status *estatus);
int apei_estatus_check_header(const struct acpi_hest_generic_status *estatus);
int apei_estatus_check(const struct acpi_hest_generic_status *estatus);
+
+int apei_osc_setup(void);
#endif
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -678,6 +678,10 @@ static int __init ghes_init(void)
if (rc)
goto err_ioremap_exit;
+ rc = apei_osc_setup();
+ if (rc)
+ pr_info(GHES_PFX "Evaluate APEI _OSC failed!\n");
+
return 0;
err_ioremap_exit:
ghes_ioremap_exit();
On Tue, Jun 21, 2011 at 03:16:24PM +0800, Huang Ying wrote:
> GHES (Generic Hardware Error Source) is used to process hardware error
> notification in firmware first mode. But because firmware first mode
> can be turned on but can not be turned off, it is unreasonable to
> unload the GHES module with firmware first mode turned on. To avoid
> confusion, this patch makes GHES can be enable/disable in
> configuration time, but not built as module and unload at run time.
It's better to keep it as a module, but disable unloading.
You can do that with a __module_get(THIS_MODULE) in the init code
when FFM is detected.
-Andi
Hi, Andi,
Thanks for comments!
On 06/21/2011 03:23 PM, Andi Kleen wrote:
> On Tue, Jun 21, 2011 at 03:16:24PM +0800, Huang Ying wrote:
>> GHES (Generic Hardware Error Source) is used to process hardware error
>> notification in firmware first mode. But because firmware first mode
>> can be turned on but can not be turned off, it is unreasonable to
>> unload the GHES module with firmware first mode turned on. To avoid
>> confusion, this patch makes GHES can be enable/disable in
>> configuration time, but not built as module and unload at run time.
>
> It's better to keep it as a module, but disable unloading.
> You can do that with a __module_get(THIS_MODULE) in the init code
> when FFM is detected.
There are two ways to turn on firmware first mode
1) APEI bit in generic _OSC call
2) Special APEI _OSC call
1) is run before GHES module loading. If keeping GHES as a module, it
is possible that firmware first mode has been turned on with APEI bit in
generic _OSC call, but GHES module is prevented to be loaded via some
kind of module blacklist. So I think it is better to prevent GHES to be
built as module.
Best Regards,
Huang Ying
On Tue, Jun 21, 2011 at 09:23:01AM +0200, Andi Kleen wrote:
> On Tue, Jun 21, 2011 at 03:16:24PM +0800, Huang Ying wrote:
> > GHES (Generic Hardware Error Source) is used to process hardware error
> > notification in firmware first mode. But because firmware first mode
> > can be turned on but can not be turned off, it is unreasonable to
> > unload the GHES module with firmware first mode turned on. To avoid
> > confusion, this patch makes GHES can be enable/disable in
> > configuration time, but not built as module and unload at run time.
>
> It's better to keep it as a module, but disable unloading.
> You can do that with a __module_get(THIS_MODULE) in the init code
> when FFM is detected.
Anything that's enabled by an _OSC call is expected to be available
immediately. So the choices are either to ensure that GHES support is
built in, or to make a second _OSC call when the GHES code is loaded.
We've seen in the PCIe case that many firmware implementations
misinterpret multiple attempts to set _OSC with the same UUID, and a
cursory examination of some implementations of the systemwide one
suggest that we'd see the same issue there.
--
Matthew Garrett | [email protected]
On Tue, Jun 21, 2011 at 03:16:27PM +0800, Huang Ying wrote:
> + rc = apei_osc_setup();
> + if (rc)
> + pr_info(GHES_PFX "Evaluate APEI _OSC failed!\n");
Hm. This is maybe a little strong. It'd be valid for a machine to return
an error here but still have the GHES functionality enabled via the
generic call, but this message would still show up and potentially
confuse the user. Can we keep a flag to check whether the generic method
gave us control, and only give the error if both fail to enable it?
--
Matthew Garrett | [email protected]
Other than my proviso in 4/4, this set looks good.
--
Matthew Garrett | [email protected]
On 06/21/2011 09:22 PM, Matthew Garrett wrote:
> On Tue, Jun 21, 2011 at 03:16:27PM +0800, Huang Ying wrote:
>
>> + rc = apei_osc_setup();
>> + if (rc)
>> + pr_info(GHES_PFX "Evaluate APEI _OSC failed!\n");
>
> Hm. This is maybe a little strong. It'd be valid for a machine to return
> an error here but still have the GHES functionality enabled via the
> generic call, but this message would still show up and potentially
> confuse the user. Can we keep a flag to check whether the generic method
> gave us control, and only give the error if both fail to enable it?
At least on some of my testing machine, generic _OSC call will not
return any error even it does not support APEI bit. So I think
sometimes it may be helpful to printk something here. To avoid
confusion, can we change the message as follow.
- generic _OSC succeeded, APEI _OSC failed: APEI firmware first mode is
enabled by APEI bit.
- generic _OSC failed, APEI _OSC succeeded: APEI firmware first mode is
enabled by APEI _OSC.
- both succeeded: APEI firmware first mode is enabled by APEI bit and
APEI _OSC.
- both failed: Failed to enable APEI firmware first mode!
Best Regards,
Huang Ying
On Wed, Jun 22, 2011 at 10:21:20AM +0800, Huang Ying wrote:
> At least on some of my testing machine, generic _OSC call will not
> return any error even it does not support APEI bit. So I think
> sometimes it may be helpful to printk something here. To avoid
> confusion, can we change the message as follow.
>
> - generic _OSC succeeded, APEI _OSC failed: APEI firmware first mode is
> enabled by APEI bit.
> - generic _OSC failed, APEI _OSC succeeded: APEI firmware first mode is
> enabled by APEI _OSC.
"WHEA _OSC", I think. The UUID here is part of the Microsoft spec, not
the APEI one.
> - both succeeded: APEI firmware first mode is enabled by APEI bit and
> APEI _OSC.
> - both failed: Failed to enable APEI firmware first mode!
Otherwise, looks good.
--
Matthew Garrett | [email protected]