2013-10-22 17:56:46

by Tony Luck

Subject: [GIT PULL] For x86/mce ... enhanced error logs

Ingo,

Ultimate plan is to use these enhanced error logs to feed a
perf/trace event ... but we are still discussing the exact
format of that, and also how it should interact/complement/replace
the existing EDAC trace event. Meanwhile all this precursor work
has been reviewed and agreed on by Mauro, Boris & others ... so
it can be queued for the next merge window while we continue work
on the trace event.

-Tony

The following changes since commit 31d141e3a666269a3b6fcccddb0351caf7454240:

Linux 3.12-rc6 (2013-10-19 12:28:15 -0700)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-eMCA

for you to fetch changes up to 7dcb5248a2593539d40b1bcd8da0158e7c3967b5:

EDAC, GHES: Update ghes error record info (2013-10-21 15:12:02 -0700)

----------------------------------------------------------------
There is an enhanced error logging mechanism for Xeon processors.
Full description is here:
http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
This patch series provides a module (and support code) to
check for an extended error log and print extra details about
the error on the console.

----------------------------------------------------------------
Chen, Gong (9):
ACPI, APEI, CPER: Fix status check during error printing
ACPI, CPER: Update cper info
bitops: Introduce a more generic BITMASK macro
ACPI, x86: Extended error log driver for x86 platform
DMI: Parse memory device (type 17) in SMBIOS
ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
ACPI, APEI, CPER: Enhance memory reporting capability
ACPI, APEI, CPER: Cleanup CPER memory error output format
EDAC, GHES: Update ghes error record info

arch/ia64/kernel/setup.c | 1 +
arch/x86/include/asm/mce.h | 1 +
arch/x86/kernel/cpu/mcheck/mce-apei.c | 3 +-
arch/x86/kernel/setup.c | 1 +
drivers/acpi/Kconfig | 19 ++
drivers/acpi/Makefile | 2 +
drivers/acpi/acpi_extlog.c | 326 ++++++++++++++++++++++++++++++++++
drivers/acpi/apei/apei-internal.h | 12 +-
drivers/acpi/apei/cper.c | 132 +++++++-------
drivers/acpi/apei/ghes.c | 58 +++---
drivers/acpi/bus.c | 3 +-
drivers/edac/amd64_edac.c | 46 ++---
drivers/edac/amd64_edac.h | 8 -
drivers/edac/ghes_edac.c | 16 +-
drivers/edac/sb_edac.c | 2 +-
drivers/firmware/dmi_scan.c | 60 +++++++
drivers/video/sis/init.c | 5 +-
include/acpi/actbl1.h | 14 +-
include/acpi/ghes.h | 2 +-
include/linux/acpi.h | 1 +
include/linux/bitops.h | 8 +
include/linux/cper.h | 13 +-
include/linux/dmi.h | 5 +
include/linux/edac.h | 2 +-
24 files changed, 591 insertions(+), 149 deletions(-)
create mode 100644 drivers/acpi/acpi_extlog.c


2013-10-23 17:42:57

by Tony Luck

Subject: [GIT PULLv2] For x86/mce ... enhanced error logs

Replacement for yesterday's pull request - fixes a build bug when CONFIG_SMP=n
found by Fengguang's zero-day auto-build robot army. If you pulled (and pushed)
that one before finding this in your mailbox - then I can send the one-line
patch to be applied on top of yesterday's version.

-Tony

The following changes since commit 31d141e3a666269a3b6fcccddb0351caf7454240:

Linux 3.12-rc6 (2013-10-19 12:28:15 -0700)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-eMCA-fix

for you to fetch changes up to 0f6d98727f0eea8cdeea4a3eacff266cad7dc764:

UEFI, CPER: Move cper.c to a more proper place (2013-10-23 10:11:08 -0700)

----------------------------------------------------------------
There is an enhanced error logging mechanism for Xeon processors.
Full description is here:
http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
This patch series provides a module (and support code) to
check for an extended error log and print extra details about
the error on the console.

----------------------------------------------------------------
Chen, Gong (10):
ACPI, APEI, CPER: Fix status check during error printing
ACPI, CPER: Update cper info
bitops: Introduce a more generic BITMASK macro
ACPI, x86: Extended error log driver for x86 platform
DMI: Parse memory device (type 17) in SMBIOS
ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
ACPI, APEI, CPER: Enhance memory reporting capability
ACPI, APEI, CPER: Cleanup CPER memory error output format
EDAC, GHES: Update ghes error record info
UEFI, CPER: Move cper.c to a more proper place

arch/ia64/kernel/setup.c | 1 +
arch/x86/include/asm/mce.h | 1 +
arch/x86/kernel/cpu/mcheck/mce-apei.c | 3 +-
arch/x86/kernel/setup.c | 1 +
drivers/acpi/Kconfig | 20 +++
drivers/acpi/Makefile | 2 +
drivers/acpi/acpi_extlog.c | 327 ++++++++++++++++++++++++++++++++++
drivers/acpi/apei/Kconfig | 1 +
drivers/acpi/apei/Makefile | 2 +-
drivers/acpi/apei/apei-internal.h | 12 +-
drivers/acpi/apei/ghes.c | 58 +++---
drivers/acpi/bus.c | 3 +-
drivers/edac/amd64_edac.c | 46 ++---
drivers/edac/amd64_edac.h | 8 -
drivers/edac/ghes_edac.c | 16 +-
drivers/edac/sb_edac.c | 2 +-
drivers/firmware/dmi_scan.c | 60 +++++++
drivers/video/sis/init.c | 5 +-
include/acpi/actbl1.h | 14 +-
include/acpi/ghes.h | 2 +-
include/linux/acpi.h | 1 +
include/linux/bitops.h | 8 +
include/linux/cper.h | 13 +-
include/linux/dmi.h | 5 +
include/linux/edac.h | 2 +-
lib/Kconfig | 4 +
lib/Makefile | 1 +
{drivers/acpi/apei => lib}/cper.c | 132 +++++++-------
28 files changed, 600 insertions(+), 150 deletions(-)
create mode 100644 drivers/acpi/acpi_extlog.c
rename {drivers/acpi/apei => lib}/cper.c (76%)

2013-10-23 18:13:39

by Tony Luck

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs

On Wed, Oct 23, 2013 at 10:42 AM, Luck, Tony <[email protected]> wrote:
> Replacement for yesterday's pull request - fixes a build bug when CONFIG_SMP=n
> found by Fengguang's zero-day auto-build robot army. If you pulled (and pushed)
> that one before finding this in your mailbox - then I can send the one-line
> patch to be applied on top of yesterday's version.

Well - I might as well put the brown paper bag over my head and go sit
in the corner :-(

I accidentally applied an extra patch "UEFI, CPER: Move cper.c to a
more proper place"
that Chen Gong had sent to me internally but has only been mentioned
in concept on the mailing list.

So feel free to pop the 10th patch off and drop it.

-Tony

2013-10-26 10:06:34

by Ingo Molnar

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs


* Luck, Tony <[email protected]> wrote:

> Replacement for yesterday's pull request - fixes a build bug when CONFIG_SMP=n
> found by Fengguang's zero-day auto-build robot army. If you pulled (and pushed)
> that one before finding this in your mailbox - then I can send the one-line
> patch to be applied on top of yesterday's version.
>
> -Tony
>
> The following changes since commit 31d141e3a666269a3b6fcccddb0351caf7454240:
>
> Linux 3.12-rc6 (2013-10-19 12:28:15 -0700)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-eMCA-fix
>
> for you to fetch changes up to 0f6d98727f0eea8cdeea4a3eacff266cad7dc764:
>
> UEFI, CPER: Move cper.c to a more proper place (2013-10-23 10:11:08 -0700)
>
> ----------------------------------------------------------------
> There is an enhanced error logging mechanism for Xeon processors.
> Full description is here:
> http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
> This patch series provides a module (and support code) to
> check for an extended error log and print extra details about
> the error on the console.
>
> ----------------------------------------------------------------
> Chen, Gong (10):
> ACPI, APEI, CPER: Fix status check during error printing
> ACPI, CPER: Update cper info
> bitops: Introduce a more generic BITMASK macro
> ACPI, x86: Extended error log driver for x86 platform
> DMI: Parse memory device (type 17) in SMBIOS
> ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
> ACPI, APEI, CPER: Enhance memory reporting capability
> ACPI, APEI, CPER: Cleanup CPER memory error output format
> EDAC, GHES: Update ghes error record info
> UEFI, CPER: Move cper.c to a more proper place
>
> arch/ia64/kernel/setup.c | 1 +
> arch/x86/include/asm/mce.h | 1 +
> arch/x86/kernel/cpu/mcheck/mce-apei.c | 3 +-
> arch/x86/kernel/setup.c | 1 +
> drivers/acpi/Kconfig | 20 +++
> drivers/acpi/Makefile | 2 +
> drivers/acpi/acpi_extlog.c | 327 ++++++++++++++++++++++++++++++++++
> drivers/acpi/apei/Kconfig | 1 +
> drivers/acpi/apei/Makefile | 2 +-
> drivers/acpi/apei/apei-internal.h | 12 +-
> drivers/acpi/apei/ghes.c | 58 +++---
> drivers/acpi/bus.c | 3 +-
> drivers/edac/amd64_edac.c | 46 ++---
> drivers/edac/amd64_edac.h | 8 -
> drivers/edac/ghes_edac.c | 16 +-
> drivers/edac/sb_edac.c | 2 +-
> drivers/firmware/dmi_scan.c | 60 +++++++
> drivers/video/sis/init.c | 5 +-
> include/acpi/actbl1.h | 14 +-
> include/acpi/ghes.h | 2 +-
> include/linux/acpi.h | 1 +
> include/linux/bitops.h | 8 +
> include/linux/cper.h | 13 +-
> include/linux/dmi.h | 5 +
> include/linux/edac.h | 2 +-
> lib/Kconfig | 4 +
> lib/Makefile | 1 +
> {drivers/acpi/apei => lib}/cper.c | 132 +++++++-------
> 28 files changed, 600 insertions(+), 150 deletions(-)
> create mode 100644 drivers/acpi/acpi_extlog.c
> rename {drivers/acpi/apei => lib}/cper.c (76%)

Pulled, thanks Tony!

Ingo

2013-10-26 10:11:00

by Ingo Molnar

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs


* Tony Luck <[email protected]> wrote:

> On Wed, Oct 23, 2013 at 10:42 AM, Luck, Tony <[email protected]> wrote:
> > Replacement for yesterday's pull request - fixes a build bug when CONFIG_SMP=n
> > found by Fengguang's zero-day auto-build robot army. If you pulled (and pushed)
> > that one before finding this in your mailbox - then I can send the one-line
> > patch to be applied on top of yesterday's version.
>
> Well - I might as well put the brown paper bag over my head and go sit
> in the corner :-(
>
> I accidentally applied an extra patch "UEFI, CPER: Move cper.c to a
> more proper place"
> that Chen Gong had sent to me internally but has only been mentioned
> in concept on the mailing list.

Hm, I'm not sure we should move something named after a hardware
feature into lib/. It's not really generic C library functionality,
is it?

> So feel free to pop the 10th patch off and drop it.

Yeah, I did that.

Thanks,

Ingo

2013-10-26 21:34:59

by Tony Luck

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs


> Hm, I'm not sure we should move something named after a hardware
> feature into lib/. It's not really generic C library functionality,
>
Not a hardware feature. CPER stands for Common Platform Error Record from the UEFI standard. So applicable to three? architectures.

As Chen Gong points out, drivers/acpi isn't the right place ... so if not lib/ ... then where?

-Tony-

2013-10-26 21:36:20

by Christoph Hellwig

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs

On Sat, Oct 26, 2013 at 02:34:52PM -0700, Tony Luck wrote:
> As Chen Gong points out, drivers/acpi isn't the right place ... so if not lib/ ... then where?

A to be created drivers/efi? There should be a lot more shared EFI code
coming sooner or later, shouldn't it?

2013-10-27 07:00:39

by Ingo Molnar

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs


* Tony Luck <[email protected]> wrote:

> > Hm, I'm not sure we should move something named after a hardware
> > feature into lib/. It's not really generic C library functionality,
>
>
> Not a hardware feature. CPER stands for Common Platform Error Record
> from the UEFI standard. [...]

By all means UEFI can be considered platform dependent at the moment:

comet:~/tip> git grep -i uefi arch/arm/
comet:~/tip> git grep -i uefi arch/arm64/
comet:~/tip> git grep -i uefi arch/powerpc/
comet:~/tip> git grep -i uefi arch/mips/
comet:~/tip>

If a committee says that a name of some standard is 'common platform' does
not make it so. lib/ is mostly kept for mathematical, C-library alike
functionality you see in CS textbooks.

> As Chen Gong points out, drivers/acpi isn't the right place ... so if
> not lib/ ... then where?

drivers/uefi/?

Thanks,

Ingo

2013-10-27 11:01:55

by Borislav Petkov

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs

On Sun, Oct 27, 2013 at 08:00:35AM +0100, Ingo Molnar wrote:
>
> * Tony Luck <[email protected]> wrote:
>
> > > Hm, I'm not sure we should move something named after a hardware
> > > feature into lib/. It's not really generic C library functionality,
> >
> >
> > Not a hardware feature. CPER stands for Common Platform Error Record
> > from the UEFI standard. [...]
>
> By all means UEFI can be considered platform dependent at the moment:
>
> comet:~/tip> git grep -i uefi arch/arm/
> comet:~/tip> git grep -i uefi arch/arm64/
> comet:~/tip> git grep -i uefi arch/powerpc/
> comet:~/tip> git grep -i uefi arch/mips/
> comet:~/tip>
>
> If a committee says that a name of some standard is 'common platform' does
> not make it so. lib/ is mostly kept for mathematical, C-library alike
> functionality you see in CS textbooks.
>
> > As Chen Gong points out, drivers/acpi isn't the right place ... so if
> > not lib/ ... then where?
>
> drivers/uefi/?

Hmm, we do have drivers/firmware/, even drivers/firmware/efi/ subdir and
since this thing is part of the UEFI spec, we probably should stick it
there...

Matt, heads up^^^.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-10-27 20:22:20

by Matt Fleming

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs

On Sun, 27 Oct, at 12:01:48PM, Borislav Petkov wrote:
> On Sun, Oct 27, 2013 at 08:00:35AM +0100, Ingo Molnar wrote:
> >
> > * Tony Luck <[email protected]> wrote:
> >
> > > > Hm, I'm not sure we should move something named after a hardware
> > > > feature into lib/. It's not really generic C library functionality,
> > >
> > >
> > > Not a hardware feature. CPER stands for Common Platform Error Record
> > > from the UEFI standard. [...]
> >
> > By all means UEFI can be considered platform dependent at the moment:
> >
> > comet:~/tip> git grep -i uefi arch/arm/
> > comet:~/tip> git grep -i uefi arch/arm64/
> > comet:~/tip> git grep -i uefi arch/powerpc/
> > comet:~/tip> git grep -i uefi arch/mips/
> > comet:~/tip>
> >
> > If a committee says that a name of some standard is 'common platform' does
> > not make it so. lib/ is mostly kept for mathematical, C-library alike
> > functionality you see in CS textbooks.
> >
> > > As Chen Gong points out, drivers/acpi isn't the right place ... so if
> > > not lib/ ... then where?
> >
> > drivers/uefi/?
>
> Hmm, we do have drivers/firmware/, even drivers/firmware/efi/ subdir and
> since this thing is part of the UEFI spec, we probably should stick it
> there...

I've certainly no problem with moving it under drivers/firmware/efi/,
but please don't create a new subdirectory in drivers/ just for this.

--
Matt Fleming, Intel Open Source Technology Center

2013-10-27 20:34:14

by Borislav Petkov

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs

On Sun, Oct 27, 2013 at 08:22:15PM +0000, Matt Fleming wrote:
> I've certainly no problem with moving it under drivers/firmware/efi/,
> but please don't create a new subdirectory in drivers/ just for this.

Yeah, no - we have a subdir for this - drivers/firmware/efi/ so no need
for the drivers/u?efi thing.

My train of thought here is, we want to put all firmware-related crap
into drivers/firmware/ and since CPER is from the UEFI spec, it should
go into drivers/firmware/efi.

I guess we can apply that same logic to the remaining UEFI sh*tstorm
coming our way.

:-)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-10-27 20:46:52

by Matt Fleming

Subject: Re: [GIT PULLv2] For x86/mce ... enhanced error logs

On Sun, 27 Oct, at 09:34:09PM, Borislav Petkov wrote:
> On Sun, Oct 27, 2013 at 08:22:15PM +0000, Matt Fleming wrote:
> > I've certainly no problem with moving it under drivers/firmware/efi/,
> > but please don't create a new subdirectory in drivers/ just for this.
>
> Yeah, no - we have a subdir for this - drivers/firmware/efi/ so no need
> for the drivers/u?efi thing.
>
> My train of thought here is, we want to put all firmware-related crap
> into drivers/firmware/ and since CPER is from the UEFI spec, it should
> go into drivers/firmware/efi.

Makes total sense to me.

> I guess we can apply that same logic to the remaining UEFI sh*tstorm
> coming our way.

;-)

--
Matt Fleming, Intel Open Source Technology Center

2013-10-28 18:53:57

by Tony Luck

Subject: [PATCH] Move cper.c from drivers/acpi/apei to drivers/firmware/efi

cper.c contains code to decode and print "Common Platform Error Records".
Originally added under drivers/acpi/apei because the only user was in that
same directory - but now we have another consumer, and we shouldn't have
to force CONFIG_ACPI_APEI to get access to this code.

Since CPER is defined in the UEFI specification - the logical home for
this code is under drivers/firmware/efi/

Signed-off-by: Tony Luck <[email protected]>

---

Matt: as discussed earlier on the mailing list ... just looking
for your "Acked-by" so this can go on top of the patch series in
the x86/mce branch of the tip tree that already makes a bunch of
changes to cper.c

Based on Chen Gong's original patch that moved cper.c to lib/

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 252f0e818a49..08eadb4a57cb 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -374,7 +374,9 @@ source "drivers/acpi/apei/Kconfig"

config ACPI_EXTLOG
tristate "Extended Error Log support"
- depends on X86_MCE && ACPI_APEI
+ depends on X86_MCE
+ select EFI
+ select UEFI_CPER
default n
help
Certain usages such as Predictive Failure Analysis (PFA) require
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index f0c1ce95a0ec..786294bb682c 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -2,6 +2,8 @@ config ACPI_APEI
bool "ACPI Platform Error Interface (APEI)"
select MISC_FILESYSTEMS
select PSTORE
+ select EFI
+ select UEFI_CPER
depends on X86
help
APEI allows to report errors (for example from the chipset)
diff --git a/drivers/acpi/apei/Makefile b/drivers/acpi/apei/Makefile
index d1d1bc0a4ee1..5d575a955940 100644
--- a/drivers/acpi/apei/Makefile
+++ b/drivers/acpi/apei/Makefile
@@ -3,4 +3,4 @@ obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o
obj-$(CONFIG_ACPI_APEI_EINJ) += einj.o
obj-$(CONFIG_ACPI_APEI_ERST_DEBUG) += erst-dbg.o

-apei-y := apei-base.o hest.o cper.o erst.o
+apei-y := apei-base.o hest.o erst.o
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index b0fc7c79dfbb..8dfdd2a1cf12 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -36,4 +36,7 @@ config EFI_VARS_PSTORE_DEFAULT_DISABLE
backend for pstore by default. This setting can be overridden
using the efivars module's pstore_disable parameter.

+config UEFI_CPER
+ def_bool n
+
endmenu
diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
index 99245ab5a79c..9ba156d3c775 100644
--- a/drivers/firmware/efi/Makefile
+++ b/drivers/firmware/efi/Makefile
@@ -4,3 +4,4 @@
obj-y += efi.o vars.o
obj-$(CONFIG_EFI_VARS) += efivars.o
obj-$(CONFIG_EFI_VARS_PSTORE) += efi-pstore.o
+obj-$(CONFIG_UEFI_CPER) += cper.o
diff --git a/drivers/acpi/apei/cper.c b/drivers/firmware/efi/cper.c
similarity index 100%
rename from drivers/acpi/apei/cper.c
rename to drivers/firmware/efi/cper.c
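
For readers who have not looked at cper.c: the "decode and print" step
mentioned in the changelog can be pictured roughly as below. This is a
minimal userspace sketch, not the kernel code. The severity values follow
the UEFI CPER definition (0 recoverable, 1 fatal, 2 corrected,
3 informational), but the struct, the function names and the output format
are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Severity codes as defined for the CPER record header in the UEFI spec. */
enum cper_sev_sketch {
	SEV_RECOVERABLE   = 0,
	SEV_FATAL         = 1,
	SEV_CORRECTED     = 2,
	SEV_INFORMATIONAL = 3,
};

/*
 * Hypothetical, heavily trimmed record. A real CPER record carries a full
 * header plus one or more typed sections; only two fields are kept here.
 */
struct cper_record_sketch {
	uint32_t severity;
	uint64_t physical_addr;
};

/* Map a severity code to a short name; out-of-range codes give "unknown". */
static const char *sev_name(uint32_t sev)
{
	static const char *const names[] = {
		"recoverable", "fatal", "corrected", "info",
	};

	return sev < sizeof(names) / sizeof(names[0]) ? names[sev] : "unknown";
}

/* Format one record into buf; returns what snprintf() returns. */
static int format_record(const struct cper_record_sketch *rec,
			 char *buf, size_t len)
{
	return snprintf(buf, len,
			"severity: %s, physical_address: 0x%016llx",
			sev_name(rec->severity),
			(unsigned long long)rec->physical_addr);
}
```

A caller would fill in a struct cper_record_sketch, pass it to
format_record() with a buffer, and print the result. Nothing in the sketch
depends on ACPI, which is the point of the move: any consumer that can
obtain a record can use the decoder.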

2013-10-28 20:35:20

by Matt Fleming

Subject: Re: [PATCH] Move cper.c from drivers/acpi/apei to drivers/firmware/efi

On Mon, 28 Oct, at 11:53:53AM, Luck, Tony wrote:
> cper.c contains code to decode and print "Common Platform Error Records".
> Originally added under drivers/acpi/apei because the only user was in that
> same directory - but now we have another consumer, and we shouldn't have
> to force CONFIG_ACPI_APEI to get access to this code.
>
> Since CPER is defined in the UEFI specification - the logical home for
> this code is under drivers/firmware/efi/
>
> Signed-off-by: Tony Luck <[email protected]>
>
> ---
>
> Matt: as discussed earlier on the mailing list ... just looking
> for your "Acked-by" so this can go on top of the patch series in
> the x86/mce branch of the tip tree that already makes a bunch of
> changes to cper.c

You got it.

Acked-by: Matt Fleming <[email protected]>

--
Matt Fleming, Intel Open Source Technology Center

2013-10-29 08:17:07

by Ingo Molnar

Subject: Re: [PATCH] Move cper.c from drivers/acpi/apei to drivers/firmware/efi


* Luck, Tony <[email protected]> wrote:

> cper.c contains code to decode and print "Common Platform Error Records".
> Originally added under drivers/acpi/apei because the only user was in that
> same directory - but now we have another consumer, and we shouldn't have
> to force CONFIG_ACPI_APEI to get access to this code.
>
> Since CPER is defined in the UEFI specification - the logical home for
> this code is under drivers/firmware/efi/
>
> Signed-off-by: Tony Luck <[email protected]>

Looks good to me!

Acked-by: Ingo Molnar <[email protected]>

Ingo