2023-11-28 20:33:06

by Smita Koralahalli

Subject: Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events

Hi Ira,

I tested this out. Just one correction below to make it work.

On 11/9/2023 2:07 PM, Ira Weiny wrote:
> BIOS can configure memory devices as firmware first. This will send CXL
> events to the firmware instead of the OS. The firmware can then send
> these events to the OS via UEFI.
>
> UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> format for CXL Component Events. The format is mostly the same as the
> CXL Common Event Record Format. The difference is a GUID is used in
> the Section Type to identify the event type.
>
> Add EFI support to detect CXL CPER records and call a notifier chain
> with the record data blobs to be processed by the CXL code.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from RFC v3
> [Smita: ensure cper_cxl_event_rec is packed]
>
> Changes from RFC v2
> [djbw: use common event structures]
> [djbw: remove print in core cper code]
> [djbw: export register call as NS_GPL]
> [iweiny: fix 0day issues]
>
> Changes from RFC v1
> [iweiny: use an enum for known record types and skip converting GUID to UUID]
> [iweiny: commit to the UUID not being part of the event record data]
> [iweiny: use defines for GUID definitions]
> ---

[snip]

> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 6b689e1efc78..733ab2ab8639 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -108,4 +108,53 @@ struct cxl_event_record_raw {
> union cxl_event event;
> } __packed;
>
> +enum cxl_event_type {
> + CXL_CPER_EVENT_GEN_MEDIA,
> + CXL_CPER_EVENT_DRAM,
> + CXL_CPER_EVENT_MEM_MODULE,
> +};
> +
> +#define CPER_CXL_DEVICE_ID_VALID BIT(0)
> +#define CPER_CXL_DEVICE_SN_VALID BIT(1)
> +#define CPER_CXL_COMP_EVENT_LOG_VALID BIT(2)
> +struct cper_cxl_event_rec {
> + struct {
> + u32 length;
> + u64 validation_bits;
> + struct cper_cxl_event_devid {
> + u16 vendor_id;
> + u16 device_id;
> + u8 func_num;
> + u8 device_num;
> + u8 bus_num;
> + u16 segment_num;
> + u16 slot_num; /* bits 2:0 reserved */
> + u8 reserved;
> + } device_id;
> + struct cper_cxl_event_sn {
> + u32 lower_dw;
> + u32 upper_dw;
> + } dev_serial_num;
> + } hdr;
> +
> + union cxl_event event;
> +} __packed;

The __packed attribute on cper_cxl_event_rec alone still fails to lay out
the structure elements correctly. It looks like __packed is needed on
all the structs inside cper_cxl_event_rec (cper_cxl_event_devid and
cper_cxl_event_sn) as well.

It seems easier to use a global pragma instead. With the pragma I could
test and obtain the expected output.
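
To illustrate the layout difference described above, here is a minimal
user-space sketch (not the kernel header). It assumes GCC or Clang on an
ABI where uint16_t has 2-byte alignment. The names rec_outer_packed,
rec_pragma, devid_a/devid_b and sn_a/sn_b are hypothetical, the stdint
types stand in for u8/u16/u32/u64, and a 16-byte array is a placeholder
for union cxl_event. The static assertions show that a packed attribute on
the outer struct leaves the nested struct with internal padding, while
#pragma pack(1) removes it.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

/* Variant A: packed attribute on the outermost struct only. */
struct rec_outer_packed {
	struct {
		uint32_t length;
		uint64_t validation_bits;
		struct devid_a {
			uint16_t vendor_id;
			uint16_t device_id;
			uint8_t func_num;
			uint8_t device_num;
			uint8_t bus_num;
			uint16_t segment_num;
			uint16_t slot_num;
			uint8_t reserved;
		} device_id;
		struct sn_a {
			uint32_t lower_dw;
			uint32_t upper_dw;
		} dev_serial_num;
	} hdr;
	uint8_t event[16];	/* placeholder for union cxl_event */
} __attribute__((packed));

/* Variant B: every struct defined inside the pragma region is packed. */
#pragma pack(push, 1)
struct rec_pragma {
	struct {
		uint32_t length;
		uint64_t validation_bits;
		struct devid_b {
			uint16_t vendor_id;
			uint16_t device_id;
			uint8_t func_num;
			uint8_t device_num;
			uint8_t bus_num;
			uint16_t segment_num;
			uint16_t slot_num;
			uint8_t reserved;
		} device_id;
		struct sn_b {
			uint32_t lower_dw;
			uint32_t upper_dw;
		} dev_serial_num;
	} hdr;
	uint8_t event[16];	/* placeholder for union cxl_event */
};
#pragma pack(pop)

/* Variant A: the nested struct keeps natural alignment, so one padding
 * byte follows bus_num and one trails the struct. */
static_assert(offsetof(struct devid_a, segment_num) == 8, "padding after bus_num");
static_assert(sizeof(struct devid_a) == 14, "tail padding present");

/* Variant B: no padding anywhere in the header. */
static_assert(offsetof(struct devid_b, segment_num) == 7, "no padding");
static_assert(sizeof(struct devid_b) == 12, "no tail padding");
static_assert(offsetof(struct rec_pragma, hdr.validation_bits) == 4, "no gap after length");
static_assert(sizeof(((struct rec_pragma *)0)->hdr) == 32, "4 + 8 + 12 + 8 bytes");
```

Compiling this with "cc -std=c11 -c" is enough to check the assertions.
Adding the packed attribute to each nested struct in variant A should give
the same result as the pragma; the pragma just avoids repeating it.
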

Thanks,
Smita

> +
> +struct cxl_cper_notifier_data {
> + enum cxl_event_type event_type;
> + struct cper_cxl_event_rec *rec;
> +};
> +
> +#ifdef CONFIG_UEFI_CPER
> +int register_cxl_cper_notifier(struct notifier_block *nb);
> +void unregister_cxl_cper_notifier(struct notifier_block *nb);
> +#else
> +static inline int register_cxl_cper_notifier(struct notifier_block *nb)
> +{
> + return 0;
> +}
> +
> +static inline void unregister_cxl_cper_notifier(struct notifier_block *nb) { }
> +#endif
> +
> #endif /* _LINUX_CXL_EVENT_H */
>
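
The commit message above describes calling a notifier chain with the
record data. As a rough user-space sketch of that pattern (not the kernel
implementation), the code below mocks a tiny notifier chain. The
notifier_block struct here is a simplified stand-in for the kernel's, and
mock_register(), mock_unregister(), mock_call_chain() and count_dram() are
hypothetical names. Real kernel notifier chains also handle priorities and
locking, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

enum cxl_event_type {
	CXL_CPER_EVENT_GEN_MEDIA,
	CXL_CPER_EVENT_DRAM,
	CXL_CPER_EVENT_MEM_MODULE,
};

struct cxl_cper_notifier_data {
	enum cxl_event_type event_type;
	const void *rec;	/* stands in for struct cper_cxl_event_rec * */
};

/* Simplified stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
};

static struct notifier_block *chain_head;

/* Add a block at the head of the chain. */
static int mock_register(struct notifier_block *nb)
{
	nb->next = chain_head;
	chain_head = nb;
	return 0;
}

/* Unlink a block from the chain if it is present. */
static void mock_unregister(struct notifier_block *nb)
{
	struct notifier_block **pp;

	for (pp = &chain_head; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Call every registered block; return how many were called. */
static int mock_call_chain(unsigned long action, void *data)
{
	struct notifier_block *nb;
	int calls = 0;

	for (nb = chain_head; nb; nb = nb->next) {
		nb->notifier_call(nb, action, data);
		calls++;
	}
	return calls;
}

/* Example consumer: count DRAM events only. */
static int dram_events_seen;

static int count_dram(struct notifier_block *nb, unsigned long action,
		      void *data)
{
	struct cxl_cper_notifier_data *nd = data;

	(void)nb;
	(void)action;
	if (nd->event_type == CXL_CPER_EVENT_DRAM)
		dram_events_seen++;
	return 0;
}
```

A consumer would fill in a notifier_block with count_dram as its callback,
register it, and then see its counter increase each time the chain is
called with a DRAM event.
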


2023-11-29 14:29:34

by Ira Weiny

Subject: Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events

Smita Koralahalli wrote:
> Hi Ira,
>
> I tested this out. Just one correction below to make it work.
>

[snip]

> > +
> > +#define CPER_CXL_DEVICE_ID_VALID BIT(0)
> > +#define CPER_CXL_DEVICE_SN_VALID BIT(1)
> > +#define CPER_CXL_COMP_EVENT_LOG_VALID BIT(2)
> > +struct cper_cxl_event_rec {
> > + struct {
> > + u32 length;
> > + u64 validation_bits;
> > + struct cper_cxl_event_devid {
> > + u16 vendor_id;
> > + u16 device_id;
> > + u8 func_num;
> > + u8 device_num;
> > + u8 bus_num;
> > + u16 segment_num;
> > + u16 slot_num; /* bits 2:0 reserved */
> > + u8 reserved;
> > + } device_id;
> > + struct cper_cxl_event_sn {
> > + u32 lower_dw;
> > + u32 upper_dw;
> > + } dev_serial_num;
> > + } hdr;
> > +
> > + union cxl_event event;
> > +} __packed;
>
> __packed attribute just for cper_cxl_event_rec still fails to properly
> align structure elements. Looks like, __packed attribute is needed for
> all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside
> cper_cxl_event_rec.
>
> Seems easier to use global pragma instead.. I could test and obtain the
> output as expected using pragma..

I did not know that was acceptable in the kernel but I see you used it in
cper_cxl.h before...

Ok I'll do that and spin again.

Thanks so much for testing this! I was out last week and still don't have
a test environment.

Ira

2023-12-13 17:14:32

by Jonathan Cameron

Subject: Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events

On Wed, 29 Nov 2023 06:28:01 -0800
Ira Weiny <[email protected]> wrote:

> Smita Koralahalli wrote:
> > Hi Ira,
> >
> > I tested this out. Just one correction below to make it work.
> >
>
> [snip]
>
> > > +
> > > +#define CPER_CXL_DEVICE_ID_VALID BIT(0)
> > > +#define CPER_CXL_DEVICE_SN_VALID BIT(1)
> > > +#define CPER_CXL_COMP_EVENT_LOG_VALID BIT(2)
> > > +struct cper_cxl_event_rec {
> > > + struct {
> > > + u32 length;
> > > + u64 validation_bits;
> > > + struct cper_cxl_event_devid {
> > > + u16 vendor_id;
> > > + u16 device_id;
> > > + u8 func_num;
> > > + u8 device_num;
> > > + u8 bus_num;
> > > + u16 segment_num;
> > > + u16 slot_num; /* bits 2:0 reserved */
> > > + u8 reserved;
> > > + } device_id;
> > > + struct cper_cxl_event_sn {
> > > + u32 lower_dw;
> > > + u32 upper_dw;
> > > + } dev_serial_num;
> > > + } hdr;
> > > +
> > > + union cxl_event event;
> > > +} __packed;
> >
> > __packed attribute just for cper_cxl_event_rec still fails to properly
> > align structure elements. Looks like, __packed attribute is needed for
> > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside
> > cper_cxl_event_rec.
> >
> > Seems easier to use global pragma instead.. I could test and obtain the
> > output as expected using pragma..
>
> I did not know that was acceptable in the kernel but I see you used it in
> cper_cxl.h before...
>
> Ok I'll do that and spin again.
>
> Thanks so much for testing this! I was out last week and still don't have
> a test environment.

Easy to hack into QEMU :) Hmm. I have a CCIX patch set from years ago
somewhere that does something similar. It would be easy to repurpose. Looks like
I never published them (just told people to ask if they wanted them :( ).

Anyhow, if useful I can dig them out.

>
> Ira

2023-12-13 22:28:25

by Ira Weiny

Subject: Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events

Jonathan Cameron wrote:
> On Wed, 29 Nov 2023 06:28:01 -0800
> Ira Weiny <[email protected]> wrote:
>

[snip]

> > > __packed attribute just for cper_cxl_event_rec still fails to properly
> > > align structure elements. Looks like, __packed attribute is needed for
> > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside
> > > cper_cxl_event_rec.
> > >
> > > Seems easier to use global pragma instead.. I could test and obtain the
> > > output as expected using pragma..
> >
> > I did not know that was acceptable in the kernel but I see you used it in
> > cper_cxl.h before...
> >
> > Ok I'll do that and spin again.
> >
> > Thanks so much for testing this! I was out last week and still don't have
> > a test environment.
>
> Easy to hack into QEMU :) Hmm. I have a CCIX patch set from years ago
> somewhere that does similar. Would be easy to repurposed. Looks like
> I never published them (just told people to ask if they wanted them :( ).
>
> Anyhow, if useful I can dig them out.

If you have a branch with them on a reasonably recent QEMU, that could
work too.

Ira

2023-12-19 17:21:46

by Jonathan Cameron

Subject: Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events

On Wed, 13 Dec 2023 14:28:03 -0800
Ira Weiny <[email protected]> wrote:

> Jonathan Cameron wrote:
> > On Wed, 29 Nov 2023 06:28:01 -0800
> > Ira Weiny <[email protected]> wrote:
> >
>
> [snip]
>
> > > > __packed attribute just for cper_cxl_event_rec still fails to properly
> > > > align structure elements. Looks like, __packed attribute is needed for
> > > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside
> > > > cper_cxl_event_rec.
> > > >
> > > > Seems easier to use global pragma instead.. I could test and obtain the
> > > > output as expected using pragma..
> > >
> > > I did not know that was acceptable in the kernel but I see you used it in
> > > cper_cxl.h before...
> > >
> > > Ok I'll do that and spin again.
> > >
> > > Thanks so much for testing this! I was out last week and still don't have
> > > a test environment.
> >
> > Easy to hack into QEMU :) Hmm. I have a CCIX patch set from years ago
> > somewhere that does similar. Would be easy to repurposed. Looks like
> > I never published them (just told people to ask if they wanted them :( ).
> >
> > Anyhow, if useful I can dig them out.
>
> If you have a branch with them with a somewhat latest qemu that could work
> too.
They are ancient and based on GHES emulation that got reworked before being
merged. I had a quick go at a forwards port but this is a bigger job than
I expected. May be a little while :(

Jonathan

>
> Ira


2023-12-20 23:48:51

by Ira Weiny

Subject: Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events

Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 14:28:03 -0800
> Ira Weiny <[email protected]> wrote:
>
> > Jonathan Cameron wrote:
> > > On Wed, 29 Nov 2023 06:28:01 -0800
> > > Ira Weiny <[email protected]> wrote:
> > >
> >
> > [snip]
> >
> > > > > __packed attribute just for cper_cxl_event_rec still fails to properly
> > > > > align structure elements. Looks like, __packed attribute is needed for
> > > > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside
> > > > > cper_cxl_event_rec.
> > > > >
> > > > > Seems easier to use global pragma instead.. I could test and obtain the
> > > > > output as expected using pragma..
> > > >
> > > > I did not know that was acceptable in the kernel but I see you used it in
> > > > cper_cxl.h before...
> > > >
> > > > Ok I'll do that and spin again.
> > > >
> > > > Thanks so much for testing this! I was out last week and still don't have
> > > > a test environment.
> > >
> > > Easy to hack into QEMU :) Hmm. I have a CCIX patch set from years ago
> > > somewhere that does similar. Would be easy to repurposed. Looks like
> > > I never published them (just told people to ask if they wanted them :( ).
> > >
> > > Anyhow, if useful I can dig them out.
> >
> > If you have a branch with them with a somewhat latest qemu that could work
> > too.
> They are ancient and based on GHES emulation that got reworked before being
> merged. I had a quick go at a forwards port but this is a bigger job than
> I expected. May be a little while :(
>

Let's not waste the time on it then. Dan and I would like to get this
merged in 6.8 if possible.

Ira

2024-01-03 17:51:08

by Jonathan Cameron

Subject: Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events

On Tue, 19 Dec 2023 17:12:10 +0000
Jonathan Cameron <[email protected]> wrote:

> On Wed, 13 Dec 2023 14:28:03 -0800
> Ira Weiny <[email protected]> wrote:
>
> > Jonathan Cameron wrote:
> > > On Wed, 29 Nov 2023 06:28:01 -0800
> > > Ira Weiny <[email protected]> wrote:
> > >
> >
> > [snip]
> >
> > > > > __packed attribute just for cper_cxl_event_rec still fails to properly
> > > > > align structure elements. Looks like, __packed attribute is needed for
> > > > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside
> > > > > cper_cxl_event_rec.
> > > > >
> > > > > Seems easier to use global pragma instead.. I could test and obtain the
> > > > > output as expected using pragma..
> > > >
> > > > I did not know that was acceptable in the kernel but I see you used it in
> > > > cper_cxl.h before...
> > > >
> > > > Ok I'll do that and spin again.
> > > >
> > > > Thanks so much for testing this! I was out last week and still don't have
> > > > a test environment.
> > >
> > > Easy to hack into QEMU :) Hmm. I have a CCIX patch set from years ago
> > > somewhere that does similar. Would be easy to repurposed. Looks like
> > > I never published them (just told people to ask if they wanted them :( ).
> > >
> > > Anyhow, if useful I can dig them out.
> >
> > If you have a branch with them with a somewhat latest qemu that could work
> > too.
> They are ancient and based on GHES emulation that got reworked before being
> merged. I had a quick go at a forwards port but this is a bigger job than
> I expected. May be a little while :(

Working again (embarrassingly, I had the error source numbers reversed due
to a merge resolution that went wrong, which took me a day to find). I'll flesh
out the injection, but it will basically look like normal error injection
via QMP (JSON records) with a bonus parameter to send them out via
GHESv2 / CPER rather than as an AER internal error. I've not figured out how
to wire HEST up for x86 emulation yet, though, so it's ARM virt only for now
(HEST isn't created for x86 QEMU machines, whereas it is for ARM virt with ras=on).
Obviously that emulation is wrong in all sorts of ways, as I should be dealing
with firmware/OSPM negotiation, setting the messaging up, etc., but meh
- it works for exercising the code :)

On the plus side I get nice trace points using your series and Smita's one.
Quite a bit of data is 0s at the moment as I'm lazy and it's the end of the day
here - I'll fix that up later this week as I can see 'everything' in QEMU
and the register values etc are already handled via the native injection paths.

Jonathan

>
> Jonathan
>
> >
> > Ira
>
>


2024-01-03 20:41:07

by Ira Weiny

Subject: Re: [PATCH RFC v4 5/6] firmware/efi: Process CXL Component Events

Jonathan Cameron wrote:
> On Tue, 19 Dec 2023 17:12:10 +0000
> Jonathan Cameron <[email protected]> wrote:
>
> > On Wed, 13 Dec 2023 14:28:03 -0800
> > Ira Weiny <[email protected]> wrote:
> >
> > > Jonathan Cameron wrote:
> > > > On Wed, 29 Nov 2023 06:28:01 -0800
> > > > Ira Weiny <[email protected]> wrote:
> > > >
> > >
> > > [snip]
> > >
> > > > > > __packed attribute just for cper_cxl_event_rec still fails to properly
> > > > > > align structure elements. Looks like, __packed attribute is needed for
> > > > > > all structs (cper_cxl_event_devid and cper_cxl_event_sn) inside
> > > > > > cper_cxl_event_rec.
> > > > > >
> > > > > > Seems easier to use global pragma instead.. I could test and obtain the
> > > > > > output as expected using pragma..
> > > > >
> > > > > I did not know that was acceptable in the kernel but I see you used it in
> > > > > cper_cxl.h before...
> > > > >
> > > > > Ok I'll do that and spin again.
> > > > >
> > > > > Thanks so much for testing this! I was out last week and still don't have
> > > > > a test environment.
> > > >
> > > > Easy to hack into QEMU :) Hmm. I have a CCIX patch set from years ago
> > > > somewhere that does similar. Would be easy to repurposed. Looks like
> > > > I never published them (just told people to ask if they wanted them :( ).
> > > >
> > > > Anyhow, if useful I can dig them out.
> > >
> > > If you have a branch with them with a somewhat latest qemu that could work
> > > too.
> > They are ancient and based on GHES emulation that got reworked before being
> > merged. I had a quick go at a forwards port but this is a bigger job than
> > I expected. May be a little while :(
>
> Working again (embarrassingly I had the error source numbers reversed due
> to a merge resolution that went wrong which took me a day to find). I'll flesh
> out the injection but it will basically look like normal error injection
> via qmp (json records) with a bonus parameter to stick them out as via
> GHESv2 / CPER rather than AER internal error. I've not figured out how
> to wire HEST up for x86 emulation yet though so it's ARM virt only for now.
> (HEST isn't created for x86 qemu machines whereas it is for arm virt with ras=on)
> Obviously that emulation is wrong in all sorts of ways as I should be dealing
> with firmware/OSPM negotiation and setting the messaging up etc but meh
> - it works for exercising the code :)
>
> On the plus side I get nice trace points using your series and Smita's one.
> Quite a bit of data is 0s at the moment as I'm lazy and it's the end of the day
> here - I'll fix that up later this week as I can see 'everything' in QEMU
> and the register values etc are already handled via the native injection paths.

Thanks for the testing!
Ira