2017-08-10 17:43:23

by Dongjiu Geng

[permalink] [raw]
Subject: [PATCH] acpi: apei: fix the wrongly parse generic error status block

The revision 0x300 generic error data entry is different with the old
version. when ghes_do_proc traverses to get the data entry, it does not
consider this difference. so when error status block has revision 0x300
data entry, it will have issue.

Signed-off-by: Dongjiu Geng <[email protected]>
---
drivers/acpi/apei/apei-internal.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
index 6e9f14c0a71b..6491f1c4a96e 100644
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);

#define apei_estatus_for_each_section(estatus, section) \
for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
- (void *)section - (void *)estatus < estatus->data_length; \
- section = (void *)(section+1) + section->error_data_length)
+ (void *)section - (void *)(estatus + 1) < estatus->data_length; \
+ section = acpi_hest_get_next(section))

static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
{
--
2.14.0


2017-08-10 17:48:04

by Tyler Baicar

[permalink] [raw]
Subject: Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

Hello,

I have already posted a patch fixing this. Please see:

https://lkml.org/lkml/2017/8/3/824

This makes the loop identical to the CPER code which prints the estatus
blocks to the kernel logs.

Thanks,

Tyler


On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
> The revision 0x300 generic error data entry is different with the old
> version. when ghes_do_proc traverses to get the data entry, it does not
> consider this difference. so when error status block has revision 0x300
> data entry, it will have issue.
>
> Signed-off-by: Dongjiu Geng <[email protected]>
> ---
> drivers/acpi/apei/apei-internal.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
> index 6e9f14c0a71b..6491f1c4a96e 100644
> --- a/drivers/acpi/apei/apei-internal.h
> +++ b/drivers/acpi/apei/apei-internal.h
> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>
> #define apei_estatus_for_each_section(estatus, section) \
> for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
> - (void *)section - (void *)estatus < estatus->data_length; \
> - section = (void *)(section+1) + section->error_data_length)
> + (void *)section - (void *)(estatus + 1) < estatus->data_length; \
> + section = acpi_hest_get_next(section))
>
> static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
> {

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

2017-08-10 21:32:24

by Dongjiu Geng

[permalink] [raw]
Subject: Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

Hello,

sorry, I do not see that. Just know I have reviewed your modification, may be my change can be simpleness and reserve the macro of apei_estatus_for_each_section
can be used by other place to avoid duplicated code, such as prints the estatus blocks.

On 2017/8/11 1:48, Baicar, Tyler wrote:
> Hello,
>
> I have already posted a patch fixing this. Please see:
>
> https://lkml.org/lkml/2017/8/3/824
>
> This makes the loop identical to the CPER code which prints the estatus blocks to the kernel logs.
>
> Thanks,
>
> Tyler
>
>
> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>> The revision 0x300 generic error data entry is different with the old
>> version. when ghes_do_proc traverses to get the data entry, it does not
>> consider this difference. so when error status block has revision 0x300
>> data entry, it will have issue.
>>
>> Signed-off-by: Dongjiu Geng <[email protected]>
>> ---
>> drivers/acpi/apei/apei-internal.h | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
>> index 6e9f14c0a71b..6491f1c4a96e 100644
>> --- a/drivers/acpi/apei/apei-internal.h
>> +++ b/drivers/acpi/apei/apei-internal.h
>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>> #define apei_estatus_for_each_section(estatus, section) \
>> for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
>> - (void *)section - (void *)estatus < estatus->data_length; \
>> - section = (void *)(section+1) + section->error_data_length)
>> + (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>> + section = acpi_hest_get_next(section))
>> static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
>> {
>

2017-08-10 21:38:32

by Dongjiu Geng

[permalink] [raw]
Subject: Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

may be directly remove the macro apei_estatus_for_each_section is not better, if other place code also
needs to iterate through the GHES estatus blocks, it will be repeated written again.


On 2017/8/11 5:31, gengdongjiu wrote:
> Hello,
>
> sorry, I do not see that. Just know I have reviewed your modification, may be my change can be simpleness and reserve the macro of apei_estatus_for_each_section
> can be used by other place to avoid duplicated code, such as prints the estatus blocks.
>
> On 2017/8/11 1:48, Baicar, Tyler wrote:
>> Hello,
>>
>> I have already posted a patch fixing this. Please see:
>>
>> https://lkml.org/lkml/2017/8/3/824
>>
>> This makes the loop identical to the CPER code which prints the estatus blocks to the kernel logs.
>>
>> Thanks,
>>
>> Tyler
>>
>>
>> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>>> The revision 0x300 generic error data entry is different with the old
>>> version. when ghes_do_proc traverses to get the data entry, it does not
>>> consider this difference. so when error status block has revision 0x300
>>> data entry, it will have issue.
>>>
>>> Signed-off-by: Dongjiu Geng <[email protected]>
>>> ---
>>> drivers/acpi/apei/apei-internal.h | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
>>> index 6e9f14c0a71b..6491f1c4a96e 100644
>>> --- a/drivers/acpi/apei/apei-internal.h
>>> +++ b/drivers/acpi/apei/apei-internal.h
>>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>>> #define apei_estatus_for_each_section(estatus, section) \
>>> for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
>>> - (void *)section - (void *)estatus < estatus->data_length; \
>>> - section = (void *)(section+1) + section->error_data_length)
>>> + (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>>> + section = acpi_hest_get_next(section))
>>> static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
>>> {
>>

2017-08-11 13:20:20

by Tyler Baicar

[permalink] [raw]
Subject: Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

I removed the apei_estatus_for_each_section because it was only being
used in this one spot even though several other parts of the code do the
same iteration (it is done several times in the CPER code). I made this
iteration match the CPER iterations because the CPER iterations are
verifying that the structures are all valid and lengths are correct. If
those checks are being done this way, it makes the most sense to mimic
that iteration here when calling into EDAC and triggering trace events.

Thanks,

Tyler


On 8/10/2017 3:37 PM, gengdongjiu wrote:
> may be directly remove the macro apei_estatus_for_each_section is not better, if other place code also
> needs to iterate through the GHES estatus blocks, it will be repeated written again.
>
>
> On 2017/8/11 5:31, gengdongjiu wrote:
>> Hello,
>>
>> sorry, I do not see that. Just know I have reviewed your modification, may be my change can be simpleness and reserve the macro of apei_estatus_for_each_section
>> can be used by other place to avoid duplicated code, such as prints the estatus blocks.
>>
>> On 2017/8/11 1:48, Baicar, Tyler wrote:
>>> Hello,
>>>
>>> I have already posted a patch fixing this. Please see:
>>>
>>> https://lkml.org/lkml/2017/8/3/824
>>>
>>> This makes the loop identical to the CPER code which prints the estatus blocks to the kernel logs.
>>>
>>> Thanks,
>>>
>>> Tyler
>>>
>>>
>>> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>>>> The revision 0x300 generic error data entry is different with the old
>>>> version. when ghes_do_proc traverses to get the data entry, it does not
>>>> consider this difference. so when error status block has revision 0x300
>>>> data entry, it will have issue.
>>>>
>>>> Signed-off-by: Dongjiu Geng <[email protected]>
>>>> ---
>>>> drivers/acpi/apei/apei-internal.h | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
>>>> index 6e9f14c0a71b..6491f1c4a96e 100644
>>>> --- a/drivers/acpi/apei/apei-internal.h
>>>> +++ b/drivers/acpi/apei/apei-internal.h
>>>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>>>> #define apei_estatus_for_each_section(estatus, section) \
>>>> for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
>>>> - (void *)section - (void *)estatus < estatus->data_length; \
>>>> - section = (void *)(section+1) + section->error_data_length)
>>>> + (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>>>> + section = acpi_hest_get_next(section))
>>>> static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
>>>> {

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

2017-08-11 23:41:52

by gengdongjiu

[permalink] [raw]
Subject: Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

2017-08-11 21:19 GMT+08:00 Baicar, Tyler <[email protected]>:
> I removed the apei_estatus_for_each_section because it was only being used
> in this one spot even though several other parts of the code do the same
> iteration (it is done several times in the CPER code). I made this iteration
> match the CPER iterations because the CPER iterations are verifying that the
> structures are all valid and lengths are correct. If those checks are being
> done this way, it makes the most sense to mimic that iteration here when
> calling into EDAC and triggering trace events.

I think the macro includes the verification for the structures and
lengths correction, it it does not correct, it will breadk the loop.
I do not see your modifcation does some special validation, it almost
smilar with the macro does.
in all the code there are three functions to do the iteration.
ghes_do_proc/cper_estatus_print/cper_estatus_check
the cper_estatus_check function is especial, because its purpose is
to validate CPER, so it will check every length. but not all
function's purpose is to check, for example cper_estatus_print, as
shown below, so we can use this macro to clear the code. Now we can
see there are two function use it, but in the future, if want to
iterate CPER, we can use the macro if it does not want to do special
thing.

#define apei_estatus_for_each_section(estatus, section) \
for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
(void *)section - (void *)(estatus + 1) < estatus->data_length; \
------> here it will check whether length is valid, if not, it will
break the loop
section = acpi_hest_get_next(section))


Original code:
void cper_estatus_print(const char *pfx,
const struct acpi_hest_generic_status *estatus)
{
struct acpi_hest_generic_data *gdata;
unsigned int data_len;
int sec_no = 0;
char newpfx[64];
__u16 severity;
severity = estatus->error_severity;
if (severity == CPER_SEV_CORRECTED)
printk("%s%s\n", pfx,
"It has been corrected by h/w "
"and requires no further action");
printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
data_len = estatus->data_length;
gdata = (struct acpi_hest_generic_data *)(estatus + 1);
snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
while (data_len >= acpi_hest_get_size(gdata)) {
cper_estatus_print_section(newpfx, gdata, sec_no);
data_len -= acpi_hest_get_record_size(gdata);
gdata = acpi_hest_get_next(gdata);
sec_no++;
}
}

Can change to:
void cper_estatus_print(const char *pfx,
const struct acpi_hest_generic_status *estatus)
{
struct acpi_hest_generic_data *gdata;
int sec_no = 0;
char newpfx[64];
__u16 severity;
severity = estatus->error_severity;
if (severity == CPER_SEV_CORRECTED)
printk("%s%s\n", pfx,
"It has been corrected by h/w "
"and requires no further action");
printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
apei_estatus_for_each_section {
cper_estatus_print_section(newpfx, gdata, sec_no);
sec_no++;
}
}



>
> Thanks,
>
> Tyler
>
>
> On 8/10/2017 3:37 PM, gengdongjiu wrote:
>>
>> may be directly remove the macro apei_estatus_for_each_section is not
>> better, if other place code also
>> needs to iterate through the GHES estatus blocks, it will be repeated
>> written again.
>>
>>
>> On 2017/8/11 5:31, gengdongjiu wrote:
>>>
>>> Hello,
>>>
>>> sorry, I do not see that. Just know I have reviewed your
>>> modification, may be my change can be simpleness and reserve the macro of
>>> apei_estatus_for_each_section
>>> can be used by other place to avoid duplicated code, such as prints the
>>> estatus blocks.
>>>
>>> On 2017/8/11 1:48, Baicar, Tyler wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have already posted a patch fixing this. Please see:
>>>>
>>>> https://lkml.org/lkml/2017/8/3/824
>>>>
>>>> This makes the loop identical to the CPER code which prints the estatus
>>>> blocks to the kernel logs.
>>>>
>>>> Thanks,
>>>>
>>>> Tyler
>>>>
>>>>
>>>> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>>>>>
>>>>> The revision 0x300 generic error data entry is different with the old
>>>>> version. when ghes_do_proc traverses to get the data entry, it does not
>>>>> consider this difference. so when error status block has revision 0x300
>>>>> data entry, it will have issue.
>>>>>
>>>>> Signed-off-by: Dongjiu Geng <[email protected]>
>>>>> ---
>>>>> drivers/acpi/apei/apei-internal.h | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/acpi/apei/apei-internal.h
>>>>> b/drivers/acpi/apei/apei-internal.h
>>>>> index 6e9f14c0a71b..6491f1c4a96e 100644
>>>>> --- a/drivers/acpi/apei/apei-internal.h
>>>>> +++ b/drivers/acpi/apei/apei-internal.h
>>>>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>>>>> #define apei_estatus_for_each_section(estatus, section)
>>>>> \
>>>>> for (section = (struct acpi_hest_generic_data *)(estatus + 1);
>>>>> \
>>>>> - (void *)section - (void *)estatus < estatus->data_length;
>>>>> \
>>>>> - section = (void *)(section+1) + section->error_data_length)
>>>>> + (void *)section - (void *)(estatus + 1) <
>>>>> estatus->data_length; \
>>>>> + section = acpi_hest_get_next(section))
>>>>> static inline u32 cper_estatus_len(struct acpi_hest_generic_status
>>>>> *estatus)
>>>>> {
>
>
> --
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
>

2017-08-14 14:05:02

by Tyler Baicar

[permalink] [raw]
Subject: Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

On 8/11/2017 5:41 PM, gengdongjiu wrote:
> 2017-08-11 21:19 GMT+08:00 Baicar, Tyler <[email protected]>:
>> I removed the apei_estatus_for_each_section because it was only being used
>> in this one spot even though several other parts of the code do the same
>> iteration (it is done several times in the CPER code). I made this iteration
>> match the CPER iterations because the CPER iterations are verifying that the
>> structures are all valid and lengths are correct. If those checks are being
>> done this way, it makes the most sense to mimic that iteration here when
>> calling into EDAC and triggering trace events.
> I think the macro includes the verification for the structures and
> lengths correction, it it does not correct, it will breadk the loop.
> I do not see your modifcation does some special validation, it almost
> smilar with the macro does.
> in all the code there are three functions to do the iteration.
> ghes_do_proc/cper_estatus_print/cper_estatus_check
> the cper_estatus_check function is especial, because its purpose is
> to validate CPER, so it will check every length. but not all
> function's purpose is to check, for example cper_estatus_print, as
> shown below, so we can use this macro to clear the code. Now we can
> see there are two function use it, but in the future, if want to
> iterate CPER, we can use the macro if it does not want to do special
> thing.
>
> #define apei_estatus_for_each_section(estatus, section) \
> for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
> (void *)section - (void *)(estatus + 1) < estatus->data_length; \
> ------> here it will check whether length is valid, if not, it will
> break the loop
> section = acpi_hest_get_next(section))
>
>
> Original code:
> void cper_estatus_print(const char *pfx,
> const struct acpi_hest_generic_status *estatus)
> {
> struct acpi_hest_generic_data *gdata;
> unsigned int data_len;
> int sec_no = 0;
> char newpfx[64];
> __u16 severity;
> severity = estatus->error_severity;
> if (severity == CPER_SEV_CORRECTED)
> printk("%s%s\n", pfx,
> "It has been corrected by h/w "
> "and requires no further action");
> printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
> data_len = estatus->data_length;
> gdata = (struct acpi_hest_generic_data *)(estatus + 1);
> snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
> while (data_len >= acpi_hest_get_size(gdata)) {
> cper_estatus_print_section(newpfx, gdata, sec_no);
> data_len -= acpi_hest_get_record_size(gdata);
> gdata = acpi_hest_get_next(gdata);
> sec_no++;
> }
> }
>
> Can change to:
> void cper_estatus_print(const char *pfx,
> const struct acpi_hest_generic_status *estatus)
> {
> struct acpi_hest_generic_data *gdata;
> int sec_no = 0;
> char newpfx[64];
> __u16 severity;
> severity = estatus->error_severity;
> if (severity == CPER_SEV_CORRECTED)
> printk("%s%s\n", pfx,
> "It has been corrected by h/w "
> "and requires no further action");
> printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
> snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
> apei_estatus_for_each_section {
> cper_estatus_print_section(newpfx, gdata, sec_no);
> sec_no++;
> }
> }
This change works too, I think it just makes sense to have the
iterations in the CPER and GHES code match. Do you want to add a patch
to your patch here to change the CPER code as well? If so, I'll wait for
that and test it out.

Thanks,
Tyler

2017-08-15 01:15:01

by Dongjiu Geng

[permalink] [raw]
Subject: Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

Hi,Tyler,
Yes, I will add a patch based on it, thanks a lot that you will also have a test.


On 2017/8/14 22:04, Baicar, Tyler wrote:
> This change works too, I think it just makes sense to have the iterations in the CPER and GHES code match. Do you want to add a patch to your patch here to change the CPER code as well? If so, I'll wait for that and test it out.
>
> Thanks,
> Tyler