2020-11-18 08:45:59

by Zhen Lei

Subject: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()

The badrange to be reported should always cover mce->addr.

Signed-off-by: Zhen Lei <[email protected]>
---
drivers/acpi/nfit/mce.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index ee8d9973f60b..053e719c7bea 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -63,7 +63,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,

 		/* If this fails due to an -ENOMEM, there is little we can do */
 		nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
-				ALIGN(mce->addr, L1_CACHE_BYTES),
+				ALIGN_DOWN(mce->addr, L1_CACHE_BYTES),
 				L1_CACHE_BYTES);
 		nvdimm_region_notify(nfit_spa->nd_region,
 				NVDIMM_REVALIDATE_POISON);
--
2.26.0.106.g9fadedd
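
To illustrate why the one-line change matters, here is a minimal userspace sketch. It assumes a 64-byte cache line (the kernel's L1_CACHE_BYTES is architecture-specific) and uses hypothetical stand-ins (align_up, align_down, range_covers, badrange_start_old, badrange_start_new) for the kernel macros and for the range computation. None of these helper names come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache-line size for this sketch; L1_CACHE_BYTES varies by arch. */
#define CACHE_LINE 64ULL

/* Userspace stand-ins for ALIGN()/ALIGN_DOWN(); 'a' must be a power of two. */
static inline uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static inline uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/* True if the half-open range [start, start + len) contains addr. */
static inline bool range_covers(uint64_t start, uint64_t len, uint64_t addr)
{
	return addr >= start && addr - start < len;
}

/* Range start as computed before the patch: rounds up. */
static inline uint64_t badrange_start_old(uint64_t addr)
{
	return align_up(addr, CACHE_LINE);
}

/* Range start as computed after the patch: rounds down. */
static inline uint64_t badrange_start_new(uint64_t addr)
{
	return align_down(addr, CACHE_LINE);
}
```

With addr = 0x1008, the old expression starts the range at 0x1040, so the 64-byte range [0x1040, 0x1080) misses the faulting address. The new expression starts it at 0x1000, so [0x1000, 0x1040) contains it. The two agree only when addr is already cache-line aligned.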



2020-11-18 08:58:29

by Zhen Lei

Subject: Re: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()



On 2020/11/18 16:41, Zhen Lei wrote:
> The badrange to be reported should always cover mce->addr.
Maybe I should change this description to:
Make sure the badrange to be reported can always cover mce->addr.

>
> Signed-off-by: Zhen Lei <[email protected]>
> ---
> drivers/acpi/nfit/mce.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index ee8d9973f60b..053e719c7bea 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -63,7 +63,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
>
>  		/* If this fails due to an -ENOMEM, there is little we can do */
>  		nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
> -				ALIGN(mce->addr, L1_CACHE_BYTES),
> +				ALIGN_DOWN(mce->addr, L1_CACHE_BYTES),
>  				L1_CACHE_BYTES);
>  		nvdimm_region_notify(nfit_spa->nd_region,
>  				NVDIMM_REVALIDATE_POISON);
>

2020-11-18 19:21:05

by Dan Williams

Subject: Re: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()

On Wed, Nov 18, 2020 at 12:55 AM Leizhen (ThunderTown)
<[email protected]> wrote:
>
>
>
> On 2020/11/18 16:41, Zhen Lei wrote:
> > The badrange to be reported should always cover mce->addr.
> Maybe I should change this description to:
> Make sure the badrange to be reported can always cover mce->addr.

Yes, I like that better. Can you also say a bit more about how you
found this bug? As far as I can see this looks like -stable material.

2020-11-18 20:53:38

by Verma, Vishal L

Subject: Re: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()

On Wed, 2020-11-18 at 16:41 +0800, Zhen Lei wrote:
> The badrange to be reported should always cover mce->addr.
>
> Signed-off-by: Zhen Lei <[email protected]>
> ---
> drivers/acpi/nfit/mce.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

Ah good find, agreed with Dan that this is stable material.
Minor nit, I'd recommend rewording the subject line to something like:

"acpi/nfit: fix badrange insertion in nfit_handle_mce()"

Otherwise, looks good to me.
Reviewed-by: Vishal Verma <[email protected]>

>
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index ee8d9973f60b..053e719c7bea 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -63,7 +63,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
>
>  		/* If this fails due to an -ENOMEM, there is little we can do */
>  		nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
> -				ALIGN(mce->addr, L1_CACHE_BYTES),
> +				ALIGN_DOWN(mce->addr, L1_CACHE_BYTES),
>  				L1_CACHE_BYTES);
>  		nvdimm_region_notify(nfit_spa->nd_region,
>  				NVDIMM_REVALIDATE_POISON);

2020-11-19 01:58:11

by Zhen Lei

Subject: Re: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()



On 2020/11/19 3:16, Dan Williams wrote:
> On Wed, Nov 18, 2020 at 12:55 AM Leizhen (ThunderTown)
> <[email protected]> wrote:
>>
>>
>>
>> On 2020/11/18 16:41, Zhen Lei wrote:
>>> The badrange to be reported should always cover mce->addr.
>> Maybe I should change this description to:
>> Make sure the badrange to be reported can always cover mce->addr.
>
> Yes, I like that better. Can you also say a bit more about how you
> found this bug? As far as I can see this looks like -stable material.

I found it while studying the code. I was reading through it carefully.

>
>

2020-11-19 01:58:31

by Zhen Lei

Subject: Re: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()



On 2020/11/19 4:50, Verma, Vishal L wrote:
> On Wed, 2020-11-18 at 16:41 +0800, Zhen Lei wrote:
>> The badrange to be reported should always cover mce->addr.
>>
>> Signed-off-by: Zhen Lei <[email protected]>
>> ---
>> drivers/acpi/nfit/mce.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Ah good find, agreed with Dan that this is stable material.
> Minor nit, I'd recommend rewording the subject line to something like:
>
> "acpi/nfit: fix badrange insertion in nfit_handle_mce()"

OK, I will send v2.

>
> Otherwise, looks good to me.
> Reviewed-by: Vishal Verma <[email protected]>
>
>>
>> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
>> index ee8d9973f60b..053e719c7bea 100644
>> --- a/drivers/acpi/nfit/mce.c
>> +++ b/drivers/acpi/nfit/mce.c
>> @@ -63,7 +63,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
>>
>>  		/* If this fails due to an -ENOMEM, there is little we can do */
>>  		nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
>> -				ALIGN(mce->addr, L1_CACHE_BYTES),
>> +				ALIGN_DOWN(mce->addr, L1_CACHE_BYTES),
>>  				L1_CACHE_BYTES);
>>  		nvdimm_region_notify(nfit_spa->nd_region,
>>  				NVDIMM_REVALIDATE_POISON);
>

2020-11-19 02:10:41

by Dan Williams

Subject: Re: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()

On Wed, Nov 18, 2020 at 5:53 PM Leizhen (ThunderTown)
<[email protected]> wrote:
>
>
>
> On 2020/11/19 3:16, Dan Williams wrote:
> > On Wed, Nov 18, 2020 at 12:55 AM Leizhen (ThunderTown)
> > <[email protected]> wrote:
> >>
> >>
> >>
> >> On 2020/11/18 16:41, Zhen Lei wrote:
> >>> The badrange to be reported should always cover mce->addr.
> >> Maybe I should change this description to:
> >> Make sure the badrange to be reported can always cover mce->addr.
> >
> > Yes, I like that better. Can you also say a bit more about how you
> > found this bug? As far as I can see this looks like -stable material.
>
> I found it when I was learning the code. I'm looking at it carefully.

Ok, good eye.

The impact of this one is somewhat mitigated by the fact that errors
are expanded to a 512-byte blast radius, and error consumption maps 4k
around errors. I suspect few are trying to use the badblock list to do
fine-grained recovery, so this bug went unnoticed.
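
The mitigation described above can be sketched with a little arithmetic. This is only a model, under the assumption that a reported range is widened to the 512-byte sectors it touches; the constants and helper names (CACHE_LINE_BYTES, SECTOR_BYTES, round_up_pow2, round_down_pow2, old_range_covers_after_widening) are hypothetical and not taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64ULL   /* assumed L1_CACHE_BYTES */
#define SECTOR_BYTES     512ULL  /* assumed badblock granularity */

/* Power-of-two rounding helpers (userspace stand-ins for ALIGN/ALIGN_DOWN). */
static inline uint64_t round_up_pow2(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static inline uint64_t round_down_pow2(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/*
 * Hypothetical model: take the pre-patch 64-byte range (start rounded up),
 * widen it to the enclosing 512-byte sectors, and report whether the
 * faulting address ends up inside the widened range.
 */
static inline bool old_range_covers_after_widening(uint64_t addr)
{
	uint64_t start = round_up_pow2(addr, CACHE_LINE_BYTES);
	uint64_t first = round_down_pow2(start, SECTOR_BYTES);
	uint64_t end = round_up_pow2(start + CACHE_LINE_BYTES, SECTOR_BYTES);

	return addr >= first && addr < end;
}
```

Under this model, an unaligned address such as 0x1008 is still covered, because the rounded-up start 0x1040 lies in the same 512-byte sector. An unaligned address in the last cache line of a sector, such as 0x11c8, is not: its start rounds up to 0x1200, which belongs to the next sector. That is consistent with the bug being only partly masked.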