2017-04-26 01:50:06

by Zheng, Lv

Subject: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling

On the Linux kernel side, acpi_get_table() invocations are not yet fully
balanced by acpi_put_table() invocations, so this is not a good time to fail
on the imbalance. The strict balanced validation count check should only be
enabled after confirming that all kernel-side invocations are safe.

Thus this patch removes the fatal error return but keeps the error report to
indicate the leak, so that developers can notice the required engineering
change. Reported by Dan Williams, fixed by Lv Zheng.

Reported-by: Dan Williams <[email protected]>
Signed-off-by: Lv Zheng <[email protected]>
---
drivers/acpi/acpica/tbutils.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index 5a968a7..9e7d95cf 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -422,7 +422,6 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
 			    "Table %p, Validation count is zero after increment\n",
 			    table_desc));
 		table_desc->validation_count--;
-		return_ACPI_STATUS(AE_LIMIT);
 	}
 
 	*out_table = table_desc->pointer;
--
2.7.4
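
For readers following along, the behavior change in the hunk above can be sketched in user-space C. This is only an illustration under stated assumptions: the `demo_*` names, the status enum, and the `strict` flag are hypothetical and are not part of ACPICA, and the 16-bit counter width is assumed. With `strict` set, a wrapped counter makes the call fail (the behavior before the patch); with it clear, the wrap is only reported and the table is still returned (the behavior after the patch).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for ACPICA status codes (not the real ones). */
typedef enum { DEMO_OK = 0, DEMO_LIMIT = 1 } demo_status;

/* Hypothetical, simplified table descriptor; the 16-bit width is an assumption. */
struct demo_table_desc {
    void *pointer;              /* mapped table */
    uint16_t validation_count;  /* outstanding "get" references */
};

/*
 * Take a reference on the table. If the counter wraps to zero, report it
 * and undo the increment. With strict != 0 the caller also gets an error;
 * with strict == 0 the table pointer is still handed back.
 */
demo_status demo_get_table(struct demo_table_desc *desc, void **out, int strict)
{
    desc->validation_count++;
    if (desc->validation_count == 0) {
        fprintf(stderr,
                "Table %p, validation count is zero after increment\n",
                (void *)desc);
        desc->validation_count--;
        if (strict)
            return DEMO_LIMIT;
    }

    *out = desc->pointer;
    return DEMO_OK;
}
```

In the non-strict case the counter simply saturates at its maximum, so the leak stays visible in the log without breaking the caller.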


2017-04-26 05:01:07

by Dan Williams

Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling

On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <[email protected]> wrote:
> In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
> acpi_put_table() invocations. So it is not a good timing to report errors.
> The strict balanced validation count check should only be enabled after
> confirming that all kernel side invocations are safe.

We've been living with this bug for 7 years, let's just go fix all
acpi_get_table() invocations to make sure they have a corresponding
acpi_put_table().

>
> Thus this patch removes the fatal error but leaves the error report to
> indicate the leak so that developers can notice the required engineering
> change. Reported by Dan Williams, fixed by Lv Zheng.
>
> Reported-by: Dan Williams <[email protected]>
> Signed-off-by: Lv Zheng <[email protected]>
> ---
> drivers/acpi/acpica/tbutils.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> index 5a968a7..9e7d95cf 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -422,7 +422,6 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
> "Table %p, Validation count is zero after increment\n",
> table_desc));
> table_desc->validation_count--;
> - return_ACPI_STATUS(AE_LIMIT);

If you want to leave the error report, turn it into a WARN_ON_ONCE() so
it doesn't keep triggering, but I'd rather we just focus on the
missing acpi_put_table() calls.
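
The warn-once idea mentioned above can be sketched in user-space C as follows. This is a rough approximation, not the kernel macro: `demo_warn_once` and `demo_warn_count` are hypothetical names, and the sketch keeps a single flag for all callers, whereas the kernel's WARN_ON_ONCE() keeps one flag per call site.

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually printed (exposed only for illustration). */
unsigned int demo_warn_count;

/*
 * Report msg the first time cond is true and stay silent afterwards.
 * Returns cond so the caller can still branch on it.
 */
bool demo_warn_once(bool cond, const char *msg)
{
    static bool warned; /* one shared flag; the kernel macro has one per call site */

    if (cond && !warned) {
        warned = true;
        demo_warn_count++;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}
```

A caller would wrap the overflow condition, for example `if (demo_warn_once(count == 0, "validation count overflow")) { ... }`, so the log gets a single entry however often the condition recurs.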

2017-04-26 05:15:26

by Zheng, Lv

Subject: RE: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling

Hi,

> From: Dan Williams [mailto:[email protected]]
> Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
>
> On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <[email protected]> wrote:
> > In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
> > acpi_put_table() invocations. So it is not a good timing to report errors.
> > The strict balanced validation count check should only be enabled after
> > confirming that all kernel side invocations are safe.
>
> We've been living with this bug for 7 years, let's just go fix all
> acpi_get_table() invocations to make sure they have a corresponding
> acpi_put_table().

We know that. You should already have seen a series from me, internally
or externally, that achieves this.
It was done several years ago, but it takes a long time to get the
ACPICA part upstreamed.

Now my plan is:
1. introduce the APIs but allow old usage models in order not to
change old ACPICA behavior and its users.
2. fix all users
3. disallow old usage models.
It was my mistake to leak the final-stage approach from my local repo
into ACPICA upstream.
We can try to jump to the final step now, but as far as I know,
not only Linux but ACPICA itself also contains several broken cases.

The bottom line for the Linux kernel is that we shouldn't break any running system.
So IMO, we will need this commit during this special period.

I didn't say the final step is wrong or is not required.
We can do both in parallel.

So could you please help confirm whether it works?
I would also like to suggest that Linux take this first-step fix along
with the other final-step fixes during this period.

Thanks and best regards
Lv

>
> >
> > Thus this patch removes the fatal error but leaves the error report to
> > indicate the leak so that developers can notice the required engineering
> > change. Reported by Dan Williams, fixed by Lv Zheng.
> >
> > Reported-by: Dan Williams <[email protected]>
> > Signed-off-by: Lv Zheng <[email protected]>
> > ---
> > drivers/acpi/acpica/tbutils.c | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> > index 5a968a7..9e7d95cf 100644
> > --- a/drivers/acpi/acpica/tbutils.c
> > +++ b/drivers/acpi/acpica/tbutils.c
> > @@ -422,7 +422,6 @@ acpi_tb_get_table(struct acpi_table_desc *table_desc,
> > "Table %p, Validation count is zero after increment\n",
> > table_desc));
> > table_desc->validation_count--;
> > - return_ACPI_STATUS(AE_LIMIT);
>
> If you want to leave the error report turn it into a WARN_ON_ONCE() so
> it doesn't keep triggering, but I'd rather we just focus on the
> missing acpi_put_table() calls.

2017-04-26 14:13:53

by Dan Williams

Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling

On Tue, Apr 25, 2017 at 10:15 PM, Zheng, Lv <[email protected]> wrote:
> Hi,
>
>> From: Dan Williams [mailto:[email protected]]
>> Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
>>
>> On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <[email protected]> wrote:
>> > In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
>> > acpi_put_table() invocations. So it is not a good timing to report errors.
>> > The strict balanced validation count check should only be enabled after
>> > confirming that all kernel side invocations are safe.
>>
>> We've been living with this bug for 7 years, let's just go fix all
>> acpi_get_table() invocations to make sure they have a corresponding
>> acpi_put_table().
>
> We knew that, you should have already seen a series internally or
> externally from me achieving this.
> It's done several years ago. But it takes long time to make the
> ACPICA part upstreamed.
>
> Now my plan is:
> 1. introduce the APIs but allow old usage models in order not to
> change old ACPICA behavior and its users.
> 2. fix all users
> 3. disallow old usage models.
> It's just my mistake to leak the final stage approach to the ACPICA
> upstream from my local repo.
> Now we can try to jump to the final step, but as far as I know,
> not only Linux, ACPICA itself also contains several broken cases.
>
> Bottom line of Linux kernel is we shouldn't break any running system.
> So IMO, we will need this commit during this special period.
>
> I didn't say the final step is wrong or is not required.
> We can do both in parallel.
>
> So could you please help to confirm if it's working.
> And I would like to suggest linux to take this first step fix along
> with other final step fixes during this period.

I just think "this period" is very short and we can skip the band-aid
and go straight to auditing the 48 call sites of acpi_get_table.
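
As a rough picture of what such an audit looks for, here is a user-space sketch of a balanced call site. All names (`demo_table`, `demo_get`, `demo_put`, `demo_sum_bytes`) are hypothetical and only mimic the get/put pairing; they are not the ACPICA API. The point is that every successful get is matched by exactly one put, and a failed get takes no reference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table with a reference counter. */
struct demo_table {
    uint16_t refs;
    const char *data;
};

/* Take a reference; on failure no reference is taken. */
int demo_get(struct demo_table *t, const char **out)
{
    if (t->data == NULL)
        return -1;
    t->refs++;
    *out = t->data;
    return 0;
}

/* Drop a reference; an unbalanced put is ignored in this sketch. */
void demo_put(struct demo_table *t)
{
    if (t->refs > 0)
        t->refs--;
}

/* A balanced call site: sum the first len bytes, then release the table. */
int demo_sum_bytes(struct demo_table *t, size_t len)
{
    const char *p;
    int sum = 0;

    if (demo_get(t, &p) != 0)
        return -1; /* nothing was acquired, so nothing to put */

    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)p[i];

    demo_put(t); /* matches the successful demo_get() above */
    return sum;
}
```

A call site that returns between the get and the put without releasing is the kind of leak the validation count check is meant to expose.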

2017-04-26 15:34:50

by Dan Williams

Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling

On Wed, Apr 26, 2017 at 7:13 AM, Dan Williams <[email protected]> wrote:
> On Tue, Apr 25, 2017 at 10:15 PM, Zheng, Lv <[email protected]> wrote:
>> Hi,
>>
>>> From: Dan Williams [mailto:[email protected]]
>>> Subject: Re: [RFC PATCH] ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
>>>
>>> On Tue, Apr 25, 2017 at 6:49 PM, Lv Zheng <[email protected]> wrote:
>>> > In the Linux kernel side, acpi_get_table() hasn't been fully balanced by
>>> > acpi_put_table() invocations. So it is not a good timing to report errors.
>>> > The strict balanced validation count check should only be enabled after
>>> > confirming that all kernel side invocations are safe.
>>>
>>> We've been living with this bug for 7 years, let's just go fix all
>>> acpi_get_table() invocations to make sure they have a corresponding
>>> acpi_put_table().
>>
>> We knew that, you should have already seen a series internally or
>> externally from me achieving this.
>> It's done several years ago. But it takes long time to make the
>> ACPICA part upstreamed.
>>
>> Now my plan is:
>> 1. introduce the APIs but allow old usage models in order not to
>> change old ACPICA behavior and its users.
>> 2. fix all users
>> 3. disallow old usage models.
>> It's just my mistake to leak the final stage approach to the ACPICA
>> upstream from my local repo.
>> Now we can try to jump to the final step, but as far as I know,
>> not only Linux, ACPICA itself also contains several broken cases.
>>
>> Bottom line of Linux kernel is we shouldn't break any running system.
>> So IMO, we will need this commit during this special period.
>>
>> I didn't say the final step is wrong or is not required.
>> We can do both in parallel.
>>
>> So could you please help to confirm if it's working.
>> And I would like to suggest linux to take this first step fix along
>> with other final step fixes during this period.
>
> I just think "this period" is very short and we can skip the band-aid
> and go straight to auditing the 48 call sites of acpi_get_table.

Moreover, I don't think this workaround is a workable approach because
it leaves the ACPI_ERROR() in place to continue to spam the logs.