2008-07-19 11:38:23

by Alan Jenkins

[permalink] [raw]
Subject: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (prevent race condition)

From: Alan Jenkins <[email protected]>

If several GPE interrupts arrive at a time, then perform an equal number of
queries in the workqueue. It doesn't matter if there are excess interrupts;
the queries will just come back with nothing.

This won't fix anything, because it still switches to polling GPEs if more
than five interrupts arrive at a time. But it simplifies the code, and makes
it easier to modify further without introducing race conditions.

Signed-off-by: Alan Jenkins <[email protected]>
Tested-by: Alan Jenkins <[email protected]>

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 5622aee..8e4b1a4 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -77,7 +77,6 @@ enum ec_event {

enum {
EC_FLAGS_WAIT_GPE = 0, /* Don't check status until GPE arrives */
- EC_FLAGS_QUERY_PENDING, /* Query is pending */
EC_FLAGS_GPE_MODE, /* Expect GPE to be sent for status change */
EC_FLAGS_NO_GPE, /* Don't use GPE mode */
EC_FLAGS_RESCHEDULE_POLL /* Re-schedule poll */
@@ -230,8 +229,7 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec, u8 command,
"finish-write timeout, command = %d\n", command);
goto end;
}
- } else if (command == ACPI_EC_COMMAND_QUERY)
- clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
+ }

for (; rdata_len > 0; --rdata_len) {
result = acpi_ec_wait(ec, ACPI_EC_EVENT_OBF_1, force_poll);
@@ -502,7 +508,7 @@ static u32 acpi_ec_gpe_handler(void *data)
wake_up(&ec->wait);

if (state & ACPI_EC_FLAG_SCI) {
- if (!test_and_set_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
+ if (ec->handlers_installed)
status = acpi_os_execute(OSL_EC_BURST_HANDLER,
acpi_ec_gpe_query, ec);
} else if (!test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
@@ -665,7 +671,6 @@ static struct acpi_ec *make_acpi_ec(void)
struct acpi_ec *ec = kzalloc(sizeof(struct acpi_ec), GFP_KERNEL);
if (!ec)
return NULL;
- ec->flags = 1 << EC_FLAGS_QUERY_PENDING;
mutex_init(&ec->lock);
init_waitqueue_head(&ec->wait);
INIT_LIST_HEAD(&ec->list);
@@ -858,8 +863,6 @@ static int acpi_ec_start(struct acpi_device *device)

ret = ec_install_handlers(ec);

- /* EC is fully operational, allow queries */
- clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
ec_schedule_ec_poll(ec);
return ret;
}


2008-07-19 16:59:31

by Alexey Starikovskiy

[permalink] [raw]
Subject: Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (prevent race condition)

Alan,

I don't think this is very good idea -- now every interrupt will add a new entry to workqueue,
and given the rate of this interrupt, it could be become quite long.

Your second patch is redundant if you add queue entry for each interrupt, and as it does not
require any investments into memory, I like it more.

Also, it is very cool to rip of things you don't care for, but that essentially reverts a fix done for 9998,
so at least, you should ask these people if you broke their setups.

Regards,
Alex.

Alan Jenkins wrote:
> From: Alan Jenkins <[email protected]>
>
> If several GPE interrupts arrive at a time, then perform an equal number of
> queries in the workqueue. It doesn't matter if there are excess interrupts;
> the queries will just come back with nothing.
>
> This won't fix anything, because it still switches to polling GPEs if more
> than five interrupts arrive at a time. But it simplifies the code, and makes
> it easier to modify further without introducing race conditions.
>
> Signed-off-by: Alan Jenkins <[email protected]>
> Tested-by: Alan Jenkins <[email protected]>
>
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index 5622aee..8e4b1a4 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -77,7 +77,6 @@ enum ec_event {
>
> enum {
> EC_FLAGS_WAIT_GPE = 0, /* Don't check status until GPE arrives */
> - EC_FLAGS_QUERY_PENDING, /* Query is pending */
> EC_FLAGS_GPE_MODE, /* Expect GPE to be sent for status change */
> EC_FLAGS_NO_GPE, /* Don't use GPE mode */
> EC_FLAGS_RESCHEDULE_POLL /* Re-schedule poll */
> @@ -230,8 +229,7 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec, u8 command,
> "finish-write timeout, command = %d\n", command);
> goto end;
> }
> - } else if (command == ACPI_EC_COMMAND_QUERY)
> - clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
> + }
>
> for (; rdata_len > 0; --rdata_len) {
> result = acpi_ec_wait(ec, ACPI_EC_EVENT_OBF_1, force_poll);
> @@ -502,7 +508,7 @@ static u32 acpi_ec_gpe_handler(void *data)
> wake_up(&ec->wait);
>
> if (state & ACPI_EC_FLAG_SCI) {
> - if (!test_and_set_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
> + if (ec->handlers_installed)
> status = acpi_os_execute(OSL_EC_BURST_HANDLER,
> acpi_ec_gpe_query, ec);
> } else if (!test_bit(EC_FLAGS_GPE_MODE, &ec->flags) &&
> @@ -665,7 +671,6 @@ static struct acpi_ec *make_acpi_ec(void)
> struct acpi_ec *ec = kzalloc(sizeof(struct acpi_ec), GFP_KERNEL);
> if (!ec)
> return NULL;
> - ec->flags = 1 << EC_FLAGS_QUERY_PENDING;
> mutex_init(&ec->lock);
> init_waitqueue_head(&ec->wait);
> INIT_LIST_HEAD(&ec->list);
> @@ -858,8 +863,6 @@ static int acpi_ec_start(struct acpi_device *device)
>
> ret = ec_install_handlers(ec);
>
> - /* EC is fully operational, allow queries */
> - clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
> ec_schedule_ec_poll(ec);
> return ret;
> }
>
>

2008-07-19 20:41:29

by Alan Jenkins

[permalink] [raw]
Subject: Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (prevent race condition)

Alexey Starikovskiy wrote:
> Alan,

Thanks for your robust response. Sorry for not discussing this more in
the first place. I wanted to write the code that fixes my system first,
and post it as soon as possible. I think I missed a flag to say "this
post is not an immediate request for submission".

> I don't think this is very good idea -- now every interrupt will add a
> new entry to workqueue,
> and given the rate of this interrupt, it could be become quite long.
>

I confess I gave up too quickly on the synchronisation issues and
resorted to brute force. Patch #1 is preventing a narrow race:

- acpi_ec_write_cmd() (QUERY)
- EC writes 0 (no event) to input buffer
- new event arrives, triggering a GPE interrupt which is ignored because
QUERY_PENDING is set.
- QUERY_PENDING is cleared (but too late).

Now we ignored an event, which will be delayed until the next GPE interrupt.

The original reason for patch #1 was to stop patch #2 driving me
insane. If you apply the second patch on it's own, you break
EC_FLAGS_QUERY_PENDING. It gets cleared as soon as the _first_ query
command is submitted. It works, but it means the flag no longer has a
clearly defined meaning. I tried to fix that by only clearing it once
the loop has finished - and then realised I was widening a pre-existing
race condition.

As you say, I took the easy way out and pushed it all into the
workqueue. So the workqueue ends up as a buffer of GPE occurrences. I
did look at reducing the memory usage while avoiding race conditions,
but I couldn't find a reasonable solution. But I looked at it again now
and I have a better solution:

...
/* moved here from acpi_ec_transaction_unlocked() */
clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);

while (acpi_ec_query(ec, &value) == 0)
acpi_ec_gpe_run_handler(ec, value);
}

The PENDING flag then reflects whether or not there's an item on the
workqueue. Does that make sense?

> Your second patch is redundant if you add queue entry for each
> interrupt, and as it does not
> require any investments into memory, I like it more.
>
It's not quite redundant with the first patch. We still have GPE
polling mode - patch #1 doesn't affect that. In polling mode, it's
essential to query all the pending events - otherwise, if they arrive
more frequently than the polling interval then you will inevitably drop
events.

Patch #2 is also required to fix my buggy hardware. My laptop's EC
buffers multiple events, but clears SCI_EVT after every query. This
caused problems in polling mode; with multiple events between polling
intervals only one gets queried - and after a while the buffer overflows
and it breaks completely.
> Also, it is very cool to rip of things you don't care for, but that
> essentially reverts a fix done for 9998,
> so at least, you should ask these people if you broke their setups.
>
I assume you're referring to the "would benefit from wider testing"
patch #3. Thanks for identifying the bugzilla entry - I had difficulty
separating the different entries on GPEs. I'm optimistic that we can
fix all these crazy buffering EC's without having to disable GPE interrupts.

The reason I like my proposed fix is that it makes the code simple
enough that I can understand it, without making any assumptions about
how many interrupts arrive per GPE. The EC can be broken in lots of
ways, so long as:

1. We receive interrupts when one or more GPE's are pending.
2. We don't get a constant interrupt storm.

I don't think I missed anything. Is there anything else I should check
before I try to get testing?

Thanks
Alan

2008-07-19 21:12:42

by Alexey Starikovskiy

[permalink] [raw]
Subject: Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (prevent race condition)

Alan Jenkins wrote:
> /* moved here from acpi_ec_transaction_unlocked() */
> clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
>
> while (acpi_ec_query(ec, &value) == 0)
> acpi_ec_gpe_run_handler(ec, value);
> }
>
> The PENDING flag then reflects whether or not there's an item on the
> workqueue. Does that make sense?
Yes, this is what the flag was made for --
to guard workqueue from being filled up.
>
>> Your second patch is redundant if you add queue entry for each
>> interrupt, and as it does not
>> require any investments into memory, I like it more.
>>
> It's not quite redundant with the first patch. We still have GPE
> polling mode - patch #1 doesn't affect that. In polling mode, it's
> essential to query all the pending events - otherwise, if they arrive
> more frequently than the polling interval then you will inevitably drop
> events.
I meant that if we have patch #3, which drops poll mode, we only need
one of the first two, not both. And in this case I like #2 more than #1.
>
> Patch #2 is also required to fix my buggy hardware. My laptop's EC
> buffers multiple events, but clears SCI_EVT after every query. This
> caused problems in polling mode; with multiple events between polling
> intervals only one gets queried - and after a while the buffer overflows
> and it breaks completely.
I fully agree here, it would be great to make sure that EC buffers are empty.
>> Also, it is very cool to rip of things you don't care for, but that
>> essentially reverts a fix done for 9998,
>> so at least, you should ask these people if you broke their setups.
>>
> I assume you're referring to the "would benefit from wider testing"
> patch #3. Thanks for identifying the bugzilla entry - I had difficulty
> separating the different entries on GPEs. I'm optimistic that we can
> fix all these crazy buffering EC's without having to disable GPE interrupts.
I would say, we should test pair of #2 and #3, or your original patch + my patch.
I still like the option to not have any new interrupts until I'm done with current one --
much like the PENDING bit.
>
> The reason I like my proposed fix is that it makes the code simple
> enough that I can understand it, without making any assumptions about
> how many interrupts arrive per GPE. The EC can be broken in lots of
> ways, so long as:
>
> 1. We receive interrupts when one or more GPE's are pending.
> 2. We don't get a constant interrupt storm.
>
> I don't think I missed anything. Is there anything else I should check
> before I try to get testing?
Well, there is Keith Packard with concern that either high number of interrupts or long
spins on polling status register could push battery life down the drain...
One more argument to disable GPE until we service previous.

Regards,
Alex.

2008-07-20 14:55:16

by Alan Jenkins

[permalink] [raw]
Subject: Re: [PATCH 1/3] acpi: Rip out EC_FLAGS_QUERY_PENDING (prevent race condition)

Alexey Starikovskiy wrote:
> Alan Jenkins wrote:
>> /* moved here from acpi_ec_transaction_unlocked() */
>> clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
>>
>> while (acpi_ec_query(ec, &value) == 0)
>> acpi_ec_gpe_run_handler(ec, value);
>> }
>>
>> The PENDING flag then reflects whether or not there's an item on the
>> workqueue. Does that make sense?
> Yes, this is what the flag was made for -- to guard workqueue from
> being filled up.
>>
>>> Your second patch is redundant if you add queue entry for each
>>> interrupt, and as it does not
>>> require any investments into memory, I like it more.
>>>
>> It's not quite redundant with the first patch. We still have GPE
>> polling mode - patch #1 doesn't affect that. In polling mode, it's
>> essential to query all the pending events - otherwise, if they arrive
>> more frequently than the polling interval then you will inevitably drop
>> events.
> I meant that if we have patch #3, which drops poll mode, we only need
> one of the first two, not both. And in this case I like #2 more than #1.
>>
>> Patch #2 is also required to fix my buggy hardware. My laptop's EC
>> buffers multiple events, but clears SCI_EVT after every query. This
>> caused problems in polling mode; with multiple events between polling
>> intervals only one gets queried - and after a while the buffer overflows
>> and it breaks completely.
> I fully agree here, it would be great to make sure that EC buffers are
> empty.
>>> Also, it is very cool to rip of things you don't care for, but that
>>> essentially reverts a fix done for 9998,
>>> so at least, you should ask these people if you broke their setups.
>>>
>> I assume you're referring to the "would benefit from wider testing"
>> patch #3. Thanks for identifying the bugzilla entry - I had difficulty
>> separating the different entries on GPEs. I'm optimistic that we can
>> fix all these crazy buffering EC's without having to disable GPE
>> interrupts.
> I would say, we should test pair of #2 and #3, or your original patch
> + my patch.
> I still like the option to not have any new interrupts until I'm done
> with current one --
> much like the PENDING bit.
>>
>> The reason I like my proposed fix is that it makes the code simple
>> enough that I can understand it, without making any assumptions about
>> how many interrupts arrive per GPE. The EC can be broken in lots of
>> ways, so long as:
>>
>> 1. We receive interrupts when one or more GPE's are pending.
>> 2. We don't get a constant interrupt storm.
>>
>> I don't think I missed anything. Is there anything else I should check
>> before I try to get testing?
> Well, there is Keith Packard with concern that either high number of
> interrupts or long
> spins on polling status register could push battery life down the
> drain...
> One more argument to disable GPE until we service previous.
> Regards,
> Alex.

Ah, the powertop argument. Disabling interrupts is a nice safety
blanket, but it means you end up polling the status register during
queries. I think the interrupts are better for battery life. The
excess interrupts happen all at once, so they don't wake the CPU up more
frequently, they just keep it awake slightly longer.

I'm convinced that emptying the EC buffer will reduce the number of
interrupts, even for the reporter of #9998. I'll ask them to test this
patchset, then respin it to fix the mess I made of
EC_FLAGS_QUERY_PENDING. As I mentioned earlier I've worked out how to
avoid my theoretical race without causing increased memory usage.

Thanks
Alan