2008-10-11 16:58:06

by Alan Jenkins

[permalink] [raw]
Subject: acpi-test tree on eeepc: EC error message on second resume

I've just run an acpi-test kernel on my EeePC and noticed a new issue.
It seems to be caused (or revealed) by the EC interrupt transaction patch.

On the second suspend/resume cycle, I see a kernel error message.

[ 78.747707] ACPI: Waking up from system sleep state S3
[ 79.330001] ACPI: EC: input buffer not empty, aborting transaction
[ 79.423327] ACPI: EC: non-query interrupt received, switching to
interrupt mode

I still don't see any issues in the code. I'll try getting a DEBUG
trace to see the EC interrupts. Any other suggestions?

Thanks
Alan


2008-10-11 17:05:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

On Saturday, 11 of October 2008, Alan Jenkins wrote:
> I've just run an acpi-test kernel on my EeePC and noticed a new issue.
> It seems to be caused (or revealed) by the EC interrupt transaction patch.
>
> On the second suspend/resume cycle, I see a kernel error message.
>
> [ 78.747707] ACPI: Waking up from system sleep state S3
> [ 79.330001] ACPI: EC: input buffer not empty, aborting transaction
> [ 79.423327] ACPI: EC: non-query interrupt received, switching to
> interrupt mode
>
> I still don't see any issues in the code. I'll try getting a DEBUG
> trace to see the EC interrupts. Any other suggestions?

Not really, but is this reproducible? I mean, does it happen always on the
second resume and does it happen on every next resume after the first one?

Thanks,
Rafael

2008-10-11 17:13:04

by Alan Jenkins

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

Rafael J. Wysocki wrote:
> On Saturday, 11 of October 2008, Alan Jenkins wrote:
>
>> I've just run an acpi-test kernel on my EeePC and noticed a new issue.
>> It seems to be caused (or revealed) by the EC interrupt transaction patch.
>>
>> On the second suspend/resume cycle, I see a kernel error message.
>>
>> [ 78.747707] ACPI: Waking up from system sleep state S3
>> [ 79.330001] ACPI: EC: input buffer not empty, aborting transaction
>> [ 79.423327] ACPI: EC: non-query interrupt received, switching to
>> interrupt mode
>>
>> I still don't see any issues in the code. I'll try getting a DEBUG
>> trace to see the EC interrupts. Any other suggestions?
>>
>
> Not really, but is this reproducible? I mean, does it happen always on the
> second resume and does it happen on every next resume after the first one?
>
> Thanks,
> Rafael
>

Ah. No, I spoke to soon. It happened on the second resume the first
two times I tried it. But this third time with DEBUG enabled, it
happened on the first suspend/resume.

And it doesn't happen on all subsequent resumes either. I've had one
suspend/resume without the error, just after a suspend/resume with the
error.

So it's not deterministic, but it is easy to reproduce.

Alan

2008-10-11 17:49:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

On Saturday, 11 of October 2008, Alan Jenkins wrote:
> Rafael J. Wysocki wrote:
> > On Saturday, 11 of October 2008, Alan Jenkins wrote:
> >
> >> I've just run an acpi-test kernel on my EeePC and noticed a new issue.
> >> It seems to be caused (or revealed) by the EC interrupt transaction patch.
> >>
> >> On the second suspend/resume cycle, I see a kernel error message.
> >>
> >> [ 78.747707] ACPI: Waking up from system sleep state S3
> >> [ 79.330001] ACPI: EC: input buffer not empty, aborting transaction
> >> [ 79.423327] ACPI: EC: non-query interrupt received, switching to
> >> interrupt mode
> >>
> >> I still don't see any issues in the code. I'll try getting a DEBUG
> >> trace to see the EC interrupts. Any other suggestions?
> >>
> >
> > Not really, but is this reproducible? I mean, does it happen always on the
> > second resume and does it happen on every next resume after the first one?
> >
> > Thanks,
> > Rafael
> >
>
> Ah. No, I spoke to soon. It happened on the second resume the first
> two times I tried it. But this third time with DEBUG enabled, it
> happened on the first suspend/resume.
>
> And it doesn't happen on all subsequent resumes either. I've had one
> suspend/resume without the error, just after a suspend/resume with the
> error.
>
> So it's not deterministic, but it is easy to reproduce.

I can't reproduce this on any hardware available to me, so far.

Is this related to any other problem, like things not working etc.?

Rafael

2008-10-11 18:15:43

by Alan Jenkins

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

Rafael J. Wysocki wrote:
> On Saturday, 11 of October 2008, Alan Jenkins wrote:
>
>> Rafael J. Wysocki wrote:
>>
>>> On Saturday, 11 of October 2008, Alan Jenkins wrote:
>>>
>>>
>>>> I've just run an acpi-test kernel on my EeePC and noticed a new issue.
>>>> It seems to be caused (or revealed) by the EC interrupt transaction patch.
>>>>
>>>> On the second suspend/resume cycle, I see a kernel error message.
>>>>
>>>> [ 78.747707] ACPI: Waking up from system sleep state S3
>>>> [ 79.330001] ACPI: EC: input buffer not empty, aborting transaction
>>>> [ 79.423327] ACPI: EC: non-query interrupt received, switching to
>>>> interrupt mode
>>>>
>>>> I still don't see any issues in the code. I'll try getting a DEBUG
>>>> trace to see the EC interrupts. Any other suggestions?
>>>>
>>>>
>>> Not really, but is this reproducible? I mean, does it happen always on the
>>> second resume and does it happen on every next resume after the first one?
>>>
>>> Thanks,
>>> Rafael
>>>
>>>
>> Ah. No, I spoke to soon. It happened on the second resume the first
>> two times I tried it. But this third time with DEBUG enabled, it
>> happened on the first suspend/resume.
>>
>> And it doesn't happen on all subsequent resumes either. I've had one
>> suspend/resume without the error, just after a suspend/resume with the
>> error.
>>
>> So it's not deterministic, but it is easy to reproduce.
>>
>
> I can't reproduce this on any hardware available to me, so far.
>
> Is this related to any other problem, like things not working etc.?
>

Nope, just an error message. Though I do worry that "random EC
transaction aborts during resume" could hit something important
somewhere, sometime.

I think I found the problem. The "input buffer empty" wait depends on
"interrupt mode" to work properly, and we don't immediately enable the
interrupt on resume. The wait should have a polling fallback anyway, to
be consistent with the other transaction waits.

Alan

2008-10-11 18:39:18

by Alexey Starikovskiy

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

Alan Jenkins wrote:
> I think I found the problem. The "input buffer empty" wait depends on
> "interrupt mode" to work properly, and we don't immediately enable the
> interrupt on resume. The wait should have a polling fallback anyway, to
> be consistent with the other transaction waits.
>
> Alan
Yep, I think something like attached patch may help:

Thanks,
Alex.


Attachments:
2.patch (743.00 B)

2008-10-11 19:26:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

On Saturday, 11 of October 2008, Alexey Starikovskiy wrote:
> Alan Jenkins wrote:
> > I think I found the problem. The "input buffer empty" wait depends on
> > "interrupt mode" to work properly, and we don't immediately enable the
> > interrupt on resume. The wait should have a polling fallback anyway, to
> > be consistent with the other transaction waits.
> >
> > Alan
> Yep, I think something like attached patch may help:

[Can you please append patches instead of or apart from attaching them?
That would make it easier to comment them.]

if (!wait_event_timeout(ec->wait, ec_check_ibf0(ec),
- msecs_to_jiffies(ACPI_EC_DELAY))) {
+ msecs_to_jiffies(ACPI_EC_DELAY)) &&
+ !ec_check_ibf0(ec)) {

Shouldn't this go under the spinlock? Surely it can race with the GPE handler.

pr_err(PREFIX "input buffer is not empty, "
"aborting transaction\n");

Thanks,
Rafael

2008-10-11 19:31:20

by Alexey Starikovskiy

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

Rafael J. Wysocki wrote:
> On Saturday, 11 of October 2008, Alexey Starikovskiy wrote:
>
>> Alan Jenkins wrote:
>>
>>> I think I found the problem. The "input buffer empty" wait depends on
>>> "interrupt mode" to work properly, and we don't immediately enable the
>>> interrupt on resume. The wait should have a polling fallback anyway, to
>>> be consistent with the other transaction waits.
>>>
>>> Alan
>>>
>> Yep, I think something like attached patch may help:
>>
>
> [Can you please append patches instead of or apart from attaching them?
> That would make it easier to comment them.]
>
>
Ok.
> if (!wait_event_timeout(ec->wait, ec_check_ibf0(ec),
> - msecs_to_jiffies(ACPI_EC_DELAY))) {
> + msecs_to_jiffies(ACPI_EC_DELAY)) &&
> + !ec_check_ibf0(ec)) {
>
> Shouldn't this go under the spinlock? Surely it can race with the GPE handler.
>
>
No, we discussed this before -- we are outside of the transaction, thus
no GPE
activity could interfere with ec_check_ibf0.

Regards,
Alex.

2008-10-11 19:35:53

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

On Saturday, 11 of October 2008, Alexey Starikovskiy wrote:
> Rafael J. Wysocki wrote:
> > On Saturday, 11 of October 2008, Alexey Starikovskiy wrote:
> >
> >> Alan Jenkins wrote:
> >>
> >>> I think I found the problem. The "input buffer empty" wait depends on
> >>> "interrupt mode" to work properly, and we don't immediately enable the
> >>> interrupt on resume. The wait should have a polling fallback anyway, to
> >>> be consistent with the other transaction waits.
> >>>
> >>> Alan
> >>>
> >> Yep, I think something like attached patch may help:
> >>
> >
> > [Can you please append patches instead of or apart from attaching them?
> > That would make it easier to comment them.]
> >
> >
> Ok.
> > if (!wait_event_timeout(ec->wait, ec_check_ibf0(ec),
> > - msecs_to_jiffies(ACPI_EC_DELAY))) {
> > + msecs_to_jiffies(ACPI_EC_DELAY)) &&
> > + !ec_check_ibf0(ec)) {
> >
> > Shouldn't this go under the spinlock? Surely it can race with the GPE handler.
> >
> >
> No, we discussed this before -- we are outside of the transaction, thus
> no GPE
> activity could interfere with ec_check_ibf0.

Ok, this is in the process context and we don't really expect to get an
interrupt at this point, but what happens if the EC generates an event that's
not related to any transiaction. Is that guaranteed to never happen?

Thanks,
Rafael

2008-10-11 19:38:59

by Alexey Starikovskiy

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

Rafael J. Wysocki wrote:
>> No, we discussed this before -- we are outside of the transaction, thus
>> no GPE
>> activity could interfere with ec_check_ibf0.
>
> Ok, this is in the process context and we don't really expect to get an
> interrupt at this point, but what happens if the EC generates an event that's
> not related to any transiaction. Is that guaranteed to never happen?
Interrupt handler in this case can't cause a change to status register, thus our
read of it will not be affected by interrupt.
>
> Thanks,
> Rafael

2008-10-11 20:50:01

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

On Saturday, 11 of October 2008, Alexey Starikovskiy wrote:
> Rafael J. Wysocki wrote:
> >> No, we discussed this before -- we are outside of the transaction, thus
> >> no GPE
> >> activity could interfere with ec_check_ibf0.
> >
> > Ok, this is in the process context and we don't really expect to get an
> > interrupt at this point, but what happens if the EC generates an event that's
> > not related to any transiaction. Is that guaranteed to never happen?
> Interrupt handler in this case can't cause a change to status register, thus our
> read of it will not be affected by interrupt.

Ok, thanks.

Alan, does the patch work for you?

Rafael

2008-10-12 09:14:17

by Alan Jenkins

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

Rafael J. Wysocki wrote:
> On Saturday, 11 of October 2008, Alexey Starikovskiy wrote:
>
>> Rafael J. Wysocki wrote:
>>
>>>> No, we discussed this before -- we are outside of the transaction, thus
>>>> no GPE
>>>> activity could interfere with ec_check_ibf0.
>>>>
>>> Ok, this is in the process context and we don't really expect to get an
>>> interrupt at this point, but what happens if the EC generates an event that's
>>> not related to any transiaction. Is that guaranteed to never happen?
>>>
>> Interrupt handler in this case can't cause a change to status register, thus our
>> read of it will not be affected by interrupt.
>>
>
> Ok, thanks.
>
> Alan, does the patch work for you?
>
> Rafael
>

Yes. Two reboot cycles, three suspend/resume cycles each, and no error
message.

I hope we have a better fix in mind though :-P. The patch doesn't solve
the unnecessary 500ms delay when this thing happens.

Thanks
Alan

2008-10-12 19:23:35

by Alexey Starikovskiy

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

Alan Jenkins wrote:
> Rafael J. Wysocki wrote:
>> On Saturday, 11 of October 2008, Alexey Starikovskiy wrote:
>>
>>> Rafael J. Wysocki wrote:
>>>
>>>>> No, we discussed this before -- we are outside of the transaction, thus
>>>>> no GPE
>>>>> activity could interfere with ec_check_ibf0.
>>>>>
>>>> Ok, this is in the process context and we don't really expect to get an
>>>> interrupt at this point, but what happens if the EC generates an event that's
>>>> not related to any transiaction. Is that guaranteed to never happen?
>>>>
>>> Interrupt handler in this case can't cause a change to status register, thus our
>>> read of it will not be affected by interrupt.
>>>
>> Ok, thanks.
>>
>> Alan, does the patch work for you?
>>
>> Rafael
>>
>
> Yes. Two reboot cycles, three suspend/resume cycles each, and no error
> message.
>
> I hope we have a better fix in mind though :-P. The patch doesn't solve
> the unnecessary 500ms delay when this thing happens.

Something like this?

Regards,
Alex.


Attachments:
2.patch (1.20 kB)

2008-10-13 01:32:16

by Zhao, Yakui

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

On Sun, 2008-10-12 at 23:23 +0400, Alexey Starikovskiy wrote:
> +static int ec_wait_ibf0(struct acpi_ec *ec)
> +{
> + unsigned long delay = jiffies +
> msecs_to_jiffies(ACPI_EC_DELAY);
> + while (time_before(jiffies, delay)) {
> + if (wait_event_timeout(ec->wait, ec_check_ibf0(ec),
> + msecs_to_jiffies(1)))
> + return 0;
Please add the bogus timeout check.
best regards.
Yakui
> + }
> + return -ETIME;
> +}
> +

2008-10-13 08:22:58

by Alan Jenkins

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

Alexis Starikovskiy wrote:
> Alan Jenkins wrote:
>> Rafael J. Wysocki wrote:
>>> On Saturday, 11 of October 2008, Alexey Starikovskiy wrote:
>>>
>>>> Rafael J. Wysocki wrote:
>>>>
>>>>>> No, we discussed this before -- we are outside of the
>>>>>> transaction, thus no GPE
>>>>>> activity could interfere with ec_check_ibf0.
>>>>>>
>>>>> Ok, this is in the process context and we don't really expect to
>>>>> get an
>>>>> interrupt at this point, but what happens if the EC generates an
>>>>> event that's
>>>>> not related to any transiaction. Is that guaranteed to never happen?
>>>>>
>>>> Interrupt handler in this case can't cause a change to status
>>>> register, thus our read of it will not be affected by interrupt.
>>>>
>>> Ok, thanks.
>>>
>>> Alan, does the patch work for you?
>>>
>>> Rafael
>>>
>>
>> Yes. Two reboot cycles, three suspend/resume cycles each, and no error
>> message.
>>
>> I hope we have a better fix in mind though :-P. The patch doesn't solve
>> the unnecessary 500ms delay when this thing happens.
>
> Something like this?
>
> Regards,
> Alex.

You sent it as an attachment again :-).

That should work, odd as it looks. We don't need to worry about the GPE
workaround because that's only active _inside_ the transaction. I don't
know what Zhao thinks is missing.

Sorry I can't test right now. I tried to install 3D support on my
laptop for showing-off purposes, and somehow broke X.

Thanks
Alan

2008-10-13 16:39:50

by Alan Jenkins

[permalink] [raw]
Subject: Re: acpi-test tree on eeepc: EC error message on second resume

Alan Jenkins wrote:
> Alexis Starikovskiy wrote:
>
>> Alan Jenkins wrote:
>>
>>> Rafael J. Wysocki wrote:
>>>
>>>> On Saturday, 11 of October 2008, Alexey Starikovskiy wrote:
>>>>
>>>>
>>>>> Rafael J. Wysocki wrote:
>>>>>
>>>>>
>>>>>>> No, we discussed this before -- we are outside of the
>>>>>>> transaction, thus no GPE
>>>>>>> activity could interfere with ec_check_ibf0.
>>>>>>>
>>>>>>>
>>>>>> Ok, this is in the process context and we don't really expect to
>>>>>> get an
>>>>>> interrupt at this point, but what happens if the EC generates an
>>>>>> event that's
>>>>>> not related to any transiaction. Is that guaranteed to never happen?
>>>>>>
>>>>>>
>>>>> Interrupt handler in this case can't cause a change to status
>>>>> register, thus our read of it will not be affected by interrupt.
>>>>>
>>>>>
>>>> Ok, thanks.
>>>>
>>>> Alan, does the patch work for you?
>>>>
>>>> Rafael
>>>>
>>>>
>>> Yes. Two reboot cycles, three suspend/resume cycles each, and no error
>>> message.
>>>
>>> I hope we have a better fix in mind though :-P. The patch doesn't solve
>>> the unnecessary 500ms delay when this thing happens.
>>>
>> Something like this?
>>
>> Regards,
>> Alex.
>>
>
> You sent it as an attachment again :-).
>
> That should work, odd as it looks. We don't need to worry about the GPE
> workaround because that's only active _inside_ the transaction. I don't
> know what Zhao thinks is missing.
>
> Sorry I can't test right now. I tried to install 3D support on my
> laptop for showing-off purposes, and somehow broke X.
>
Drama over, I've now tested it. No error messages, and the printk
timings show that it has stopped hanging for half a second.

Thanks
Alan