2015-11-06 20:44:17

by Chris Bainbridge

[permalink] [raw]
Subject: [PATCH] Preserve task state in reentrant calls to ___wait_event

In the ACPI SBS initialisation, a reentrant call to wait_event_timeout()
causes an intermittent boot stall of several minutes usually following
the "Switching to clocksource tsc" message. This stall is caused by:

1. drivers/acpi/sbshc.c wait_transaction_complete() calls
wait_event_timeout():

if (wait_event_timeout(hc->wait, smb_check_done(hc),
msecs_to_jiffies(timeout)))

2. ___wait_event sets task state to uninterruptible

3. ___wait_event calls the "condition" smb_check_done()

4. smb_check_done (sbshc.c) calls through to ec_read() in
drivers/acpi/ec.c

5. ec_guard() is reached which calls wait_event_timeout()

if (wait_event_timeout(ec->wait,
ec_transaction_completed(ec),
guard))

ie. wait_event_timeout() is being called again inside evaluation of
the previous wait_event_timeout() condition

5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in
ec_guard()

6. The task is now in state running even though the wait "condition" is
still being evaluated

7. The "condition" check returns false so ___wait_event calls
schedule_timeout()

8. Since the task state is running, the scheduler immediately schedules
it again

9. This loop repeats for around 250 seconds event though the original
wait_event_timeout was only 1000ms.

This happens because each the call to schedule_timeout() usually
returns immediately, taking less than 1ms, so the jiffies timeout
counter is not decremented. The task is now stuck in a running
state, and so is highly likely to get rescheduled immediately, which
takes less than a jiffy.

The root problem is that wait_event_timeout() does not preserve the task
state when called by tasks that are not running. We fix this by
preserving and restoring the task state in ___wait_event().

Signed-off-by: Chris Bainbridge <[email protected]>
---
I am assuming here that wait_event_timeout() is supposed to support reentrant
calls. If not, perhaps it should BUG_ON when called with a non-running task
state, and the SBS HC / ACPI EC code needs to be fixed to stop doing this. If
reentrant calls are supposed to work, then this patch will preserve the task
state (there may be a more appropriate way to support reentrant calls than this
exact patch, suggestions/alternatives are welcome, but this does work).
---
include/linux/wait.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 1e1bf9f..a847cf8 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -209,11 +209,12 @@ wait_queue_head_t *bit_waitqueue(void *, int);
* otherwise.
*/

-#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \
+#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd) \
({ \
__label__ __out; \
wait_queue_t __wait; \
long __ret = ret; /* explicit shadow */ \
+ long ostate = current->state; \
\
INIT_LIST_HEAD(&__wait.task_list); \
if (exclusive) \
@@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int);
__wait.flags = 0; \
\
for (;;) { \
- long __int = prepare_to_wait_event(&wq, &__wait, state);\
+ long __int = prepare_to_wait_event(&wq, &__wait, nstate);\
\
if (condition) \
break; \
\
- if (___wait_is_interruptible(state) && __int) { \
+ if (___wait_is_interruptible(nstate) && __int) { \
__ret = __int; \
if (exclusive) { \
abort_exclusive_wait(&wq, &__wait, \
- state, NULL); \
+ nstate, NULL); \
goto __out; \
} \
break; \
@@ -240,6 +241,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
cmd; \
} \
finish_wait(&wq, &__wait); \
+ set_current_state(ostate); \
__out: __ret; \
})

--
2.1.4


2015-11-06 22:37:36

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] Preserve task state in reentrant calls to ___wait_event

On Friday, November 06, 2015 08:44:08 PM Chris Bainbridge wrote:
> In the ACPI SBS initialisation, a reentrant call to wait_event_timeout()
> causes an intermittent boot stall of several minutes usually following
> the "Switching to clocksource tsc" message. This stall is caused by:
>
> 1. drivers/acpi/sbshc.c wait_transaction_complete() calls
> wait_event_timeout():
>
> if (wait_event_timeout(hc->wait, smb_check_done(hc),
> msecs_to_jiffies(timeout)))
>
> 2. ___wait_event sets task state to uninterruptible
>
> 3. ___wait_event calls the "condition" smb_check_done()
>
> 4. smb_check_done (sbshc.c) calls through to ec_read() in
> drivers/acpi/ec.c
>
> 5. ec_guard() is reached which calls wait_event_timeout()
>
> if (wait_event_timeout(ec->wait,
> ec_transaction_completed(ec),
> guard))
>
> ie. wait_event_timeout() is being called again inside evaluation of
> the previous wait_event_timeout() condition

Hmm.

This doesn't look quite right to me.

> 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in
> ec_guard()
>
> 6. The task is now in state running even though the wait "condition" is
> still being evaluated
>
> 7. The "condition" check returns false so ___wait_event calls
> schedule_timeout()
>
> 8. Since the task state is running, the scheduler immediately schedules
> it again
>
> 9. This loop repeats for around 250 seconds event though the original
> wait_event_timeout was only 1000ms.
>
> This happens because each the call to schedule_timeout() usually
> returns immediately, taking less than 1ms, so the jiffies timeout
> counter is not decremented. The task is now stuck in a running
> state, and so is highly likely to get rescheduled immediately, which
> takes less than a jiffy.
>
> The root problem is that wait_event_timeout() does not preserve the task
> state when called by tasks that are not running. We fix this by
> preserving and restoring the task state in ___wait_event().
>
> Signed-off-by: Chris Bainbridge <[email protected]>
> ---
> I am assuming here that wait_event_timeout() is supposed to support reentrant
> calls. If not, perhaps it should BUG_ON when called with a non-running task
> state, and the SBS HC / ACPI EC code needs to be fixed to stop doing this.

OK, so I'm not sure about that too. Peter?

I'd fix the SBS HC / ACPI EC code to stop doing this regardless.

> If reentrant calls are supposed to work, then this patch will preserve the task
> state (there may be a more appropriate way to support reentrant calls than this
> exact patch, suggestions/alternatives are welcome, but this does work).

Thanks,
Rafael

2015-11-07 08:32:50

by Chris Bainbridge

[permalink] [raw]
Subject: Re: [PATCH] Preserve task state in reentrant calls to ___wait_event

On Fri, Nov 06, 2015 at 08:44:08PM +0000, Chris Bainbridge wrote:
> -#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \
> +#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd) \
> ({ \
> __label__ __out; \
> wait_queue_t __wait; \
> long __ret = ret; /* explicit shadow */ \
> + long ostate = current->state; \

XXX

> \
> INIT_LIST_HEAD(&__wait.task_list); \
> if (exclusive) \
> @@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int);
> __wait.flags = 0; \
> \
> for (;;) { \
> - long __int = prepare_to_wait_event(&wq, &__wait, state);\
> + long __int = prepare_to_wait_event(&wq, &__wait, nstate);\
> \
> if (condition) \
> break; \
> \
> - if (___wait_is_interruptible(state) && __int) { \
> + if (___wait_is_interruptible(nstate) && __int) { \
> __ret = __int; \
> if (exclusive) { \
> abort_exclusive_wait(&wq, &__wait, \
> - state, NULL); \
> + nstate, NULL); \
> goto __out; \
> } \
> break; \
> @@ -240,6 +241,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
> cmd; \
> } \
> finish_wait(&wq, &__wait); \
> + set_current_state(ostate); \

I'm not convinced that this particular code is (or can be) race free in
the general reentrant case. The outer call to ___wait_event will miss
any wake_up received in the inner call between XXX above (store of
current->state) and this point of restoring the previous state. So if
the inner condition evaluation or some interrupt handler happens to
trigger a wake_up meant for the outer call then it will be lost.

> __out: __ret; \
> })

2015-11-07 09:20:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] Preserve task state in reentrant calls to ___wait_event

On Fri, Nov 06, 2015 at 08:44:08PM +0000, Chris Bainbridge wrote:
> I am assuming here that wait_event_timeout() is supposed to support reentrant
> calls.

Not really. It is sort of allowed, provided the inner one will rarely
block. And therefore the outer one will mostly work.

> If not, perhaps it should BUG_ON when called with a non-running task
> state,

There is a warning in __might_sleep() that catches some of this.

> and the SBS HC / ACPI EC code needs to be fixed to stop doing this.

Yes.

2015-11-12 18:05:44

by Chris Bainbridge

[permalink] [raw]
Subject: [PATCH] ACPI / SMBus: Fix boot stalls / high CPU caused by reentrant code

In the SBS initialisation, a reentrant call to wait_event_timeout()
causes an intermittent boot stall of several minutes usually following
the "Switching to clocksource tsc" message. Another symptom of this bug
is high CPU usage from programs (Firefox, upowerd) querying the battery
state. This is caused by:

1. drivers/acpi/sbshc.c wait_transaction_complete() calls
wait_event_timeout():

if (wait_event_timeout(hc->wait, smb_check_done(hc),
msecs_to_jiffies(timeout)))

2. ___wait_event sets task state to uninterruptible

3. ___wait_event calls the "condition" smb_check_done()

4. smb_check_done (sbshc.c) calls through to ec_read() in
drivers/acpi/ec.c

5. ec_guard() is reached which calls wait_event_timeout()

if (wait_event_timeout(ec->wait,
ec_transaction_completed(ec),
guard))

ie. wait_event_timeout() is being called again inside evaluation of
the previous wait_event_timeout() condition

5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in
ec_guard()

6. The task is now in state running even though the wait "condition" is
still being evaluated

7. The "condition" check returns false so ___wait_event calls
schedule_timeout()

8. Since the task state is running, the scheduler immediately schedules
it again

9. This loop usually repeats for around 250 seconds even though the
original wait_event_timeout was only 1000ms.

The timeout is incorrect because each call to schedule_timeout()
usually returns immediately, taking less than 1ms, so the jiffies
timeout counter is not decremented. The task is now stuck in a
running state, and so is highly likely to be immediately
rescheduled, which takes less than a jiffy. The loop will never exit
if all schedule_timeout() calls take less than a jiffy.

Fix this by replacing SMBus reads in the wait_event_timeout condition
with checks of a boolean value that is updated by the EC query handler.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107191
Link: https://lkml.org/lkml/2015/11/6/776
Signed-off-by: Chris Bainbridge <[email protected]>
---
drivers/acpi/sbshc.c | 26 +++++++-------------------
1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c
index bf034f8..e290051 100644
--- a/drivers/acpi/sbshc.c
+++ b/drivers/acpi/sbshc.c
@@ -30,6 +30,7 @@ struct acpi_smb_hc {
u8 query_bit;
smbus_alarm_callback callback;
void *context;
+ bool done;
};

static int acpi_smbus_hc_add(struct acpi_device *device);
@@ -100,27 +101,11 @@ static inline int smb_hc_write(struct acpi_smb_hc *hc, u8 address, u8 data)
return ec_write(hc->offset + address, data);
}

-static inline int smb_check_done(struct acpi_smb_hc *hc)
-{
- union acpi_smb_status status = {.raw = 0};
- smb_hc_read(hc, ACPI_SMB_STATUS, &status.raw);
- return status.fields.done && (status.fields.status == SMBUS_OK);
-}
-
static int wait_transaction_complete(struct acpi_smb_hc *hc, int timeout)
{
- if (wait_event_timeout(hc->wait, smb_check_done(hc),
- msecs_to_jiffies(timeout)))
+ if (wait_event_timeout(hc->wait, hc->done, msecs_to_jiffies(timeout)))
return 0;
- /*
- * After the timeout happens, OS will try to check the status of SMbus.
- * If the status is what OS expected, it will be regarded as the bogus
- * timeout.
- */
- if (smb_check_done(hc))
- return 0;
- else
- return -ETIME;
+ return -ETIME;
}

static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol,
@@ -135,6 +120,7 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol,
}

mutex_lock(&hc->lock);
+ hc->done = false;
if (macbook)
udelay(5);
if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, &temp))
@@ -235,8 +221,10 @@ static int smbus_alarm(void *context)
if (smb_hc_read(hc, ACPI_SMB_STATUS, &status.raw))
return 0;
/* Check if it is only a completion notify */
- if (status.fields.done)
+ if (status.fields.done && status.fields.status == SMBUS_OK) {
+ hc->done = true;
wake_up(&hc->wait);
+ }
if (!status.fields.alarm)
return 0;
mutex_lock(&hc->lock);
--
2.1.4

2015-11-12 19:24:54

by Chris Bainbridge

[permalink] [raw]
Subject: [PATCH] Revert "ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook"

This reverts commit 3349fb64b2927407017d970dd5c4daf3c5ad69f8.

After commit 8848d20cf09c ("ACPI / SMBus: Fix boot stalls / high CPU
caused by reentrant code") the delay is no longer necessary.

Signed-off-by: Chris Bainbridge <[email protected]>
---
drivers/acpi/sbshc.c | 22 ----------------------
1 file changed, 22 deletions(-)

diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c
index e2900518cf7e..2fa8304171e0 100644
--- a/drivers/acpi/sbshc.c
+++ b/drivers/acpi/sbshc.c
@@ -14,7 +14,6 @@
#include <linux/delay.h>
#include <linux/module.h>
#include <linux/interrupt.h>
-#include <linux/dmi.h>
#include "sbshc.h"

#define PREFIX "ACPI: "
@@ -89,8 +88,6 @@ enum acpi_smb_offset {
ACPI_SMB_ALARM_DATA = 0x26, /* 2 bytes alarm data */
};

-static bool macbook;
-
static inline int smb_hc_read(struct acpi_smb_hc *hc, u8 address, u8 *data)
{
return ec_read(hc->offset + address, data);
@@ -121,8 +118,6 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol,

mutex_lock(&hc->lock);
hc->done = false;
- if (macbook)
- udelay(5);
if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, &temp))
goto end;
if (temp) {
@@ -250,29 +245,12 @@ extern int acpi_ec_add_query_handler(struct acpi_ec *ec, u8 query_bit,
acpi_handle handle, acpi_ec_query_func func,
void *data);

-static int macbook_dmi_match(const struct dmi_system_id *d)
-{
- pr_debug("Detected MacBook, enabling workaround\n");
- macbook = true;
- return 0;
-}
-
-static struct dmi_system_id acpi_smbus_dmi_table[] = {
- { macbook_dmi_match, "Apple MacBook", {
- DMI_MATCH(DMI_BOARD_VENDOR, "Apple"),
- DMI_MATCH(DMI_PRODUCT_NAME, "MacBook") },
- },
- { },
-};
-
static int acpi_smbus_hc_add(struct acpi_device *device)
{
int status;
unsigned long long val;
struct acpi_smb_hc *hc;

- dmi_check_system(acpi_smbus_dmi_table);
-
if (!device)
return -EINVAL;

--
2.1.4