Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses. However, at the moment this isn't guaranteed. If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.
Fix this by returning at least 1 if the condition becomes true. This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
failure under heavy load".
Signed-off-by: Imre Deak <[email protected]>
---
include/linux/wait.h | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 7cb64d4..5336842 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -217,6 +217,8 @@ do { \
if (!ret) \
break; \
} \
+ if (!ret && (condition)) \
+ ret = 1; \
finish_wait(&wq, &__wait); \
} while (0)
@@ -233,8 +235,9 @@ do { \
* wake_up() has to be called after changing any variable that could
* change the result of the wait condition.
*
- * The function returns 0 if the @timeout elapsed, and the remaining
- * jiffies if the condition evaluated to true before the timeout elapsed.
+ * The function returns 0 if the @timeout elapsed, or the remaining
+ * jiffies (at least 1) if the @condition evaluated to %true before
+ * the @timeout elapsed.
*/
#define wait_event_timeout(wq, condition, timeout) \
({ \
@@ -302,6 +305,8 @@ do { \
ret = -ERESTARTSYS; \
break; \
} \
+ if (!ret && (condition)) \
+ ret = 1; \
finish_wait(&wq, &__wait); \
} while (0)
@@ -318,9 +323,10 @@ do { \
* wake_up() has to be called after changing any variable that could
* change the result of the wait condition.
*
- * The function returns 0 if the @timeout elapsed, -ERESTARTSYS if it
- * was interrupted by a signal, and the remaining jiffies otherwise
- * if the condition evaluated to true before the timeout elapsed.
+ * Returns:
+ * 0 if the @timeout elapsed, -%ERESTARTSYS if it was interrupted by
+ * a signal, or the remaining jiffies (at least 1) if the @condition
+ * evaluated to %true before the @timeout elapsed.
*/
#define wait_event_interruptible_timeout(wq, condition, timeout) \
({ \
--
1.7.10.4
On Thu, May 2, 2013 at 10:58 AM, Imre Deak <[email protected]> wrote:
> Many callers of the wait_event_timeout() and
> wait_event_interruptible_timeout() expect that the return value will be
> positive if the specified condition becomes true before the timeout
> elapses. However, at the moment this isn't guaranteed. If the wake-up
> handler is delayed enough, the time remaining until timeout will be
> calculated as 0 - and passed back as a return value - even if the
> condition became true before the timeout has passed.
>
> Fix this by returning at least 1 if the condition becomes true. This
> semantic is in line with what wait_for_condition_timeout() does; see
> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> failure under heavy load".
>
> Signed-off-by: Imre Deak <[email protected]>
We have 3 instances of this bug in drm/i915. One case even where we
switch between the interruptible and not interruptible
wait_event_timeout variants, foolishly presuming they have the same
semantics. I very much like this.
Acked-by: Daniel Vetter <[email protected]>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Imre Deak <[email protected]> wrote:
> Many callers of the wait_event_timeout() and
> wait_event_interruptible_timeout() expect that the return value will be
> positive if the specified condition becomes true before the timeout
> elapses. However, at the moment this isn't guaranteed. If the wake-up
> handler is delayed enough, the time remaining until timeout will be
> calculated as 0 - and passed back as a return value - even if the
> condition became true before the timeout has passed.
Fun.
> Fix this by returning at least 1 if the condition becomes true. This
> semantic is in line with what wait_for_condition_timeout() does; see
> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> failure under heavy load".
But now you can't distinguish the timer expiring first, if the thread doing
the waiting gets delayed sufficiently long for the event to happen.
I'm not sure there's a good answer - except maybe making the timer expiry
handler check the condition (which would likely get really yucky really
quickly).
David
On Thu, 2013-05-02 at 11:29 +0100, David Howells wrote:
> Imre Deak <[email protected]> wrote:
>
> > Many callers of the wait_event_timeout() and
> > wait_event_interruptible_timeout() expect that the return value will be
> > positive if the specified condition becomes true before the timeout
> > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > handler is delayed enough, the time remaining until timeout will be
> > calculated as 0 - and passed back as a return value - even if the
> > condition became true before the timeout has passed.
>
> Fun.
>
> > Fix this by returning at least 1 if the condition becomes true. This
> > semantic is in line with what wait_for_condition_timeout() does; see
> > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > failure under heavy load".
>
> But now you can't distinguish the timer expiring first, if the thread doing
> the waiting gets delayed sufficiently long for the event to happen.
I'm trying to understand what sequence do you mean. I can think of the
following - as an example - in case of starting a transaction that will
set a completion flag:
waiter completion handler
start transaction
set completion_flag
ret = wait_event_timeout(timeout, completion_flag)
In this case ret will be timeout which is the original behavior, so
should be ok. One exception is if timeout=0 to begin with, since then -
after this change - ret will be 1. But I can't see how that use case is
useful. I guess I'm missing something, could you elaborate?
--Imre
> I'm not sure there's a good answer - except maybe making the timer expiry
> handler check the condition (which would likely get really yucky really
> quickly).
>
> David
On Thu, May 2, 2013 at 12:29 PM, David Howells <[email protected]> wrote:
>> Fix this by returning at least 1 if the condition becomes true. This
>> semantic is in line with what wait_for_condition_timeout() does; see
>> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
>> failure under heavy load".
>
> But now you can't distinguish the timer expiring first, if the thread doing
> the waiting gets delayed sufficiently long for the event to happen.
That can already happen, e.g.
1. wakeup happens and condition is true.
2. we compute remaining jiffies > 0
-> preempt
3. now wait_for_event_timeout returns.
Only difference is that the delay/preempt happens in between 1. and
2., and then suddenly the wake up didn't happen in time (with the
current return code semantics).
So imo the current behaviour is simply a bug and will miss timely
wakeups in some cases.
The other way round, namely wait_for_event_timeout taking longer than
the timeout is expected (and part of the interface for every timeout
function). So all current callers already need to be able to cope with
random preemption/delays pushing the total time before the call to
wait_for_event and checking the return value over the timeout, even
when condition was signalled in time.
If there's any case which relies on accurate timeout detection that
simply won't work with wait_for_event (they need an nmi or a hw
timestamp counter or something similar).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, May 02 2013, Daniel Vetter wrote:
> On Thu, May 2, 2013 at 12:29 PM, David Howells <[email protected]> wrote:
> >> Fix this by returning at least 1 if the condition becomes true. This
> >> semantic is in line with what wait_for_condition_timeout() does; see
> >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> >> failure under heavy load".
> >
> > But now you can't distinguish the timer expiring first, if the thread doing
> > the waiting gets delayed sufficiently long for the event to happen.
>
> That can already happen, e.g.
>
> 1. wakeup happens and condition is true.
> 2. we compute remaining jiffies > 0
> -> preempt
> 3. now wait_for_event_timeout returns.
>
> Only difference is that the delay/preempt happens in between 1. and
> 2., and then suddenly the wake up didn't happen in time (with the
> current return code semantics).
>
> So imo the current behaviour is simply a bug and will miss timely
> wakeups in some cases.
>
> The other way round, namely wait_for_event_timeout taking longer than
> the timeout is expected (and part of the interface for every timeout
> function). So all current callers already need to be able to cope with
> random preemption/delays pushing the total time before the call to
> wait_for_event and checking the return value over the timeout, even
> when condition was signalled in time.
>
> If there's any case which relies on accurate timeout detection that
> simply won't work with wait_for_event (they need an nmi or a hw
> timestamp counter or something similar).
I seriously doubt that anyone is depending on any sort of accuracy on
the return. 1 jiffy is not going to make or break anything - in fact,
jiffies could be incremented nsecs after the initial call. So a
granularity of at least 1 is going to be expected in any case.
The important bit here is that the API should behave as expected. And
the most logical way to code that is to check the return value. I can
easily see people forgetting to re-check the condition, hence you get a
bug. The fact that you and the original reporter already had accidents
with this is a clear sign that the logical way to use the API is not the
correct one.
IMHO, the change definitely makes sense.
--
Jens Axboe
Jens Axboe <[email protected]> wrote:
> IMHO, the change definitely makes sense.
Yeah... I think I agree.
David
Imre Deak <[email protected]> wrote:
> Many callers of the wait_event_timeout() and
> wait_event_interruptible_timeout() expect that the return value will be
> positive if the specified condition becomes true before the timeout
> elapses. However, at the moment this isn't guaranteed. If the wake-up
> handler is delayed enough, the time remaining until timeout will be
> calculated as 0 - and passed back as a return value - even if the
> condition became true before the timeout has passed.
>
> Fix this by returning at least 1 if the condition becomes true. This
> semantic is in line with what wait_for_condition_timeout() does; see
> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> failure under heavy load".
>
> Signed-off-by: Imre Deak <[email protected]>
Acked-by: David Howells <[email protected]>
On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote:
> On Thu, May 02 2013, Daniel Vetter wrote:
> > On Thu, May 2, 2013 at 12:29 PM, David Howells <[email protected]> wrote:
> > >> Fix this by returning at least 1 if the condition becomes true. This
> > >> semantic is in line with what wait_for_condition_timeout() does; see
> > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > >> failure under heavy load".
> > >
> > > But now you can't distinguish the timer expiring first, if the thread doing
> > > the waiting gets delayed sufficiently long for the event to happen.
> >
> > That can already happen, e.g.
> >
> > 1. wakeup happens and condition is true.
> > 2. we compute remaining jiffies > 0
> > -> preempt
> > 3. now wait_for_event_timeout returns.
> >
> > Only difference is that the delay/preempt happens in between 1. and
> > 2., and then suddenly the wake up didn't happen in time (with the
> > current return code semantics).
> >
> > So imo the current behaviour is simply a bug and will miss timely
> > wakeups in some cases.
> >
> > The other way round, namely wait_for_event_timeout taking longer than
> > the timeout is expected (and part of the interface for every timeout
> > function). So all current callers already need to be able to cope with
> > random preemption/delays pushing the total time before the call to
> > wait_for_event and checking the return value over the timeout, even
> > when condition was signalled in time.
> >
> > If there's any case which relies on accurate timeout detection that
> > simply won't work with wait_for_event (they need an nmi or a hw
> > timestamp counter or something similar).
>
> I seriously doubt that anyone is depending on any sort of accuracy on
> the return. 1 jiffy is not going to make or break anything - in fact,
> jiffies could be incremented nsecs after the initial call. So a
> granularity of at least 1 is going to be expected in any case.
>
> The important bit here is that the API should behave as expected. And
> the most logical way to code that is to check the return value. I can
> easily see people forgetting to re-check the condition, hence you get a
> bug. The fact that you and the original reporter already had accidents
> with this is a clear sign that the logical way to use the API is not the
> correct one.
>
> IMHO, the change definitely makes sense.
Ok, so taking courage of this answer ;P How about also the following?
diff --git a/kernel/timer.c b/kernel/timer.c
index dbf7a78..5a62456 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long
timeout)
}
}
- expire = timeout + jiffies;
+ /*
+ * We can't be sure how close we are to the next tick, so +1 to
+ * guarantee that we wait at least timeout amount.
+ */
+ expire = timeout + jiffies + 1;
setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
It'd plug a similar hole for wait_event_timeout() and similar users, who
don't compensate for the above..
--Imre
On Thu, May 02 2013, David Howells wrote:
> Imre Deak <[email protected]> wrote:
>
> > Many callers of the wait_event_timeout() and
> > wait_event_interruptible_timeout() expect that the return value will be
> > positive if the specified condition becomes true before the timeout
> > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > handler is delayed enough, the time remaining until timeout will be
> > calculated as 0 - and passed back as a return value - even if the
> > condition became true before the timeout has passed.
> >
> > Fix this by returning at least 1 if the condition becomes true. This
> > semantic is in line with what wait_for_condition_timeout() does; see
> > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > failure under heavy load".
> >
> > Signed-off-by: Imre Deak <[email protected]>
>
> Acked-by: David Howells <[email protected]>
You can add mine, too:
Acked-by: Jens Axboe <[email protected]>
--
Jens Axboe
On Thu, May 02 2013, Imre Deak wrote:
> On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote:
> > On Thu, May 02 2013, Daniel Vetter wrote:
> > > On Thu, May 2, 2013 at 12:29 PM, David Howells <[email protected]> wrote:
> > > >> Fix this by returning at least 1 if the condition becomes true. This
> > > >> semantic is in line with what wait_for_condition_timeout() does; see
> > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > > >> failure under heavy load".
> > > >
> > > > But now you can't distinguish the timer expiring first, if the thread doing
> > > > the waiting gets delayed sufficiently long for the event to happen.
> > >
> > > That can already happen, e.g.
> > >
> > > 1. wakeup happens and condition is true.
> > > 2. we compute remaining jiffies > 0
> > > -> preempt
> > > 3. now wait_for_event_timeout returns.
> > >
> > > Only difference is that the delay/preempt happens in between 1. and
> > > 2., and then suddenly the wake up didn't happen in time (with the
> > > current return code semantics).
> > >
> > > So imo the current behaviour is simply a bug and will miss timely
> > > wakeups in some cases.
> > >
> > > The other way round, namely wait_for_event_timeout taking longer than
> > > the timeout is expected (and part of the interface for every timeout
> > > function). So all current callers already need to be able to cope with
> > > random preemption/delays pushing the total time before the call to
> > > wait_for_event and checking the return value over the timeout, even
> > > when condition was signalled in time.
> > >
> > > If there's any case which relies on accurate timeout detection that
> > > simply won't work with wait_for_event (they need an nmi or a hw
> > > timestamp counter or something similar).
> >
> > I seriously doubt that anyone is depending on any sort of accuracy on
> > the return. 1 jiffy is not going to make or break anything - in fact,
> > jiffies could be incremented nsecs after the initial call. So a
> > granularity of at least 1 is going to be expected in any case.
> >
> > The important bit here is that the API should behave as expected. And
> > the most logical way to code that is to check the return value. I can
> > easily see people forgetting to re-check the condition, hence you get a
> > bug. The fact that you and the original reporter already had accidents
> > with this is a clear sign that the logical way to use the API is not the
> > correct one.
> >
> > IMHO, the change definitely makes sense.
>
> Ok, so taking courage of this answer ;P How about also the following?
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index dbf7a78..5a62456 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long
> timeout)
> }
> }
>
> - expire = timeout + jiffies;
> + /*
> + * We can't be sure how close we are to the next tick, so +1 to
> + * guarantee that we wait at least timeout amount.
> + */
> + expire = timeout + jiffies + 1;
>
> setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
> __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
>
>
> It'd plug a similar hole for wait_event_timeout() and similar users, who
> don't compensate for the above..
Any jiffy based API is going to have this issue. I think it's different
from the original patch, which just makes the API potentially return
something that is confusing.
So not sure on the above, sorry.
--
Jens Axboe
On Thu, 2013-05-02 at 14:54 +0200, Jens Axboe wrote:
> On Thu, May 02 2013, Imre Deak wrote:
> > On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote:
> > > On Thu, May 02 2013, Daniel Vetter wrote:
> > > > On Thu, May 2, 2013 at 12:29 PM, David Howells <[email protected]> wrote:
> > > > >> Fix this by returning at least 1 if the condition becomes true. This
> > > > >> semantic is in line with what wait_for_condition_timeout() does; see
> > > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > > > >> failure under heavy load".
> > > > >
> > > > > But now you can't distinguish the timer expiring first, if the thread doing
> > > > > the waiting gets delayed sufficiently long for the event to happen.
> > > >
> > > > That can already happen, e.g.
> > > >
> > > > 1. wakeup happens and condition is true.
> > > > 2. we compute remaining jiffies > 0
> > > > -> preempt
> > > > 3. now wait_for_event_timeout returns.
> > > >
> > > > Only difference is that the delay/preempt happens in between 1. and
> > > > 2., and then suddenly the wake up didn't happen in time (with the
> > > > current return code semantics).
> > > >
> > > > So imo the current behaviour is simply a bug and will miss timely
> > > > wakeups in some cases.
> > > >
> > > > The other way round, namely wait_for_event_timeout taking longer than
> > > > the timeout is expected (and part of the interface for every timeout
> > > > function). So all current callers already need to be able to cope with
> > > > random preemption/delays pushing the total time before the call to
> > > > wait_for_event and checking the return value over the timeout, even
> > > > when condition was signalled in time.
> > > >
> > > > If there's any case which relies on accurate timeout detection that
> > > > simply won't work with wait_for_event (they need an nmi or a hw
> > > > timestamp counter or something similar).
> > >
> > > I seriously doubt that anyone is depending on any sort of accuracy on
> > > the return. 1 jiffy is not going to make or break anything - in fact,
> > > jiffies could be incremented nsecs after the initial call. So a
> > > granularity of at least 1 is going to be expected in any case.
> > >
> > > The important bit here is that the API should behave as expected. And
> > > the most logical way to code that is to check the return value. I can
> > > easily see people forgetting to re-check the condition, hence you get a
> > > bug. The fact that you and the original reporter already had accidents
> > > with this is a clear sign that the logical way to use the API is not the
> > > correct one.
> > >
> > > IMHO, the change definitely makes sense.
> >
> > Ok, so taking courage of this answer ;P How about also the following?
> >
> > diff --git a/kernel/timer.c b/kernel/timer.c
> > index dbf7a78..5a62456 100644
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long
> > timeout)
> > }
> > }
> >
> > - expire = timeout + jiffies;
> > + /*
> > + * We can't be sure how close we are to the next tick, so +1 to
> > + * guarantee that we wait at least timeout amount.
> > + */
> > + expire = timeout + jiffies + 1;
> >
> > setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
> > __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
> >
> >
> > It'd plug a similar hole for wait_event_timeout() and similar users, who
> > don't compensate for the above..
>
> Any jiffy based API is going to have this issue. I think it's different
> from the original patch, which just makes the API potentially return
> something that is confusing.
Yea, at least those that take a relative time. Usually the timeout is
given with a big overhead, so there it won't be a problem. But I also
found users like drivers/acpi/ec.c, that will pass a timeout of 1
jiffies, which may then result in premature timeouts.
> So not sure on the above, sorry.
Ok. A WARN to wait_event_timeout for the above case could still be a
useful guard..
--Imre
On Thu, May 2, 2013 at 3:56 PM, Imre Deak <[email protected]> wrote:
> On Thu, 2013-05-02 at 14:54 +0200, Jens Axboe wrote:
>> On Thu, May 02 2013, Imre Deak wrote:
>> > On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote:
>> > > On Thu, May 02 2013, Daniel Vetter wrote:
>> > > > On Thu, May 2, 2013 at 12:29 PM, David Howells <[email protected]> wrote:
>> > > > >> Fix this by returning at least 1 if the condition becomes true. This
>> > > > >> semantic is in line with what wait_for_condition_timeout() does; see
>> > > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
>> > > > >> failure under heavy load".
>> > > > >
>> > > > > But now you can't distinguish the timer expiring first, if the thread doing
>> > > > > the waiting gets delayed sufficiently long for the event to happen.
>> > > >
>> > > > That can already happen, e.g.
>> > > >
>> > > > 1. wakeup happens and condition is true.
>> > > > 2. we compute remaining jiffies > 0
>> > > > -> preempt
>> > > > 3. now wait_for_event_timeout returns.
>> > > >
>> > > > Only difference is that the delay/preempt happens in between 1. and
>> > > > 2., and then suddenly the wake up didn't happen in time (with the
>> > > > current return code semantics).
>> > > >
>> > > > So imo the current behaviour is simply a bug and will miss timely
>> > > > wakeups in some cases.
>> > > >
>> > > > The other way round, namely wait_for_event_timeout taking longer than
>> > > > the timeout is expected (and part of the interface for every timeout
>> > > > function). So all current callers already need to be able to cope with
>> > > > random preemption/delays pushing the total time before the call to
>> > > > wait_for_event and checking the return value over the timeout, even
>> > > > when condition was signalled in time.
>> > > >
>> > > > If there's any case which relies on accurate timeout detection that
>> > > > simply won't work with wait_for_event (they need an nmi or a hw
>> > > > timestamp counter or something similar).
>> > >
>> > > I seriously doubt that anyone is depending on any sort of accuracy on
>> > > the return. 1 jiffy is not going to make or break anything - in fact,
>> > > jiffies could be incremented nsecs after the initial call. So a
>> > > granularity of at least 1 is going to be expected in any case.
>> > >
>> > > The important bit here is that the API should behave as expected. And
>> > > the most logical way to code that is to check the return value. I can
>> > > easily see people forgetting to re-check the condition, hence you get a
>> > > bug. The fact that you and the original reporter already had accidents
>> > > with this is a clear sign that the logical way to use the API is not the
>> > > correct one.
>> > >
>> > > IMHO, the change definitely makes sense.
>> >
>> > Ok, so taking courage of this answer ;P How about also the following?
>> >
>> > diff --git a/kernel/timer.c b/kernel/timer.c
>> > index dbf7a78..5a62456 100644
>> > --- a/kernel/timer.c
>> > +++ b/kernel/timer.c
>> > @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long
>> > timeout)
>> > }
>> > }
>> >
>> > - expire = timeout + jiffies;
>> > + /*
>> > + * We can't be sure how close we are to the next tick, so +1 to
>> > + * guarantee that we wait at least timeout amount.
>> > + */
>> > + expire = timeout + jiffies + 1;
>> >
>> > setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
>> > __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
>> >
>> >
>> > It'd plug a similar hole for wait_event_timeout() and similar users, who
>> > don't compensate for the above..
>>
>> Any jiffy based API is going to have this issue. I think it's different
>> from the original patch, which just makes the API potentially return
>> something that is confusing.
>
> Yea, at least those that take a relative time. Usually the timeout is
> given with a big overhead, so there it won't be a problem. But I also
> found users like drivers/acpi/ec.c, that will pass a timeout of 1
> jiffies, which may then result in premature timeouts.
In drm/i915 we have a bunch of those 1 jiffie timeouts, too,
especially with low HZ configs. For easy comparison with hw docs we
try to stick to the documented timeout limits and convert them with
msec_to_jiffies (which does the right thing wrt rounding). But since
these timeouts are often in the low ms range, that often means just 1
jiffie timeouts. So the timeout could be off by a large factor (and we
have the bugzillas to prove it happens in reality).
I'm not sure whether sprinkling +1 all over the codebase is a that
nice approach. Adding the +1 in sched_timeout otoh would give us a
nice idiot proof interface for driver guys like me. We also have some
custom register wait loops using msleep(1) and jiffy based timeouts,
and there our driver macro does the +1. That helped a lot in avoiding
stupid bugs, but now we've converted a few things over to interrupts +
timeout and the bugs are back ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, 2013-05-02 at 14:35 +0200, Jens Axboe wrote:
> On Thu, May 02 2013, David Howells wrote:
> > Imre Deak <[email protected]> wrote:
> >
> > > Many callers of the wait_event_timeout() and
> > > wait_event_interruptible_timeout() expect that the return value will be
> > > positive if the specified condition becomes true before the timeout
> > > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > > handler is delayed enough, the time remaining until timeout will be
> > > calculated as 0 - and passed back as a return value - even if the
> > > condition became true before the timeout has passed.
> > >
> > > Fix this by returning at least 1 if the condition becomes true. This
> > > semantic is in line with what wait_for_condition_timeout() does; see
> > > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > > failure under heavy load".
> > >
> > > Signed-off-by: Imre Deak <[email protected]>
> >
> > Acked-by: David Howells <[email protected]>
>
> You can add mine, too:
>
> Acked-by: Jens Axboe <[email protected]>
Ok, I think we agree that the +1 thing in schedule_timeout() discussed
in this thread should be handled separately, so if there is no other
objection I'd be happy if this patch was merged through someone's tree
as-is.
--Imre
On Thu, 2 May 2013 11:36:56 +0200 Daniel Vetter <[email protected]> wrote:
> On Thu, May 2, 2013 at 10:58 AM, Imre Deak <[email protected]> wrote:
> > Many callers of the wait_event_timeout() and
> > wait_event_interruptible_timeout() expect that the return value will be
> > positive if the specified condition becomes true before the timeout
> > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > handler is delayed enough, the time remaining until timeout will be
> > calculated as 0 - and passed back as a return value - even if the
> > condition became true before the timeout has passed.
> >
> > Fix this by returning at least 1 if the condition becomes true. This
> > semantic is in line with what wait_for_condition_timeout() does; see
> > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > failure under heavy load".
> >
> > Signed-off-by: Imre Deak <[email protected]>
>
> We have 3 instances of this bug in drm/i915. One case even where we
> switch between the interruptible and not interruptible
> wait_event_timeout variants, foolishly presuming they have the same
> semantics. I very much like this.
Let's think about scheduling this fix.
Are any of the bugs which we expect this patch fixes serious enough to
warrant merging it into 3.10? And -stable?
On Tue, 2013-05-07 at 16:12 -0700, Andrew Morton wrote:
> On Thu, 2 May 2013 11:36:56 +0200 Daniel Vetter <[email protected]> wrote:
>
> > On Thu, May 2, 2013 at 10:58 AM, Imre Deak <[email protected]> wrote:
> > > Many callers of the wait_event_timeout() and
> > > wait_event_interruptible_timeout() expect that the return value will be
> > > positive if the specified condition becomes true before the timeout
> > > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > > handler is delayed enough, the time remaining until timeout will be
> > > calculated as 0 - and passed back as a return value - even if the
> > > condition became true before the timeout has passed.
> > >
> > > Fix this by returning at least 1 if the condition becomes true. This
> > > semantic is in line with what wait_for_condition_timeout() does; see
> > > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> > > failure under heavy load".
> > >
> > > Signed-off-by: Imre Deak <[email protected]>
> >
> > We have 3 instances of this bug in drm/i915. One case even where we
> > switch between the interruptible and not interruptible
> > wait_event_timeout variants, foolishly presuming they have the same
> > semantics. I very much like this.
>
> Let's think about scheduling this fix.
>
> Are any of the bugs which we expect this patch fixes serious enough to
> warrant merging it into 3.10? And -stable?
There is at least [1], but I'm sure there is more similar reports about
i915. I'd vote for -stable at least.
--Imre
[1] https://bugs.freedesktop.org/show_bug.cgi?id=64133
Hello,
Just noticed this commit...
commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e
Author: Imre Deak <[email protected]>
Date: Fri May 24 15:55:09 2013 -0700
Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses. However, at the moment this isn't guaranteed. If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.
OK, agreed.
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -217,6 +217,8 @@ do { \
if (!ret) \
break; \
} \
+ if (!ret && (condition)) \
+ ret = 1; \
finish_wait(&wq, &__wait); \
} while (0)
Well, this evaluates "condition" twice, perhaps it would be more
clean to do, say,
#define __wait_event_timeout(wq, condition, ret) \
do { \
DEFINE_WAIT(__wait); \
\
for (;;) { \
prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
if (condition) { \
if (!ret) \
ret = 1; \
break; \
} else if (!ret) \
break; \
ret = schedule_timeout(ret); \
} \
finish_wait(&wq, &__wait); \
} while (0)
but this is minor.
@@ -233,8 +235,9 @@ do { \
* wake_up() has to be called after changing any variable that could
* change the result of the wait condition.
*
- * The function returns 0 if the @timeout elapsed, and the remaining
- * jiffies if the condition evaluated to true before the timeout elapsed.
+ * The function returns 0 if the @timeout elapsed, or the remaining
+ * jiffies (at least 1) if the @condition evaluated to %true before
+ * the @timeout elapsed.
This is still not true if timeout == 0.
Shouldn't we also change wait_event_timeout() ? Say,
#define wait_event_timeout(wq, condition, timeout) \
({ \
long __ret = timeout; \
if (!(condition)) \
__wait_event_timeout(wq, condition, __ret); \
else if (!__ret) \
__ret = 1; \
__ret; \
})
Or wait_event_timeout(timeout => 0) is not legal in a non-void context?
To me the code like
long wait_for_something(bool nonblock)
{
timeout = nonblock ? 0 : DEFAULT_TIMEOUT;
return wait_event_timeout(..., timeout);
}
looks reasonable and correct. But it is not?
Oleg.
On Tue, 2013-06-04 at 21:28 +0200, Oleg Nesterov wrote:
> Hello,
>
> Just noticed this commit...
>
> commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e
> Author: Imre Deak <[email protected]>
> Date: Fri May 24 15:55:09 2013 -0700
>
> Many callers of the wait_event_timeout() and
> wait_event_interruptible_timeout() expect that the return value will be
> positive if the specified condition becomes true before the timeout
> elapses. However, at the moment this isn't guaranteed. If the wake-up
> handler is delayed enough, the time remaining until timeout will be
> calculated as 0 - and passed back as a return value - even if the
> condition became true before the timeout has passed.
>
> OK, agreed.
>
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -217,6 +217,8 @@ do { \
> if (!ret) \
> break; \
> } \
> + if (!ret && (condition)) \
> + ret = 1; \
> finish_wait(&wq, &__wait); \
> } while (0)
>
> Well, this evaluates "condition" twice, perhaps it would be more
> clean to do, say,
>
> #define __wait_event_timeout(wq, condition, ret) \
> do { \
> DEFINE_WAIT(__wait); \
> \
> for (;;) { \
> prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
> if (condition) { \
> if (!ret) \
> ret = 1; \
> break; \
> } else if (!ret) \
> break; \
> ret = schedule_timeout(ret); \
> } \
> finish_wait(&wq, &__wait); \
> } while (0)
>
> but this is minor.
>
> @@ -233,8 +235,9 @@ do { \
> * wake_up() has to be called after changing any variable that could
> * change the result of the wait condition.
> *
> - * The function returns 0 if the @timeout elapsed, and the remaining
> - * jiffies if the condition evaluated to true before the timeout elapsed.
> + * The function returns 0 if the @timeout elapsed, or the remaining
> + * jiffies (at least 1) if the @condition evaluated to %true before
> + * the @timeout elapsed.
>
> This is still not true if timeout == 0.
>
> Shouldn't we also change wait_event_timeout() ? Say,
>
> #define wait_event_timeout(wq, condition, timeout) \
> ({ \
> long __ret = timeout; \
> if (!(condition)) \
> __wait_event_timeout(wq, condition, __ret); \
> else if (!__ret) \
> __ret = 1; \
> __ret; \
> })
>
> Or wait_event_timeout(timeout => 0) is not legal in a non-void context?
>
> To me the code like
>
> long wait_for_something(bool nonblock)
> {
> timeout = nonblock ? 0 : DEFAULT_TIMEOUT;
> return wait_event_timeout(..., timeout);
> }
>
> looks reasonable and correct. But it is not?
I don't see why it would be not legal. Note though that in the above
form wait_event_timeout(cond, 0) would still schedule() if cond is
false, which is not what I'd expect from a non-blocking function.
--Imre
On Wed, 2013-06-05 at 00:35 +0300, Imre Deak wrote:
> On Tue, 2013-06-04 at 21:28 +0200, Oleg Nesterov wrote:
> > Hello,
> >
> > Just noticed this commit...
> >
> > commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e
> > Author: Imre Deak <[email protected]>
> > Date: Fri May 24 15:55:09 2013 -0700
> >
> > Many callers of the wait_event_timeout() and
> > wait_event_interruptible_timeout() expect that the return value will be
> > positive if the specified condition becomes true before the timeout
> > elapses. However, at the moment this isn't guaranteed. If the wake-up
> > handler is delayed enough, the time remaining until timeout will be
> > calculated as 0 - and passed back as a return value - even if the
> > condition became true before the timeout has passed.
> >
> > OK, agreed.
> >
> > --- a/include/linux/wait.h
> > +++ b/include/linux/wait.h
> > @@ -217,6 +217,8 @@ do { \
> > if (!ret) \
> > break; \
> > } \
> > + if (!ret && (condition)) \
> > + ret = 1; \
> > finish_wait(&wq, &__wait); \
> > } while (0)
> >
> > Well, this evaluates "condition" twice, perhaps it would be more
> > clean to do, say,
> >
> > #define __wait_event_timeout(wq, condition, ret) \
> > do { \
> > DEFINE_WAIT(__wait); \
> > \
> > for (;;) { \
> > prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
> > if (condition) { \
> > if (!ret) \
> > ret = 1; \
> > break; \
> > } else if (!ret) \
> > break; \
> > ret = schedule_timeout(ret); \
> > } \
> > finish_wait(&wq, &__wait); \
> > } while (0)
> >
> > but this is minor.
> >
> > @@ -233,8 +235,9 @@ do { \
> > * wake_up() has to be called after changing any variable that could
> > * change the result of the wait condition.
> > *
> > - * The function returns 0 if the @timeout elapsed, and the remaining
> > - * jiffies if the condition evaluated to true before the timeout elapsed.
> > + * The function returns 0 if the @timeout elapsed, or the remaining
> > + * jiffies (at least 1) if the @condition evaluated to %true before
> > + * the @timeout elapsed.
> >
> > This is still not true if timeout == 0.
> >
> > Shouldn't we also change wait_event_timeout() ? Say,
> >
> > #define wait_event_timeout(wq, condition, timeout) \
> > ({ \
> > long __ret = timeout; \
> > if (!(condition)) \
> > __wait_event_timeout(wq, condition, __ret); \
> > else if (!__ret) \
> > __ret = 1; \
> > __ret; \
> > })
> >
> > Or wait_event_timeout(timeout => 0) is not legal in a non-void context?
> >
> > To me the code like
> >
> > long wait_for_something(bool nonblock)
> > {
> > timeout = nonblock ? 0 : DEFAULT_TIMEOUT;
> > return wait_event_timeout(..., timeout);
> > }
> >
> > looks reasonable and correct. But it is not?
>
> I don't see why it would be not legal. Note though that in the above
> form wait_event_timeout(cond, 0) would still schedule() if cond is
> false, which is not what I'd expect from a non-blocking function.
Ah sorry, if you also rewrite __wait_event_timeout() then timeout=>0
wouldn't schedule(), so things would work as expected.
--Imre
On 06/05, Imre Deak wrote:
>
> On Wed, 2013-06-05 at 00:35 +0300, Imre Deak wrote:
> > On Tue, 2013-06-04 at 21:28 +0200, Oleg Nesterov wrote:
> > >
> > > Shouldn't we also change wait_event_timeout() ? Say,
> > >
> > > #define wait_event_timeout(wq, condition, timeout) \
> > > ({ \
> > > long __ret = timeout; \
> > > if (!(condition)) \
> > > __wait_event_timeout(wq, condition, __ret); \
> > > else if (!__ret) \
> > > __ret = 1; \
> > > __ret; \
> > > })
> > >
> > > Or wait_event_timeout(timeout => 0) is not legal in a non-void context?
> > >
> > > To me the code like
> > >
> > > long wait_for_something(bool nonblock)
> > > {
> > > timeout = nonblock ? 0 : DEFAULT_TIMEOUT;
> > > return wait_event_timeout(..., timeout);
> > > }
> > >
> > > looks reasonable and correct. But it is not?
> >
> > I don't see why it would be not legal. Note though that in the above
> > form wait_event_timeout(cond, 0) would still schedule() if cond is
> > false, which is not what I'd expect from a non-blocking function.
Yes, if false. But what if it is true?
> Ah sorry, if you also rewrite __wait_event_timeout() then timeout=>0
> wouldn't schedule(), so things would work as expected.
Can't understand... probably you missed my point. Let me try again.
I think that wait_eveint_timeout(wq, COND, 0) should return !!(COND).
But it doesn't, for example wait_event_timeout(wq, true, 0) == 0, this
doesn'tlook right to me.
And, this is off-topic, but wait_eveint_timeout/__wait_eveint_timeout
do not match wait_event/__wait_event. IOW, you can't use
__wait_eveint_timeout() if you do not need the fast-path check.
So. How about
#define __wait_event_timeout(wq, condition, timeout) \
({ \
DEFINE_WAIT(__wait); \
long __ret = 0, __to = timeout; \
\
for (;;) { \
prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
if (condition) { \
__ret = __to ?: 1; \
break; \
} \
if (!__to) \
break; \
__to = schedule_timeout(__to); \
} \
finish_wait(&wq, &__wait); \
__ret; \
})
#define wait_event_timeout(wq, condition, timeout) \
({ \
long __ret; \
if (condition) \
__ret = (timeout) ?: 1; \
else \
__ret = __wait_event_timeout(wq, condition, timeout); \
__ret; \
})
?
Othwerwise we should document the fact that the caller should alvays verify
timeout != 0 if it checks the result of wait_event_timeout().
On 06/05, Oleg Nesterov wrote:
>
> I think that wait_eveint_timeout(wq, COND, 0) should return !!(COND).
> But it doesn't, for example wait_event_timeout(wq, true, 0) == 0, this
> doesn'tlook right to me.
>
> And, this is off-topic, but wait_eveint_timeout/__wait_eveint_timeout
> do not match wait_event/__wait_event. IOW, you can't use
> __wait_eveint_timeout() if you do not need the fast-path check.
>
> So. How about
>
> #define __wait_event_timeout(wq, condition, timeout) \
> ({ \
> DEFINE_WAIT(__wait); \
> long __ret = 0, __to = timeout; \
> \
> for (;;) { \
> prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
> if (condition) { \
> __ret = __to ?: 1; \
> break; \
> } \
> if (!__to) \
> break; \
> __to = schedule_timeout(__to); \
> } \
> finish_wait(&wq, &__wait); \
> __ret; \
> })
>
> #define wait_event_timeout(wq, condition, timeout) \
> ({ \
> long __ret; \
> if (condition) \
> __ret = (timeout) ?: 1; \
> else \
> __ret = __wait_event_timeout(wq, condition, timeout); \
> __ret; \
> })
>
> ?
>
> Othwerwise we should document the fact that the caller should alvays verify
> timeout != 0 if it checks the result of wait_event_timeout().
And in fact, perhaps we can implement wait_event_common() and avoid the
code duplications?
#define __wait_no_timeout(timeout) \
(__builtin_constant_p(timeout) && (timeout) == MAX_SCHEDULE_TIMEOUT)
/* uglified signal_pending_state() */
#define __wait_signal_pending(state) \
((state == TASK_INTERRUPTIBLE) ? signal_pending(current) : \
(state == TASK_KILLABLE) ? fatal_signal_pending(current) : \
0)
#define __wait_event_common(wq, condition, state, tout) \
({ \
DEFINE_WAIT(__wait); \
long __ret = 0, __tout = tout; \
\
for (;;) { \
prepare_to_wait(&wq, &__wait, state); \
if (condition) { \
__ret = __wait_no_timeout(tout) ?: __tout ?: 1; \
break; \
} \
\
if (__wait_signal_pending(state)) { \
__ret = -ERESTARTSYS; \
break; \
} \
\
if (__wait_no_timeout(tout)) \
schedule(); \
else if (__tout) \
__tout = schedule_timeout(__tout); \
else \
break; \
} \
finish_wait(&wq, &__wait); \
__ret; \
})
#define wait_event_common(wq, condition, state, tout) \
({ \
long __ret; \
if (condition) \
__ret = __wait_no_timeout(tout) ?: (tout) ?: 1; \
else \
__ret = __wait_event_common(wq, condition, state, tout);\
__ret; \
})
Then, for example, wait_event() becomes
#define wait_event(wq, condition) \
wait_event_common(wq, condition, \
TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT) \
Hmm. I compiled the kernel with the patch below,
$ size vmlinux
text data bss dec hex filename
- 4978601 2935080 10104832 18018513 112f0d1 vmlinux
+ 4977769 2930984 10104832 18013585 112dd91 vmlinux
looks good...
Oleg.
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 1133695..359fffc 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -173,18 +173,52 @@ wait_queue_head_t *bit_waitqueue(void *, int);
#define wake_up_interruptible_sync_poll(x, m) \
__wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *) (m))
-#define __wait_event(wq, condition) \
-do { \
+#define __wait_no_timeout(timeout) \
+ (__builtin_constant_p(timeout) && (timeout) == MAX_SCHEDULE_TIMEOUT)
+
+/* uglified signal_pending_state() */
+#define __wait_signal_pending(state) \
+ ((state == TASK_INTERRUPTIBLE) ? signal_pending(current) : \
+ (state == TASK_KILLABLE) ? fatal_signal_pending(current) : \
+ 0)
+
+#define __wait_event_common(wq, condition, state, tout) \
+({ \
DEFINE_WAIT(__wait); \
+ long __ret = 0, __tout = tout; \
\
for (;;) { \
- prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
- if (condition) \
+ prepare_to_wait(&wq, &__wait, state); \
+ if (condition) { \
+ __ret = __wait_no_timeout(tout) ?: __tout ?: 1; \
+ break; \
+ } \
+ \
+ if (__wait_signal_pending(state)) { \
+ __ret = -ERESTARTSYS; \
+ break; \
+ } \
+ \
+ if (__wait_no_timeout(tout)) \
+ schedule(); \
+ else if (__tout) \
+ __tout = schedule_timeout(__tout); \
+ else \
break; \
- schedule(); \
} \
finish_wait(&wq, &__wait); \
-} while (0)
+ __ret; \
+})
+
+#define wait_event_common(wq, condition, state, tout) \
+({ \
+ long __ret; \
+ if (condition) \
+ __ret = __wait_no_timeout(tout) ?: (tout) ?: 1; \
+ else \
+ __ret = __wait_event_common(wq, condition, state, tout);\
+ __ret; \
+})
/**
* wait_event - sleep until a condition gets true
@@ -198,29 +232,13 @@ do { \
* wake_up() has to be called after changing any variable that could
* change the result of the wait condition.
*/
-#define wait_event(wq, condition) \
-do { \
- if (condition) \
- break; \
- __wait_event(wq, condition); \
-} while (0)
+#define __wait_event(wq, condition) \
+ __wait_event_common(wq, condition, \
+ TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT) \
-#define __wait_event_timeout(wq, condition, ret) \
-do { \
- DEFINE_WAIT(__wait); \
- \
- for (;;) { \
- prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
- if (condition) \
- break; \
- ret = schedule_timeout(ret); \
- if (!ret) \
- break; \
- } \
- if (!ret && (condition)) \
- ret = 1; \
- finish_wait(&wq, &__wait); \
-} while (0)
+#define wait_event(wq, condition) \
+ wait_event_common(wq, condition, \
+ TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT) \
/**
* wait_event_timeout - sleep until a condition gets true or a timeout elapses
@@ -239,31 +257,13 @@ do { \
* jiffies (at least 1) if the @condition evaluated to %true before
* the @timeout elapsed.
*/
-#define wait_event_timeout(wq, condition, timeout) \
-({ \
- long __ret = timeout; \
- if (!(condition)) \
- __wait_event_timeout(wq, condition, __ret); \
- __ret; \
-})
+#define __wait_event_timeout(wq, condition, timeout) \
+ __wait_event_common(wq, condition, \
+ TASK_UNINTERRUPTIBLE, timeout)
-#define __wait_event_interruptible(wq, condition, ret) \
-do { \
- DEFINE_WAIT(__wait); \
- \
- for (;;) { \
- prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
- if (condition) \
- break; \
- if (!signal_pending(current)) { \
- schedule(); \
- continue; \
- } \
- ret = -ERESTARTSYS; \
- break; \
- } \
- finish_wait(&wq, &__wait); \
-} while (0)
+#define wait_event_timeout(wq, condition, timeout) \
+ wait_event_common(wq, condition, \
+ TASK_UNINTERRUPTIBLE, timeout)
/**
* wait_event_interruptible - sleep until a condition gets true
@@ -280,35 +280,13 @@ do { \
* The function will return -ERESTARTSYS if it was interrupted by a
* signal and 0 if @condition evaluated to true.
*/
-#define wait_event_interruptible(wq, condition) \
-({ \
- int __ret = 0; \
- if (!(condition)) \
- __wait_event_interruptible(wq, condition, __ret); \
- __ret; \
-})
+#define __wait_event_interruptible(wq, condition) \
+ __wait_event_common(wq, condition, \
+ TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)
-#define __wait_event_interruptible_timeout(wq, condition, ret) \
-do { \
- DEFINE_WAIT(__wait); \
- \
- for (;;) { \
- prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
- if (condition) \
- break; \
- if (!signal_pending(current)) { \
- ret = schedule_timeout(ret); \
- if (!ret) \
- break; \
- continue; \
- } \
- ret = -ERESTARTSYS; \
- break; \
- } \
- if (!ret && (condition)) \
- ret = 1; \
- finish_wait(&wq, &__wait); \
-} while (0)
+#define wait_event_interruptible(wq, condition) \
+ wait_event_common(wq, condition, \
+ TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)
/**
* wait_event_interruptible_timeout - sleep until a condition gets true or a timeout elapses
@@ -328,13 +306,13 @@ do { \
* a signal, or the remaining jiffies (at least 1) if the @condition
* evaluated to %true before the @timeout elapsed.
*/
+#define __wait_event_interruptible_timeout(wq, condition, timeout) \
+ __wait_event_common(wq, condition, \
+ TASK_INTERRUPTIBLE, timeout)
+
#define wait_event_interruptible_timeout(wq, condition, timeout) \
-({ \
- long __ret = timeout; \
- if (!(condition)) \
- __wait_event_interruptible_timeout(wq, condition, __ret); \
- __ret; \
-})
+ wait_event_common(wq, condition, \
+ TASK_INTERRUPTIBLE, timeout)
#define __wait_event_hrtimeout(wq, condition, timeout, state) \
({ \
@@ -601,24 +579,6 @@ do { \
-#define __wait_event_killable(wq, condition, ret) \
-do { \
- DEFINE_WAIT(__wait); \
- \
- for (;;) { \
- prepare_to_wait(&wq, &__wait, TASK_KILLABLE); \
- if (condition) \
- break; \
- if (!fatal_signal_pending(current)) { \
- schedule(); \
- continue; \
- } \
- ret = -ERESTARTSYS; \
- break; \
- } \
- finish_wait(&wq, &__wait); \
-} while (0)
-
/**
* wait_event_killable - sleep until a condition gets true
* @wq: the waitqueue to wait on
@@ -634,14 +594,13 @@ do { \
* The function will return -ERESTARTSYS if it was interrupted by a
* signal and 0 if @condition evaluated to true.
*/
-#define wait_event_killable(wq, condition) \
-({ \
- int __ret = 0; \
- if (!(condition)) \
- __wait_event_killable(wq, condition, __ret); \
- __ret; \
-})
+#define __wait_event_killable(wq, condition) \
+ __wait_event_common(wq, condition, \
+ TASK_KILLABLE, MAX_SCHEDULE_TIMEOUT)
+#define wait_event_killable(wq, condition) \
+ wait_event_common(wq, condition, \
+ TASK_KILLABLE, MAX_SCHEDULE_TIMEOUT)
#define __wait_event_lock_irq(wq, condition, lock, cmd) \
do { \
Hello, Oleg.
On Wed, Jun 05, 2013 at 09:07:23PM +0200, Oleg Nesterov wrote:
> And in fact, perhaps we can implement wait_event_common() and avoid the
> code duplications?
>
> #define __wait_no_timeout(timeout) \
> (__builtin_constant_p(timeout) && (timeout) == MAX_SCHEDULE_TIMEOUT)
>
> /* uglified signal_pending_state() */
> #define __wait_signal_pending(state) \
> ((state == TASK_INTERRUPTIBLE) ? signal_pending(current) : \
> (state == TASK_KILLABLE) ? fatal_signal_pending(current) : \
> 0)
>
> #define __wait_event_common(wq, condition, state, tout) \
> ({ \
> DEFINE_WAIT(__wait); \
> long __ret = 0, __tout = tout; \
> \
> for (;;) { \
> prepare_to_wait(&wq, &__wait, state); \
> if (condition) { \
> __ret = __wait_no_timeout(tout) ?: __tout ?: 1; \
> break; \
> } \
> \
> if (__wait_signal_pending(state)) { \
> __ret = -ERESTARTSYS; \
> break; \
> } \
> \
> if (__wait_no_timeout(tout)) \
> schedule(); \
> else if (__tout) \
> __tout = schedule_timeout(__tout); \
> else \
> break; \
> } \
> finish_wait(&wq, &__wait); \
> __ret; \
> })
Heh, yeah, this looks good to me and a lot better than trying to do
the same thing over and over again and ending up with subtle
differences.
> Hmm. I compiled the kernel with the patch below,
>
> $ size vmlinux
> text data bss dec hex filename
> - 4978601 2935080 10104832 18018513 112f0d1 vmlinux
> + 4977769 2930984 10104832 18013585 112dd91 vmlinux
Nice. Provided you went over assembly outputs of at least some
combinations, please feel free to add
Reviewed-by: Tejun Heo <[email protected]>
Thanks.
--
tejun
Hi Tejun,
On 06/05, Tejun Heo wrote:
>
> Heh, yeah, this looks good to me and a lot better than trying to do
> the same thing over and over again and ending up with subtle
> differences.
Yes, this is the goal. Of course we could fix wait_event_timout() and
wait_event_interruptible_timeout() without unification, but this would
add more copy-and-paste.
> > Hmm. I compiled the kernel with the patch below,
> >
> > $ size vmlinux
> > text data bss dec hex filename
> > - 4978601 2935080 10104832 18018513 112f0d1 vmlinux
> > + 4977769 2930984 10104832 18013585 112dd91 vmlinux
>
> Nice. Provided you went over assembly outputs of at least some
> combinations,
I did, and that is why I am surprized by the numbers above.
Yes, gcc optimizes out the unwanted checks but the generated code
differs, of course. And sometimes the difference looks "random" to me.
Simplest example:
extern wait_queue_head_t WQ;
extern int COND;
void we(void)
{
wait_event(WQ, COND);
}
before:
we:
pushq %rbp #
movq %rsp, %rbp #,
pushq %rbx #
subq $56, %rsp #,
call mcount
movl COND(%rip), %edx # COND,
testl %edx, %edx #
jne .L5 #,
leaq -64(%rbp), %rbx #, tmp66
movq $0, -64(%rbp) #, __wait
movq $autoremove_wake_function, -48(%rbp) #, __wait.func
#APP
# 14 "/home/tmp/K/misc/arch/x86/include/asm/current.h" 1
movq %gs:current_task,%rax #, pfo_ret__
# 0 "" 2
#NO_APP
movq %rax, -56(%rbp) # pfo_ret__, __wait.private
leaq 24(%rbx), %rax #, tmp61
movq %rax, -40(%rbp) # tmp61, __wait.task_list.next
movq %rax, -32(%rbp) # tmp61, __wait.task_list.prev
.p2align 4,,10
.p2align 3
.L4:
movl $2, %edx #,
movq %rbx, %rsi # tmp66,
movq $WQ, %rdi #,
call prepare_to_wait #
movl COND(%rip), %eax # COND,
testl %eax, %eax #
jne .L3 #,
call schedule #
jmp .L4 #
.p2align 4,,10
.p2align 3
.L3:
movq %rbx, %rsi # tmp66,
movq $WQ, %rdi #,
call finish_wait #
.L5:
addq $56, %rsp #,
popq %rbx #
leave
ret
after:
we:
pushq %rbp #
movq %rsp, %rbp #,
pushq %rbx #
subq $56, %rsp #,
call mcount
movl COND(%rip), %edx # COND,
testl %edx, %edx #
jne .L5 #,
leaq -64(%rbp), %rbx #, tmp66
movq $0, -64(%rbp) #, __wait
movq $autoremove_wake_function, -48(%rbp) #, __wait.func
#APP
# 14 "/home/tmp/K/misc/arch/x86/include/asm/current.h" 1
movq %gs:current_task,%rax #, pfo_ret__
# 0 "" 2
#NO_APP
movq %rax, -56(%rbp) # pfo_ret__, __wait.private
leaq 24(%rbx), %rax #, tmp61
movq %rax, -40(%rbp) # tmp61, __wait.task_list.next
movq %rax, -32(%rbp) # tmp61, __wait.task_list.prev
.p2align 4,,10
.p2align 3
.L4:
movl $2, %edx #,
movq %rbx, %rsi # tmp66,
movq $WQ, %rdi #,
call prepare_to_wait #
movl COND(%rip), %eax # COND,
testl %eax, %eax #
je .L3 #,
movq %rbx, %rsi # tmp66,
movq $WQ, %rdi #,
call finish_wait #
jmp .L5 #
.p2align 4,,10
.p2align 3
.L3:
call schedule #
.p2align 4,,8
.p2align 3
jmp .L4 #
.p2align 4,,10
.p2align 3
.L5:
addq $56, %rsp #,
popq %rbx #
leave
.p2align 4,,4
.p2align 3
ret
As you can see, with this patch gcc generates a bit more code. But
only because reorderes finish_wait() and inserts the additional nops,
otherwise code the same. I think this difference is not "reliable"
and can be ignored.
But, the code like "return wait_even_timeout(true, non_const_timeout)"
really generates more code and this is correct: the patch tries to fix
the bug (at least I think this is a bug) and the additional code ensures
that __ret != 0.
> Reviewed-by: Tejun Heo <[email protected]>
Thanks! I'll send this patch soon.
Oleg.