2021-05-15 16:40:47

by Manfred Spraul

[permalink] [raw]
Subject: Re: [RFC PATCH] ipc/mqueue: avoid sleep after wakeup

On 5/15/21 6:06 AM, Hillf Danton wrote:
> On Fri, 14 May 2021 17:51:47 +0200 Manfred Spraul wrote:
>> On 5/14/21 5:01 AM, Hillf Danton wrote:
>>> The pipeline waker could start doing its job once waiter releases lock and
>>> get the work done before waiter takes a nap, so check wait condition before
>>> sleep to avoid waiting the wakeup that will never come, though that does not
>>> hurt much thanks to timer timeouts like a second.
>> First: The timeout could be infinity, thus the code must not rely on a
>> timeout wakeup.
>>
>> A wrong wait is would be a bug.
>>
>>
>>> Check signal for the same reason.
>>>
>>> Signed-off-by: Hillf Danton <[email protected]>
>>> ---
>>>
>>> --- y/ipc/mqueue.c
>>> +++ x/ipc/mqueue.c
>>> @@ -710,15 +710,24 @@ static int wq_sleep(struct mqueue_inode_
>>> __set_current_state(TASK_INTERRUPTIBLE);
>>>
>>> spin_unlock(&info->lock);
>>> - time = schedule_hrtimeout_range_clock(timeout, 0,
>>> - HRTIMER_MODE_ABS, CLOCK_REALTIME);
>>>
>> I do not see a bug:
>>
>> We do the __set_current_state() while holding the spinlock. If there is
>> a wakeup, then the wakeup will change current->state to TASK_RUNNING.
> Correct.
>> schedule() will not remove us from the run queue when current->state is
>> TASK_RUNNING. The same applies if there are pending signals: schedule()
>> checks for pending signals and sets current->state to TASK_RUNNING.
>>
>> Since the __set_current_state() is done while we hold info->lock, and
>> since the wakeup cannot happen before we have dropped the lock [because
>> the task that wakes us up needs the same lock], I do not see how a
>> wakeup could be lost.
>>
>> Thus: Which issue do you see?
> waiter waker
> ---- ----
> unlock
> lock
> irq set STATE_READY
> softirq unlock
> wakeup
> sleep a tick
> schedule();
>
> No need to schedule given READY.

This is not possible to avoid:

waiter waker
---- ----
unlock
schedule();
calls __schedule()
<before rq_lock()>

  lock
set STATE_READY
unlock
wakeup
--> set waiter->state = TASK_RUNNING

Now the run queue will be evaluated even though there is strictly speaking no need to do that.
Changes in ipc/sem.c can't solve that: From what I see, the majority of the critical window is in kernel/sched/*.c and not in ipc/sem.c
I do not consider it as useful to add complexity just to reduce the size of a extremely rare event.

Especially: No harm is done. User space can be preempted at any time, so the kernel is always allowed to check the run queue.

--
Manfred