2003-06-19 17:55:33

by Ray Bryant

Subject: Re: PROBLEM: Bug in __pollwait() can cause select() and poll() to hang in

Manfred Spraul wrote:
>
> Hi Ray,
>
> your bug description seems to be correct, but the fix is wrong:
> If the allocation is for the 2nd page of wait queue heads, then
> "current->state = TASK_INTERRUPTIBLE" can lead to lost wakeups, if an fd
> that is stored in the first page gets ready during the allocation.

Hi Manfred,

Grumble. :-) Yes, I believe you are correct.

> Setting the state to interruptible is only permitted if a full scan of
> all file descriptors happens before calling schedule(). This is
> expensive and should be avoided.
>
> The correct fix is current->state = TASK_RUNNING just before calling
> yield() in the rebalance code.

But doesn't this have the same kind of problem? E.g., just before
calling yield() in the rebalance code we save current->state, set it to
TASK_RUNNING, then restore current->state on return from yield(). If a
fd becomes ready after the call to yield(), and we entered
__alloc_pages() with state TASK_INTERRUPTIBLE, aren't we in exactly the
same situation as described above?

Let me think about this some more.

Thanks,
--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------


2003-06-19 18:06:09

by Andrew Morton

Subject: Re: PROBLEM: Bug in __pollwait() can cause select() and poll() to hang in

Ray Bryant wrote:
>
> > The correct fix is current->state = TASK_RUNNING just before calling
> > yield() in the rebalance code.
>
> But doesn't this have the same kind of problem? E.g., just before
> calling yield() in the rebalance code we save current->state, set it to
> TASK_RUNNING, then restore current->state on return from yield(). If a
> fd becomes ready after the call to yield(), and we entered
> __alloc_pages() with state TASK_INTERRUPTIBLE, aren't we in exactly the
> same situation as described above?

No, you cannot restore the task state after having set it to TASK_RUNNING.

Just leave the state at TASK_RUNNING. The (silly) code which called the
page allocator in state TASK_[UN]INTERRUPTIBLE will just go around its wait
loop an extra time and go back to sleep. This almost always works.
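
The "extra time around the wait loop" behaviour can be illustrated with a
small user-space model in C. This is only a sketch under stated assumptions,
not kernel code: every model_* name is hypothetical, schedule() is modeled as
returning at once for a runnable task, and the wakeup is assumed to fire
during the first real sleep.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel task states; not the kernel's own. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct model_task {
    enum model_state state;
    int passes;   /* times the wait loop body ran */
    int sleeps;   /* times model_schedule() actually blocked */
};

/*
 * Models an allocator that had to yield: it marks the task runnable
 * first and deliberately does not restore the caller's state afterwards.
 */
static void model_alloc_that_yields(struct model_task *t)
{
    t->state = MODEL_RUNNING;
}

/*
 * Models schedule(): a runnable task returns immediately; a sleeping
 * task blocks until the event fires and the wakeup makes it runnable.
 */
static void model_schedule(struct model_task *t, bool *event_ready)
{
    if (t->state == MODEL_RUNNING)
        return;                 /* nothing to wait for, go around again */
    t->sleeps++;
    *event_ready = true;        /* assumption: event arrives while asleep */
    t->state = MODEL_RUNNING;   /* effect of the wakeup */
}

/*
 * A wait loop in the usual shape: set the sleeping state, re-check the
 * condition, then schedule. If alloc_on_first_pass is true, the first
 * pass calls the yielding allocator after the state has been set.
 */
static struct model_task model_wait(bool alloc_on_first_pass)
{
    struct model_task t = { MODEL_RUNNING, 0, 0 };
    bool event_ready = false;

    for (;;) {
        t.passes++;
        t.state = MODEL_INTERRUPTIBLE;
        if (event_ready)
            break;
        if (alloc_on_first_pass && t.passes == 1)
            model_alloc_that_yields(&t);
        model_schedule(&t, &event_ready);
    }
    t.state = MODEL_RUNNING;
    return t;
}
```

Under these assumptions, model_wait(false) finishes in two passes and
model_wait(true) in three, with exactly one real sleep in both cases. The
clobbered state costs one spurious trip around the loop, because the
condition is re-checked after the state is set again.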