With the new wake_up() mechanism and the wait queue that supports callbacks, a
wake_up() call might end up calling another function. This
function might be the poll callback that someone else installed on the
first device. The callback invocation will make the one that installed the
callback on the first device think that events are available, and it'll
very likely go ahead and wake_up() its own poll wait queue. See the
problem? We can cycle through, and we can end up with either a deadlock or a
stack blowup. An easy but ineffective solution would be to forbid the
insertion of an epoll fd inside other epoll fds. But this won't prevent
another device that uses a similar technique from being born tomorrow.
This is what I was thinking to solve those problems:
1) Move the wake_up() call done inside the poll callback outside the lock
void poll_cb(xxx *data)
{
	int pwake = 0;

	lock(data);
	...
	if (wait_queue_active(&data->poll_wait))
		pwake++;
	unlock(data);

	if (pwake)
		ep_poll_safe_wakeup(&data->psw, &data->poll_wait);
}
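For anyone who wants to see the shape of this outside the kernel: a rough userland analogue of the same pattern, using POSIX threads instead of the kernel primitives (the struct and names below are purely illustrative, not from the patch), would be:

```c
#include <pthread.h>

/* Illustrative userland stand-in for the per-device state. */
struct dev_data {
	pthread_mutex_t lock;
	pthread_cond_t poll_wait;
	int nwaiters;	/* stand-in for wait_queue_active() */
	int ready;
};

/* Flag readiness and count waiters under the lock, but issue the
 * wakeup only after the lock has been dropped, exactly like the
 * poll_cb() sketch above. */
void poll_cb(struct dev_data *data)
{
	int pwake = 0;

	pthread_mutex_lock(&data->lock);
	data->ready = 1;
	if (data->nwaiters > 0)
		pwake++;
	pthread_mutex_unlock(&data->lock);

	if (pwake)
		pthread_cond_broadcast(&data->poll_wait);
}
```

The point is the same in both worlds: whatever the wakeup call ends up doing, including calling back into other code, it does it without data->lock held.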
2) Use this infrastructure to perform safe poll wakeups
/*
 * This is used to implement the safe poll wake up, avoiding reentering
 * the poll callback from inside wake_up().
 */
struct poll_safewake {
	int wakedoor;		/* Init = 1, can do wake up */
	atomic_t count;		/* Init = 0, wake up count */
};

/* Perform a safe wake up of the poll wait list */
static void ep_poll_safe_wakeup(struct poll_safewake *psw, wait_queue_head_t *wq)
{
	atomic_inc(&psw->count);
	do {
		if (!xchg(&psw->wakedoor, 0))
			break;
		wake_up(wq);
		xchg(&psw->wakedoor, 1);
	} while (!atomic_dec_and_test(&psw->count));
}
Does anyone foresee problems with this implementation?
Another (crappy) solution might be to prevent the epoll fd from dropping,
inside its poll wait queue head, wait queue entries that have a function pointer != NULL.
- Davide
> 1) Move the wake_up() call done inside the poll callback outside the lock
> void poll_cb(xxx *data)
> {
> 	int pwake = 0;
>
> 	lock(data);
> 	...
> 	if (wait_queue_active(&data->poll_wait))
> 		pwake++;
> 	unlock(data);
>
> 	if (pwake)
> 		ep_poll_safe_wakeup(&data->psw, &data->poll_wait);
> }
This looks like a good thing to do with or without the problem. Minimizing
the time that a lock is held is usually a good idea.
> 2) Use this infrastructure to perform safe poll wakeups
> ...
> static void ep_poll_safe_wakeup(struct poll_safewake *psw, wait_queue_head_t *wq)
> {
> 	atomic_inc(&psw->count);
> 	do {
> 		if (!xchg(&psw->wakedoor, 0))
> 			break;
> 		wake_up(wq);
> 		xchg(&psw->wakedoor, 1);
> 	} while (!atomic_dec_and_test(&psw->count));
> }
> Does anyone foresee problems with this implementation?
> Another (crappy) solution might be to prevent the epoll fd from dropping,
> inside its poll wait queue head, wait queue entries that have a function pointer != NULL.
Clever. (I think the second xchg() can just be an atomic_set().) Without actually
playing with it, it looks good to me.
If the problem is too hard to solve, it isn't that bad if one can't
epoll recursively. If the functionality were added later, it is
doubtful that the API itself would need to change.
mark
--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
On Wed, 20 Nov 2002, Mark Mielke wrote:
> > 2) Use this infrastructure to perform safe poll wakeups
> > ...
> > static void ep_poll_safe_wakeup(struct poll_safewake *psw, wait_queue_head_t *wq)
> > {
> > 	atomic_inc(&psw->count);
> > 	do {
> > 		if (!xchg(&psw->wakedoor, 0))
> > 			break;
> > 		wake_up(wq);
> > 		xchg(&psw->wakedoor, 1);
> > 	} while (!atomic_dec_and_test(&psw->count));
> > }
> > Does anyone foresee problems with this implementation?
> > Another (crappy) solution might be to prevent the epoll fd from dropping,
> > inside its poll wait queue head, wait queue entries that have a function pointer != NULL.
>
> Clever. (I think the second xchg() can just be atomic_set()) Without actually
> playing with it, it looks good to me.
>
> If the problem is too hard to solve - it isn't that bad if one can't
> epoll recursively. If the functionality was added later, it is
> doubtful that the API itself would need to change.
This should cover the wake_up() cycle case, IMHO. We should be able to
store epoll fds inside an epoll fd. In theory ... :)
- Davide
Davide Libenzi writes:
> 1) Move the wake_up() call done inside the poll callback outside the lock
You can't. You need to hold the lock over the callback or your callback
could end up accessing a freed epitem.
On Mon, 25 Nov 2002, John Myers wrote:
> Davide Libenzi writes:
> > 1) Move the wake_up() call done inside the poll callback outside the lock
>
> You can't. You need to hold the lock over the callback or your callback
> could end up accessing a freed epitem.
No, look at the code:
http://www.xmailserver.org/linux-patches/sys_epoll-2.5.49-0.58.diff
The function ep_collect_ready_items() increases the usage count under
lock. So the epitem is protected, and the file* cannot disappear because
of the read lock on epsem.
- Davide
On Mon, 25 Nov 2002, Davide Libenzi wrote:
> On Mon, 25 Nov 2002, John Myers wrote:
>
> > Davide Libenzi writes:
> > > 1) Move the wake_up() call done inside the poll callback outside the lock
> >
> > You can't. You need to hold the lock over the callback or your callback
> > could end up accessing a freed epitem.
>
> No, look at the code:
>
> http://www.xmailserver.org/linux-patches/sys_epoll-2.5.49-0.58.diff
>
> The function ep_collect_ready_items() increases the usage count under
> lock. So the epitem is protected, and the file* cannot disappear because
> of the read lock on epsem.
Oops, I read that as referring to the f_op->poll(), not the wake_up(). It can be solved in
the same way. I'll do it now.
- Davide
On Mon, 25 Nov 2002, Davide Libenzi wrote:
> On Mon, 25 Nov 2002, Davide Libenzi wrote:
>
> > On Mon, 25 Nov 2002, John Myers wrote:
> >
> > > Davide Libenzi writes:
> > > > 1) Move the wake_up() call done inside the poll callback outside the lock
> > >
> > > You can't. You need to hold the lock over the callback or your callback
> > > could end up accessing a freed epitem.
> >
> > No, look at the code:
> >
> > http://www.xmailserver.org/linux-patches/sys_epoll-2.5.49-0.58.diff
> >
> > The function ep_collect_ready_items() increases the usage count under
> > lock. So the epitem is protected, and the file* cannot disappear because
> > of the read lock on epsem.
>
> Oops, I read that as referring to the f_op->poll(), not the wake_up(). It can be solved in
> the same way. I'll do it now.
The only place where protection is needed is inside ep_insert(), and not
because of the wake_up() outside the lock. It is because of this:
/* Add the current item to the list of active epoll hook for this file */
spin_lock(&tfile->f_ep_lock);
list_add_tail(&epi->fllink, &tfile->f_ep_links);
spin_unlock(&tfile->f_ep_lock);
- Davide
On Mon, 25 Nov 2002, John Myers wrote:
> Davide Libenzi wrote:
>
> > No, look at the code:
> >
> >
> Ok, I completely misread what you wrote. I thought you were suggesting
> a change to the locking behavior of wake_up() itself.
The problem with the wake_up() is its potential recursion when using the
callback'd version of the wait queue. We can have two scenarios (given
'->' == CONTAINED-IN):
1)
epfd1 -> epfd2 -> ... -> epfdN with N huge
2)
epfd1 -> epfd2 -> ... -> epfd1
Now, if you call wake_up() without taking any care, you'll blow up the
stack. This should be addressed with the 0.58 version of the patch. Also, I
moved this:
/* Add the current item to the list of active epoll hook for this file */
spin_lock(&tfile->f_ep_lock);
list_add_tail(&epi->fllink, &tfile->f_ep_links);
spin_unlock(&tfile->f_ep_lock);
before inserting the item in the hash, so I don't need to increase the
usage count in ep_insert().
- Davide
Davide Libenzi wrote:
> No, look at the code:
>
>
Ok, I completely misread what you wrote. I thought you were suggesting
a change to the locking behavior of wake_up() itself.
One wonders if a work-to-do would be appropriate.