2009-10-09 05:56:20

by wu Jianfeng

[permalink] [raw]
Subject: select system call's implementation may have some bug, need your help and confirm !!!

A process may sleep for ever when he call select system call.
In detail, if the process was scheduled out just at the point it set
its state to TASK_INTERRUPTIBLE.

The events that cause the process to be scheduled out is(in preempt kernel) :
1) time interrupt and the process's time slice is exhausted.
2) an interrupt accured, and wake up another process with high priority.

see the comments after "##"

int do_select(int n, fd_set_bits *fds, s64 *timeout)
{
struct poll_wqueues table;
poll_table *wait;
int retval, i;

rcu_read_lock();
retval = max_select_fd(n, fds);
rcu_read_unlock();

if (retval < 0)
return retval;
n = retval;

poll_initwait(&table);
wait = &table.pt;
if (!*timeout)
wait = NULL;
retval = 0;
for (;;) {
unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;
long __timeout;

set_current_state(TASK_INTERRUPTIBLE);
## here set the process state TASK_INTERRUPTIBLE

## if the process was scheduled out here, then the process will
never can be waked up, because it has not been attached to any file 's
wait queue.

inp = fds->in; outp = fds->out; exp = fds->ex;
rinp = fds->res_in; routp = fds->res_out; rexp = fds->res_ex;

for (i = 0; i < n; ++rinp, ++routp, ++rexp) {
unsigned long in, out, ex, all_bits, bit = 1, mask, j;
unsigned long res_in = 0, res_out = 0, res_ex = 0;
struct file_operations *f_op = NULL;
struct file *file = NULL;

in = *inp++; out = *outp++; ex = *exp++;
all_bits = in | out | ex;
if (all_bits == 0) {
i += __NFDBITS;
continue;
}

for (j = 0; j < __NFDBITS; ++j, ++i, bit <<= 1) {
if (i >= n)
break;
if (!(bit & all_bits))
continue;
file = fget(i);
if (file) {
f_op = file->f_op;
mask = DEFAULT_POLLMASK;
if (f_op && f_op->poll)
mask = (*f_op->poll)(file, retval ? NULL : wait);
fput(file);
if ((mask & POLLIN_SET) && (in & bit)) {
res_in |= bit;
retval++;
}
if ((mask & POLLOUT_SET) && (out & bit)) {
res_out |= bit;
retval++;
}
if ((mask & POLLEX_SET) && (ex & bit)) {
res_ex |= bit;
retval++;
}
}
cond_resched();
}
if (res_in)
*rinp = res_in;
if (res_out)
*routp = res_out;
if (res_ex)
*rexp = res_ex;
}
wait = NULL;
if (retval || !*timeout || signal_pending(current))
break;
if(table.error) {
retval = table.error;
break;
}

if (*timeout < 0) {
/* Wait indefinitely */
__timeout = MAX_SCHEDULE_TIMEOUT;
} else if (unlikely(*timeout >= (s64)MAX_SCHEDULE_TIMEOUT - 1)) {
/* Wait for longer than MAX_SCHEDULE_TIMEOUT. Do it in a loop */
__timeout = MAX_SCHEDULE_TIMEOUT - 1;
*timeout -= __timeout;
} else {
__timeout = *timeout;
*timeout = 0;
}
__timeout = schedule_timeout(__timeout);
if (*timeout >= 0)
*timeout += __timeout;
}
__set_current_state(TASK_RUNNING);

poll_freewait(&table);

return retval;
}


2009-10-09 07:53:14

by Alan Jenkins

[permalink] [raw]
Subject: Re: select system call's implementation may have some bug, need your help and confirm !!!

On 10/9/09, wu Jianfeng <[email protected]> wrote:
> A process may sleep for ever when he call select system call.
> In detail, if the process was scheduled out just at the point it set
> its state to TASK_INTERRUPTIBLE.
>
> The events that cause the process to be scheduled out is(in preempt kernel)
> :
> 1) time interrupt and the process's time slice is exhausted.
> 2) an interrupt accured, and wake up another process with high priority.
>
> see the comments after "##"
>
> int do_select(int n, fd_set_bits *fds, s64 *timeout)
> {
> struct poll_wqueues table;
> poll_table *wait;
> int retval, i;
>
> rcu_read_lock();
> retval = max_select_fd(n, fds);
> rcu_read_unlock();
>
> if (retval < 0)
> return retval;
> n = retval;
>
> poll_initwait(&table);
> wait = &table.pt;
> if (!*timeout)
> wait = NULL;
> retval = 0;
> for (;;) {
> unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;
> long __timeout;
>
> set_current_state(TASK_INTERRUPTIBLE);
> ## here set the process state TASK_INTERRUPTIBLE
>
> ## if the process was scheduled out here, then the process will
> never can be waked up, because it has not been attached to any file 's
> wait queue.

I'm not sure about that, but if you look at the current code (e.g. in
linus' git tree) you will see this code has been changed. Now
set_current_state() is only called from poll_schedule_timeout(), and
it won't suffer from the problem you suggested:


int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
ktime_t *expires, unsigned long slack)
{
int rc = -EINTR;

set_current_state(state);
if (!pwq->triggered)
rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
__set_current_state(TASK_RUNNING);


I don't know the specific reason this was changed. Try looking at the
git history if you're still curious.

Regards
Alan

2009-10-09 09:21:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: select system call's implementation may have some bug, need your help and confirm !!!

On Fri, 2009-10-09 at 08:52 +0100, Alan Jenkins wrote:
> On 10/9/09, wu Jianfeng <[email protected]> wrote:
> > A process may sleep for ever when he call select system call.
> > In detail, if the process was scheduled out just at the point it set
> > its state to TASK_INTERRUPTIBLE.
> >

> > set_current_state(TASK_INTERRUPTIBLE);
> > ## here set the process state TASK_INTERRUPTIBLE
> >
> > ## if the process was scheduled out here, then the process will
> > never can be waked up, because it has not been attached to any file 's
> > wait queue.
>
> I'm not sure about that, but if you look at the current code (e.g. in
> linus' git tree) you will see this code has been changed. Now
> set_current_state() is only called from poll_schedule_timeout(), and
> it won't suffer from the problem you suggested:
>
>
> int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
> ktime_t *expires, unsigned long slack)
> {
> int rc = -EINTR;
>
> set_current_state(state);
> if (!pwq->triggered)
> rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
> __set_current_state(TASK_RUNNING);

It would still be liable to the problem suggested, if it were real,
which it is not, as Thomas pointed out.

The distinction is between getting preempted and calling schedule().