LinuxLists.cc - poll() in 2.6 and beyond

2004-03-02 18:21:11

by Richard B. Johnson

[permalink] [raw]

Subject: poll() in 2.6 and beyond

Poll in 2.6.0; when a driver routine calls poll_wait()
it returns <<immediately>> to somewhere in the
kernel, then waits for my wake_up_interuptible(), before
returning control to a user sleeping in poll(). This means
that the user gets the wrong poll return value! It
doesn't get the value it was given as a result of the
interrupt, but the value that existed (0) before the
interrupt occurred.

Poll should not return from poll_wait() until it gets
a wake_up_interruptible() call. The wait variable,
info->pwait, below, has been initialized by executing
init_waitqueue_head(&info->pwait) in the initialization
code. This code works in 2.4.24.

What do I do to make it work in 2.6.0 and beyond? There
are no hints in the 2.6 drivers as they all seem to be
written like this and they presumably work.

/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
* The interrupt service routine.
*/
static void pci_isr(int irq, void *p, struct pt_regs *regs)
{
spin_lock(&info->lock);
DEB(printk(KERN_INFO"%s : Interrupt!\n", devname));
info->poll_flag |= POLLIN|DEF_POLL;
wake_up_interruptible(&info->pwait);
spin_unlock(&info->lock);
}
/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
* Device poll routine.
*/
static size_t poll(struct file *fp, struct poll_table_struct *wait)
{
size_t poll_flag;
size_t flags;
DEB(printk(KERN_INFO"%s : poll called\n", devname));
poll_wait(fp, &info->pwait, wait);
lockit(TRUE, &flags);
poll_flag = info->poll_flag;
info->poll_flag = 0;
lockit(FALSE, &flags);
DEB(printk(KERN_INFO"%s : poll returns\n", devname));
return poll_flag;
}

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-02 20:04:48

by Roland Dreier

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

Richard> Poll in 2.6.0; when a driver routine calls poll_wait() it
Richard> returns <<immediately>> to somewhere in the kernel, then
Richard> waits for my wake_up_interuptible(), before returning
Richard> control to a user sleeping in poll(). This means that the
Richard> user gets the wrong poll return value! It doesn't get the
Richard> value it was given as a result of the interrupt, but the
Richard> value that existed (0) before the interrupt occurred.

Nothing has changed in 2.6 that I know of. poll_wait() always
returned immediately to the driver. The driver's poll method is
supposed to call poll_wait() on the wait queues that indicate a change
in poll status, and then return with the current status.

Read the description of "poll and select" in LDD:
<http://www.xml.com/ldd/chapter/book/ch05.html#t3>

- Roland

2004-03-02 20:24:27

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2 Mar 2004, Roland Dreier wrote:

> Richard> Poll in 2.6.0; when a driver routine calls poll_wait() it
> Richard> returns <<immediately>> to somewhere in the kernel, then
> Richard> waits for my wake_up_interuptible(), before returning
> Richard> control to a user sleeping in poll(). This means that the
> Richard> user gets the wrong poll return value! It doesn't get the
> Richard> value it was given as a result of the interrupt, but the
> Richard> value that existed (0) before the interrupt occurred.
>
> Nothing has changed in 2.6 that I know of. poll_wait() always
> returned immediately to the driver. The driver's poll method is
> supposed to call poll_wait() on the wait queues that indicate a change
> in poll status, and then return with the current status.
>
> Read the description of "poll and select" in LDD:
> <http://www.xml.com/ldd/chapter/book/ch05.html#t3>
>
> - Roland

I'm talking about the driver! When a open fd called poll() or select(),
in user-mode code, the driver's poll() was called, and the driver's poll()
would call poll_wait(). Poll_wait() used to NOT return until the driver
executed wake_up_interruptible() on that wait-queue. When poll_wait()
returned, the driver would return to the caller with the new poll-
status.

Now, when the driver calls poll_wait(), it returns immediately to
the driver. The driver then returns with the wrong poll status because
nothing changed yet. This return, doesn't go back to the user-mode
caller, but remains within the kernel until a wake_up_interruptible()
has been received. Then it returns to the original caller, with the
(wrong) status that existed before the wake_up_interruptible().

When the cycle repeats, the status from the previous event gets
returned, so if the events are all the same (POLLIN), only the
first one is wrong and nobody is the wiser. However, when there
are multiple events, they appear to the user out-of-sequence
and muck up user-mode code.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-02 21:00:53

by Roland Dreier

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

Richard> I'm talking about the driver! When a open fd called
Richard> poll() or select(), in user-mode code, the driver's
Richard> poll() was called, and the driver's poll() would call
Richard> poll_wait(). Poll_wait() used to NOT return until the
Richard> driver executed wake_up_interruptible() on that
Richard> wait-queue. When poll_wait() returned, the driver would
Richard> return to the caller with the new poll- status.

I don't think so. Even in kernel 2.4, poll_wait() just calls
__pollwait(). I don't see anything in __pollwait() that sleeps.
Think about it. How would the kernel handle userspace calling poll()
with more than one file descriptor if each individual driver slept?

I'll repeat my earlier suggestion. Read the description of "poll and
select" in LDD:
<http://www.xml.com/ldd/chapter/book/ch05.html#t3>

If you refuse to understand the documented interface, I don't think
anyone can help you.

- Roland

2004-03-02 21:25:37

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2 Mar 2004, Roland Dreier wrote:

> Richard> I'm talking about the driver! When a open fd called
> Richard> poll() or select(), in user-mode code, the driver's
> Richard> poll() was called, and the driver's poll() would call
> Richard> poll_wait(). Poll_wait() used to NOT return until the
> Richard> driver executed wake_up_interruptible() on that
> Richard> wait-queue. When poll_wait() returned, the driver would
> Richard> return to the caller with the new poll- status.
>
> I don't think so. Even in kernel 2.4, poll_wait() just calls
> __pollwait(). I don't see anything in __pollwait() that sleeps.
> Think about it. How would the kernel handle userspace calling poll()
> with more than one file descriptor if each individual driver slept?
>

Well what to you think they do? Spin?

> I'll repeat my earlier suggestion. Read the description of "poll and
> select" in LDD:
> <http://www.xml.com/ldd/chapter/book/ch05.html#t3>
>
> If you refuse to understand the documented interface, I don't think
> anyone can help you.

I am quite familiar with the operation of poll(), thank you.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-02 21:40:09

by Roland Dreier

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

Roland> I don't think so. Even in kernel 2.4, poll_wait() just
Roland> calls __pollwait(). I don't see anything in __pollwait()
Roland> that sleeps. Think about it. How would the kernel handle
Roland> userspace calling poll() with more than one file
Roland> descriptor if each individual driver slept?

Richard> Well what to you think they do? Spin?

I don't know why I continue this, but.... can you point out the line
in the kernel 2.4 source for __pollwait() where it sleeps?

Or think about it. Suppose a user called poll() with two fds, each of
which belonged to a different driver. Suppose each driver slept in
its poll method. If the first driver never became ready (and stayed
asleep), how would poll() return to user space if the second driver
became ready?

What actually happens is that each driver registers with the kernel
the wait queues that it will wake up when it becomes ready. But the
core kernel is responsible for sleeping, outside of the driver code.

Really, read:
<http://www.xml.com/ldd/chapter/book/ch05.html#t3>

You might believe you're familiar with poll() but I think it would
help to read what the Linux Device Drivers book has to say. It
answers the exact question you're asking, and if you don't believe me
you might believe the definitive published book.

- Roland

2004-03-02 21:58:14

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2 Mar 2004, Roland Dreier wrote:

> Roland> I don't think so. Even in kernel 2.4, poll_wait() just
> Roland> calls __pollwait(). I don't see anything in __pollwait()
> Roland> that sleeps. Think about it. How would the kernel handle
> Roland> userspace calling poll() with more than one file
> Roland> descriptor if each individual driver slept?
>
> Richard> Well what to you think they do? Spin?
>
> I don't know why I continue this, but.... can you point out the line
> in the kernel 2.4 source for __pollwait() where it sleeps?
>

You are playing games with semantics because you are wrong.
The code in fs/select.c about line 101, adds the current caller
to the wait-queue. This wait-queue is the mechanism by which the
current caller sleeps, i.e., gives the CPU up to somebody else.
That caller's thread will not return past that line until
a wake_up_interruptible() call has been made for/from the
driver or interface handling that file descriptor. In this manner
any number of file discriptors may be handled because the poll()
routine for each of then makes its own entry into the wait-queue
using the described mechanism.

> Or think about it. Suppose a user called poll() with two fds, each of
> which belonged to a different driver. Suppose each driver slept in
> its poll method. If the first driver never became ready (and stayed
> asleep), how would poll() return to user space if the second driver
> became ready?
>

Explained above.
[SNIPPED bullshit]

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-02 22:43:31

by David Dillow

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2004-03-02 at 16:59, Richard B. Johnson wrote:
> On Tue, 2 Mar 2004, Roland Dreier wrote:

> > I don't know why I continue this, but.... can you point out the line
> > in the kernel 2.4 source for __pollwait() where it sleeps?

> The code in fs/select.c about line 101, adds the current caller
> to the wait-queue. This wait-queue is the mechanism by which the
> current caller sleeps, i.e., gives the CPU up to somebody else.

Actually, no, it does not work that way. The wait queue is the mechanism
by which a process can be *woken up* . The call to add_wait_queue() does
not put the process to sleep.

The process is put to sleep in do_poll(), using schedule_timeout() with
the current state as TASK_INTERRUPTIBLE. This is on line 453.

Your driver will eventually have to wake up sleepers on the queue.
do_poll() will also wake up when the timeout expires, or a signal is
sent to the poll()'ing process.

These semantics have not changed since 2.4. The implementation has
changed a bit, but the driver interface has not changed. Read the
documentation the Roland pointed you to.
--
Dave Dillow <[email protected]>

2004-03-02 22:51:07

by Bill Davidsen

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

Roland Dreier wrote:

> I don't know why I continue this, but.... can you point out the line
> in the kernel 2.4 source for __pollwait() where it sleeps?
>
> Or think about it. Suppose a user called poll() with two fds, each of
> which belonged to a different driver. Suppose each driver slept in
> its poll method. If the first driver never became ready (and stayed
> asleep), how would poll() return to user space if the second driver
> became ready?
>
> What actually happens is that each driver registers with the kernel
> the wait queues that it will wake up when it becomes ready. But the
> core kernel is responsible for sleeping, outside of the driver code.

Could you maybe go back to the initial report, which is that after
poll() gets wrong status? It's nice to argue about where the process
waits, but the issue is if it gets the same status with 2.4 and 2.6, and
if not which one should be fixed.

Richard: can you show this with a small demo program? I assume you
didn't find this just by reading code ;-)

2004-03-02 22:56:32

by Roland Dreier

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

Richard> You are playing games with semantics because you are
Richard> wrong. The code in fs/select.c about line 101, adds the
Richard> current caller to the wait-queue.

I assume you mean the call to add_wait_queue() there. That does not
sleep. Look at the implementation. add_wait_queue() is defined in
kernel/fork.c -- it just does some locking and calls
__add_wait_queue(). __add_wait_queue() is really nothing more than
a list_add(). There's nothing more to it and nothing that goes to
sleep. Where do you think add_wait_queue() goes to sleep?

Richard> This wait-queue is the mechanism by which the current
Richard> caller sleeps, i.e., gives the CPU up to somebody else.
Richard> That caller's thread will not return past that line until
Richard> a wake_up_interruptible() call has been made for/from the
Richard> driver or interface handling that file descriptor. In
Richard> this manner any number of file discriptors may be handled
Richard> because the poll() routine for each of then makes its own
Richard> entry into the wait-queue using the described mechanism.

But there's only one thread around: the user space process that called
into the kernel via poll(). If the first driver goes to sleep, which
thread do you think is going to wake up and call into the second
driver?

- Roland

2004-03-02 22:58:37

by Roland Dreier

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

Bill> Could you maybe go back to the initial report, which is that
Bill> after poll() gets wrong status? It's nice to argue about
Bill> where the process waits, but the issue is if it gets the
Bill> same status with 2.4 and 2.6, and if not which one should be
Bill> fixed.

I'm sure the problem is a buggy driver. The original question
represents a complete misunderstanding of how poll() is implemented,
so it's no surprise that the driver breaks.

- Roland

2004-03-02 23:15:26

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2 Mar 2004, Roland Dreier wrote:

> Richard> You are playing games with semantics because you are
> Richard> wrong. The code in fs/select.c about line 101, adds the
> Richard> current caller to the wait-queue.
>
> I assume you mean the call to add_wait_queue() there. That does not
> sleep. Look at the implementation. add_wait_queue() is defined in
> kernel/fork.c -- it just does some locking and calls
> __add_wait_queue(). __add_wait_queue() is really nothing more than
> a list_add(). There's nothing more to it and nothing that goes to
> sleep. Where do you think add_wait_queue() goes to sleep?
>
> Richard> This wait-queue is the mechanism by which the current
> Richard> caller sleeps, i.e., gives the CPU up to somebody else.
> Richard> That caller's thread will not return past that line until
> Richard> a wake_up_interruptible() call has been made for/from the
> Richard> driver or interface handling that file descriptor. In
> Richard> this manner any number of file discriptors may be handled
> Richard> because the poll() routine for each of then makes its own
> Richard> entry into the wait-queue using the described mechanism.
>
> But there's only one thread around: the user space process that called
> into the kernel via poll(). If the first driver goes to sleep, which
> thread do you think is going to wake up and call into the second
> driver?
>

There are two routines where the CPU is actually given up,
do_select() and do_poll(). Search for schedule_timeout().
Once the scheduler has the CPU, it's available for any of
the other drivers. It's also available for the timer queues,
other tasks, and the interrupts.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-02 23:21:14

by Roland Dreier

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

Richard> There are two routines where the CPU is actually given
Richard> up, do_select() and do_poll(). Search for
Richard> schedule_timeout(). Once the scheduler has the CPU, it's
Richard> available for any of the other drivers. It's also
Richard> available for the timer queues, other tasks, and the
Richard> interrupts.

OK, fine, I can't argue with that. But it has nothing to do with the
discussion at hand.

You still haven't said where poll_wait() sleeps in kernel 2.4. You
claimed it was in add_wait_queue(), but add_wait_queue() doesn't sleep
(it's just a list_add() guarded by a lock).

Also, why would another driver for the same poll() call run? There's
only one thread around that cares about this call to poll() -- the
userspace process that originally called poll(). If one driver
sleeps, no other drivers will run until that driver wakes up.

- Roland

2004-03-02 23:31:26

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2 Mar 2004, Bill Davidsen wrote:

> Roland Dreier wrote:
>
> > I don't know why I continue this, but.... can you point out the line
> > in the kernel 2.4 source for __pollwait() where it sleeps?
> >
> > Or think about it. Suppose a user called poll() with two fds, each of
> > which belonged to a different driver. Suppose each driver slept in
> > its poll method. If the first driver never became ready (and stayed
> > asleep), how would poll() return to user space if the second driver
> > became ready?
> >
> > What actually happens is that each driver registers with the kernel
> > the wait queues that it will wake up when it becomes ready. But the
> > core kernel is responsible for sleeping, outside of the driver code.
>
> Could you maybe go back to the initial report, which is that after
> poll() gets wrong status? It's nice to argue about where the process
> waits, but the issue is if it gets the same status with 2.4 and 2.6, and
> if not which one should be fixed.
>
> Richard: can you show this with a small demo program? I assume you
> didn't find this just by reading code ;-)

Yes. The code I attached earlier shows that the poll() in a driver
gets called (correctly), then it calls poll_wait(). Unfortunately
the call to poll_wait() returns immediately so that the return
value from the driver's poll() is whatever it was before some
event occurred that the driver was going to signal with
wake_up_interruptible().

The attached code clearly demonstrates this. It doesn't even contain
any code to execute wake_up_interruptible(). When an event occurs
in the driver that would have set the poll_flag to POLLIN, and
executed wake_up_interruptible, the old status, the stuff that
was returned when poll_wait() returned immediately instead of
waiting for the wake up, gets returned to the user-mode program.

Now, if the user-mode program calls poll() again, which is likely,
it gets the status that was returned from the previous event so
it "seems" to work. However, it is always one event behind so
you need two events to recognize the first one.

I attached the module and demo program again. It clearly shows
that poll_wait() gets called and then immediately returns without
waiting...

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

Attachments:

modules.tar.gz (1.42 kB)

2004-03-03 00:07:44

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

Richard B. Johnson wrote:

> > Could you maybe go back to the initial report, which is that after
> > poll() gets wrong status? It's nice to argue about where the process
> > waits, but the issue is if it gets the same status with 2.4 and 2.6,
> and
> > if not which one should be fixed.
> >
> > Richard: can you show this with a small demo program? I assume you
> > didn't find this just by reading code ;-)
>
> Yes. The code I attached earlier shows that the poll() in a driver
> gets called (correctly), then it calls poll_wait(). Unfortunately
> the call to poll_wait() returns immediately so that the return
> value from the driver's poll() is whatever it was before some
> event occurred that the driver was going to signal with
> wake_up_interruptible().
>
The documentation referred to earlier
(http://www.xml.com/ldd/chapter/book/ch05.html#t3) clearly states that
the poll function implementation for a device should:
"1. Call /poll_wait/ on one or more wait queues that could indicate a
change in the poll status.
2. Return a bit mask describing operations that could be immediately
performed without blocking."
What happens is if your device has data ready RIGHT NOW (without any
waiting), the information regarding that is returned.

Now, if you look closely at the implementation of do_poll(), it will
loop forever until any device returns that data is available (or the
timeout occurs). After data is available, do_pollfd() function no longer
adds the devices to the wait queue.

What this means is that the first time through each device is added to
the wait queue. After the process is woken up from schedule_timeout (or
the timeout occurs), then it will loop through each device again to
either add that device to the wait queue again, or determine that there
is data ready to be read/written, or an error or whatever. do_pollfd()
then increases count, and do_poll() exits from it's loop (and wait
queues are cleaned-up in sys_poll()).

So, yes, the poll_wait() returns immediately, it should not wait, and
your poll method should return the device's CURRENT status in the poll
mask. Don't worry, because when your device wakes the waiting process
when data is ready, the device's poll function is called again and you
again can return the CURRENT status.

Please let me know if I'm misunderstanding this, but quite frankly,
poll_wait CANNOT wait, for the very reasons described in previous posts:
when multiple devices are polled, then poll_wait is called FOR EACH DEVICE.

Cheers,

..John

2004-03-03 01:16:36

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2 Mar 2004, John Muir wrote:

> Richard B. Johnson wrote:
>
> > > Could you maybe go back to the initial report, which is that after
> > > poll() gets wrong status? It's nice to argue about where the process
> > > waits, but the issue is if it gets the same status with 2.4 and 2.6,
> > and
> > > if not which one should be fixed.
> > >
> > > Richard: can you show this with a small demo program? I assume you
> > > didn't find this just by reading code ;-)
> >
> > Yes. The code I attached earlier shows that the poll() in a driver
> > gets called (correctly), then it calls poll_wait(). Unfortunately
> > the call to poll_wait() returns immediately so that the return
> > value from the driver's poll() is whatever it was before some
> > event occurred that the driver was going to signal with
> > wake_up_interruptible().
> >
> The documentation referred to earlier
> (http://www.xml.com/ldd/chapter/book/ch05.html#t3) clearly states that
> the poll function implementation for a device should:
> "1. Call /poll_wait/ on one or more wait queues that could indicate a
> change in the poll status.
> 2. Return a bit mask describing operations that could be immediately
> performed without blocking."
> What happens is if your device has data ready RIGHT NOW (without any
> waiting), the information regarding that is returned.
>
> Now, if you look closely at the implementation of do_poll(), it will
> loop forever until any device returns that data is available (or the
> timeout occurs). After data is available, do_pollfd() function no longer
> adds the devices to the wait queue.
>
> What this means is that the first time through each device is added to
> the wait queue. After the process is woken up from schedule_timeout (or
> the timeout occurs), then it will loop through each device again to
> either add that device to the wait queue again, or determine that there
> is data ready to be read/written, or an error or whatever. do_pollfd()
> then increases count, and do_poll() exits from it's loop (and wait
> queues are cleaned-up in sys_poll()).
>
> So, yes, the poll_wait() returns immediately, it should not wait, and
> your poll method should return the device's CURRENT status in the poll
> mask. Don't worry, because when your device wakes the waiting process
> when data is ready, the device's poll function is called again and you
> again can return the CURRENT status.
>

Well the device's poll function isn't getting called the second time
with 2.6.0. I never checked it in 2.4.x because it always worked.
This problem occurs in a driver that only returns the fact that
one event occurred. When it failed to report the event when built
with a newer kernel, I added diagnostics which showed that the
poll in the driver was only called once --and that the return from
poll_wait happened immediately.

So, if the poll_wait isn't a wait-function, but just some add-wakeup
to the queue function, then its name probably should have been
changed when it changed. At one time it did, truly, wait until
it was awakened with wake_up_interruptible.

> Please let me know if I'm misunderstanding this, but quite frankly,
> poll_wait CANNOT wait, for the very reasons described in previous posts:
> when multiple devices are polled, then poll_wait is called FOR EACH DEVICE.
>
> Cheers,
>
> ..John
>
>

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-03 03:06:20

by George Spelvin

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

> I'm talking about the driver! When a open fd called poll() or select(),
> in user-mode code, the driver's poll() was called, and the driver's poll()
> would call poll_wait(). Poll_wait() used to NOT return until the driver
> executed wake_up_interruptible() on that wait-queue. When poll_wait()
> returned, the driver would return to the caller with the new poll-
> status.

poll_wait has ALWAYS, since it was introduces during the select/poll
changeover in 2.1 development, been a non-sleeping, immediately returning
function.

Its predecessor, select_wait(), has been a non-sleeping function since 1.0.

> So, if the poll_wait isn't a wait-function, but just some add-wakeup
> to the queue function, then its name probably should have been
> changed when it changed. At one time it did, truly, wait until
> it was awakened with wake_up_interruptible.

There is no "when it changed". It has never changed. Go look at
the 2.2.20 code on http://lxr.linux.no/.

***************************************************************
* *
* poll_wait: *
* - DOES NOT sleep. *
* - NEVER HAS slept, in any kernel version, EVER. *
* - WOULD NOT WORK if it did sleep, for reasons that are *
* so BLATANTLY OBVIOUS that arguing about it after it's *
* been REPEATEDLY pointed out is a sign that the person *
* arguing needs to go and visit the rest home with those *
* nice, young men in their clean, white coats. *
* *
***************************************************************

It has aways, since select_wait in Linux 1.0, been nothing more than
an "add-wakeup-to-the-queue" function. The last time the code changed
significantly was the select/poll changeover in 2.1.x, and even then it
was very similar.

When a particular filp's poll method is called, there are two things
that have to get done:
1) Check if the wakeup conditions are already satisfied
(the "no-wait" case), and
2) Schedule the task for wakeup when the filp's condition changes
(the "wait" case)

Now, 2) only has to be done if 1) fails, but we can't do things in that
order because there's a race condition. if the condition changed between
the two steps, but doesn't change after that, we'll never wake up.

So we have to do 2), and THEN check for 1). This is the fundamental
race condition of sleeping until a condition becomes true, so anyone who
entertains the remotest channce of ever writing functioning kernel code
should be excruciatingly familiar with it, and its solution.

For the terminally dim, everyone hold hands and follow me. Remember
when we read Kernighan & Ritchie aloud and don't get lost...

In the 2.2.20 kernel, pollwait is defined (include/linux/poll.h) as:

14 struct poll_table_entry {
15 struct file * filp;
16 struct wait_queue wait;
17 struct wait_queue ** wait_address;
18 };
19
20 typedef struct poll_table_struct {
21 struct poll_table_struct * next;
22 unsigned int nr;
23 struct poll_table_entry * entry;
24 } poll_table;
25
26 #define __MAX_POLL_TABLE_ENTRIES ((PAGE_SIZE - sizeof (poll_table)) / sizeof (struct poll_table_entry))
27
28 extern void __pollwait(struct file * filp, struct wait_queue ** wait_address, poll_table *p);
29
30 extern inline void poll_wait(struct file * filp, struct wait_queue ** wait_address, poll_table *p)
31 {
32 if (p && wait_address)
33 __pollwait(filp, wait_address, p);
34 }

This ia a trivial wrapper around __pollwait(). Nothing else in the
function could possibly take more than a few clock cycles.

__pollwait is defined (fs/select.c) as:

94 void __pollwait(struct file * filp, struct wait_queue ** wait_address, poll_table *p)
95 {
96 for (;;) {
97 if (p->nr < __MAX_POLL_TABLE_ENTRIES) {
98 struct poll_table_entry * entry;
99 entry = p->entry + p->nr;
100 entry->filp = filp;
101 filp->f_count++;
102 entry->wait_address = wait_address;
103 entry->wait.task = current;
104 entry->wait.next = NULL;
105 add_wait_queue(wait_address,&entry->wait);
106 p->nr++;
107 return;
108 }
109 p = p->next;
110 }
111 }

This does a little bit of bookkeeping and calls add_wait_queue().
Nothing else in the function could possibly take more than a few dozen
clock cycles.

Now look at add_wait_queue(), which is defined (include/linux/sched.h) as
nothing more than calls to three other functions:

745 extern inline void __add_wait_queue(struct wait_queue ** p, struct wait_queue * wait)
746 {
747 wait->next = *p ? : WAIT_QUEUE_HEAD(p);
748 *p = wait;
749 }
750
751 extern rwlock_t waitqueue_lock;
752
753 extern inline void add_wait_queue(struct wait_queue ** p, struct wait_queue * wait)
754 {
755 unsigned long flags;
756
757 write_lock_irqsave(&waitqueue_lock, flags);
758 __add_wait_queue(p, wait);
759 write_unlock_irqrestore(&waitqueue_lock, flags);
760 }

write_lock_irqsave() and write_unlock_irqsave() can get a little
bit complicated, so pay careful attention.

We're going to consider just the i386, non-SMP case. SMP locking is
for big boys and girls who did all their homework and got As on their
tests.

They are defined (include/asm-i386/spinlock.h) as:
115 #define write_lock_irqsave(lock, flags) \
116 do { save_flags(flags); cli(); } while (0)
117 #define write_unlock_irqrestore(lock, flags) \
118 restore_flags(flags)

These, in turn, go through some wrappers in include/asm-i386/system.h:
198 #define cli() __cli()
199 #define sti() __sti()
200 #define save_flags(x) __save_flags(x)
201 #define restore_flags(x) __restore_flags(x)

and end up calling the primitive assembly routines in the same file:

176 /* interrupt control.. */
177 #define __sti() __asm__ __volatile__ ("sti": : :"memory")
178 #define __cli() __asm__ __volatile__ ("cli": : :"memory")
179 #define __save_flags(x) \
180 __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */ :"memory")
181 #define __restore_flags(x) \
182 __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory")

These are one or two machine instructions each.

Finally, _add__wait_queue, above, is just two assignments. Just to
be absolutely sure we've explored every function-like thing that
the most deluded paranoid could think might sleep, I'll mention that
WAIT_QUEUE_HEAD is defined in include/linux/wait.h as:

22 #define WAIT_QUEUE_HEAD(x) ((struct wait_queue *)((x)-1))

This, also, cannot possibly sleep.

To recap, looking back in the wayback machine to 2.2.20, the complete
call/macro graph of poll_wait is:

poll_wait
__pollwait
add_wait_queue
write_lock_irqsave
save_flags
__save_flags
cli
__cli
__add_wait_queue
WAIT_QUEUE_HEAD
write_unlock_irqrestore
restore_flags
__restore_flags

and NONE OF THESE FUNCTIONS SLEEP. Therefore, poll_wait DID NOT USED TO
SLEEP.

2.4 and later allocate the poll_table pages on demand (with GFP_KERNEL)
rather than preallocating everything, so *that* part can sleep, but it
doesn't depend on the filp being polled, and the rest CANNOT.

2004-03-03 03:57:32

by David Dillow

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2004-03-02 at 18:32, Richard B. Johnson wrote:
> Yes. The code I attached earlier shows that the poll() in a driver
> gets called (correctly), then it calls poll_wait(). Unfortunately
> the call to poll_wait() returns immediately so that the return
> value from the driver's poll() is whatever it was before some
> event occurred that the driver was going to signal with
> wake_up_interruptible().

You've been handed a clue enough times now that you should understand
that poll_wait() does not, and has never, put the process to sleep.

If you can show a case where do_poll() returns stale data, then by all
means do so. We will be happy to fix any such error in the kernel.

You say do_poll() looses the status returned from your driver's poll
method. If your driver is truly returning a nonzero status from the
poll() method call, then a simple read of the code in do_pollfd() will
show that the only way it looses information from that event mask is if
your user space is not setting that event type in pollfd.events.

If I were you, I'd check two things:
1) that your poll method is really returning a non-zero status when you
think it is
2) that your user space program is really asking for all events you
think it is

I think you'll find your problem is not this well-used mechanism in the
kernel.

Dave

2004-03-03 04:04:23

by Roland Dreier

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

Richard> Well the device's poll function isn't getting called the
Richard> second time with 2.6.0. I never checked it in 2.4.x
Richard> because it always worked. This problem occurs in a
Richard> driver that only returns the fact that one event
Richard> occurred. When it failed to report the event when built
Richard> with a newer kernel, I added diagnostics which showed
Richard> that the poll in the driver was only called once --and
Richard> that the return from poll_wait happened immediately.

Your driver is buggy. It's not surprising since you fundamentally
don't understand the kernel interface you're trying to use.

Richard> So, if the poll_wait isn't a wait-function, but just some
Richard> add-wakeup to the queue function, then its name probably
Richard> should have been changed when it changed. At one time it
Richard> did, truly, wait until it was awakened with
Richard> wake_up_interruptible.

When did it change? Show me a kernel version where poll_wait() waited
until the driver woke it up. (Kernel versions at least as far back as
1.0 are readily available from kernel.org, so it should be easy for
you)

- Roland

2004-03-03 12:49:25

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2 Mar 2004, Roland Dreier wrote:

> Richard> Well the device's poll function isn't getting called the
> Richard> second time with 2.6.0. I never checked it in 2.4.x
> Richard> because it always worked. This problem occurs in a
> Richard> driver that only returns the fact that one event
> Richard> occurred. When it failed to report the event when built
> Richard> with a newer kernel, I added diagnostics which showed
> Richard> that the poll in the driver was only called once --and
> Richard> that the return from poll_wait happened immediately.
>
> Your driver is buggy. It's not surprising since you fundamentally
> don't understand the kernel interface you're trying to use.
>
> Richard> So, if the poll_wait isn't a wait-function, but just some
> Richard> add-wakeup to the queue function, then its name probably
> Richard> should have been changed when it changed. At one time it
> Richard> did, truly, wait until it was awakened with
> Richard> wake_up_interruptible.
>
> When did it change? Show me a kernel version where poll_wait() waited
> until the driver woke it up. (Kernel versions at least as far back as
> 1.0 are readily available from kernel.org, so it should be easy for
> you)
>
> - Roland
>

I never said the DRIVER woke up, but that poll sleeps.

FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
100 0 1 0 6 0 224 148 do_select S ? 0:01 /sbin/init auto
[SNIPPED....]

40 0 115 114 9 0 1452 728 pipe_wait S ? 0:00 /usr/sbin/nmbd -D
100000 0 11109 9459 16 0 1044 476 R 1 0:00 ps -laxw
0 0 11107 9504 12 0 692 216 do_poll S 2 0:00 ./tester

.... and if it DOESN'T then ps is buggy and/or the entry in /proc
in buggy.

Clearly this task is sleeping in do_poll.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-03 14:29:27

by Davide Libenzi

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Wed, 3 Mar 2004, Richard B. Johnson wrote:

> Clearly this task is sleeping in do_poll.

I don't believe anyone ever argued that do_poll() sleeps. You were saying
that poll_wait() was sleeping, that is different.

- Davide

2004-03-03 18:21:32

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Tue, 2 Mar 2004, David Dillow wrote:

> On Tue, 2004-03-02 at 18:32, Richard B. Johnson wrote:
> > Yes. The code I attached earlier shows that the poll() in a driver
> > gets called (correctly), then it calls poll_wait(). Unfortunately
> > the call to poll_wait() returns immediately so that the return
> > value from the driver's poll() is whatever it was before some
> > event occurred that the driver was going to signal with
> > wake_up_interruptible().
>
> You've been handed a clue enough times now that you should understand
> that poll_wait() does not, and has never, put the process to sleep.
>
> If you can show a case where do_poll() returns stale data, then by all
> means do so. We will be happy to fix any such error in the kernel.
>
> You say do_poll() looses the status returned from your driver's poll
> method. If your driver is truly returning a nonzero status from the
> poll() method call, then a simple read of the code in do_pollfd() will
> show that the only way it looses information from that event mask is if
> your user space is not setting that event type in pollfd.events.
>
> If I were you, I'd check two things:
> 1) that your poll method is really returning a non-zero status when you
> think it is
> 2) that your user space program is really asking for all events you
> think it is
>
> I think you'll find your problem is not this well-used mechanism in the
> kernel.
>
> Dave

The very great problems that exist with poll on linux-2.6.0
are being quashed by those who just like to argue. Therefore,
I wrote some code that emulates the environment in which I
discovered the poll failure. Experts can decide whatever they
want about the inner workings of poll(). I supposed that if
`ps` showed that a task was sleeping in poll() then it must
be sleeping in poll(). So, even it that's wrong, here is
irrefutable proof that there is a problem with polling events
getting lost on 2.6.0. The attached code will execute without
errors in 2.4.24 and below.

Here is the test code running on Linux-2.4.24

Script started on Wed Mar 3 10:24:05 2004
# make
gcc -Wall -O2 -D__KERNEL__ -DMODULE -DMAJOR_NR=199 -I/usr/src/linux-2.4.24/include -c -o thingy.o thingy.c
as -o rtc_hwr.o rtc_hwr.S
ld -i -o driver.o thingy.o rtc_hwr.o
gcc -Wall -o tester -O2 tester.c
rm -f THING
mknod THING c 199 0
# insmod driver.o
# lsmod
Module Size Used by
driver 1928 0 (unused)
nfs 48008 0 (autoclean)
lockd 37308 0 (autoclean) [nfs]
sunrpc 66236 0 (autoclean) [nfs lockd]
vxidrvr 2748 0 (unused)
vximsg 5328 0 [vxidrvr]
ramdisk 5596 0 (unused)
ipchains 42204 0
nls_cp437 4376 4 (autoclean)
3c59x 28252 1
isofs 17368 0 (unused)
loop 8856 0
sr_mod 12188 0 (unused)
cdrom 27936 0 [sr_mod]
BusLogic 35832 7
sd_mod 10472 14
scsi_mod 56236 3 [sr_mod BusLogic sd_mod]
# ./tester

You have mail in /var/spool/mail/root
# exit
exit
Script done on Wed Mar 3 11:05:56 2004

This ran for about 1/2 hour ( 10:24 -> 11:05) with no
errors.

This is the same thing, running on this exact same machine, booted with
Linux-2.6.0-test8

Script started on Wed Mar 3 13:00:18 2004
# make
gcc -Wall -O2 -D__KERNEL__ -DMODULE -DMAJOR_NR=199 -I/usr/src/linux-2.6.0-test8/include -c -o thingy.o thingy.c
as -o rtc_hwr.o rtc_hwr.S
ld -i -o driver.o thingy.o rtc_hwr.o
gcc -Wall -o tester -O2 tester.c
rm -f THING
mknod THING c 199 0
# insmod driver.o
# ./tester
ERROR: (51 != 1) Lost events : 50
ERROR: (59371 != 59369) Lost events : 52
ERROR: (75501 != 75498) Lost events : 55
ERROR: (94511 != 94509) Lost events : 57
ERROR: (164660 != 164658) Lost events : 59
ERROR: (194867 != 194864) Lost events : 62
ERROR: (206892 != 206890) Lost events : 64
ERROR: (214232 != 214230) Lost events : 66
ERROR: (308340 != 308337) Lost events : 69
ERROR: (353062 != 353059) Lost events : 72
ERROR: (444541 != 444539) Lost events : 74
ERROR: (446516 != 446514) Lost events : 76
ERROR: (447906 != 447904) Lost events : 78

# exit
exit
Script done on Wed Mar 3 13:04:39 2004

It ran from 13:00 to 13:04, accumulating 78 errors.

A review of the test code shows that the driver uses IRQ8
from the RTC timer chip. The driver doesn't care if some
interrupt ticks are lost because it just increments a memory
variable every time that an interrupt occurs. After that variable
is incremented, the interrupt service routine sets a global
flag to POLLIN and sends a wake_up_interruptible() to whomever
is sleeping in poll(). Spin-locks are put around critical
sections to prevent undefined results.

In the user-mode code, the task sleeping in poll() reads
the variable using read(). The user-mode code doesn't
care what the value is, only that it must be one-greater than
the last one read. If it isn't, an error message is written
to standard error.

Observations:
If the call to mlockall() is removed in the test-code, the
behavior seems to be better, just the opposite of what one
would suspect.

The priority (nice value) doesn't seem to affect anything.

The code runs without any errors on 2.4.24, 2.4.22, 2.4.21, and
2.4.18. The only source I have on this machine for 2.6.x is
for 2.6.0-test8.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

Attachments:

modules.tar.gz (3.49 kB)

2004-03-03 19:29:30

by David Dillow

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Wed, 2004-03-03 at 13:23, Richard B. Johnson wrote:
> The very great problems that exist with poll on linux-2.6.0
> are being quashed by those who just like to argue.

No, the argument has always been that your understanding of poll()'s
internals is not entirely correct. We have simply asked you to post code
that shows poll()'s problems, which you have finally provided. Sort of.

> Therefore,
> I wrote some code that emulates the environment in which I
> discovered the poll failure. Experts can decide whatever they
> want about the inner workings of poll(). I supposed that if
> `ps` showed that a task was sleeping in poll() then it must
> be sleeping in poll().

This we all agree on -- poll() sleeps. Duh. No argument there.
poll_wait() doesn't and never has, which was your original assertion.

But on to the code!

> So, even it that's wrong, here is
> irrefutable proof that there is a problem with polling events
> getting lost on 2.6.0.

Ahem, no, not so much. What you have here is proof that your user
program is not getting control again withing 0.488ms of the interrupt
happening. That does not mean poll() is loosing events.

You are definately seeing some significant latency -- 50 lost increments
is ~25ms.

What else is running when you perform this test? Can you repeat with a
more recent kernel? Can you repeat in single user mode, with it being
the only process present? With as few extra modules loaded as possible?

I still think your problem is not poll() -- if there were problems
there, bug reports would be coming out of the woodwork.
--
Dave Dillow <[email protected]>

2004-03-03 20:08:02

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Wed, 3 Mar 2004, Dave Dillow wrote:

> On Wed, 2004-03-03 at 13:23, Richard B. Johnson wrote:
> > The very great problems that exist with poll on linux-2.6.0
> > are being quashed by those who just like to argue.
>
> No, the argument has always been that your understanding of poll()'s
> internals is not entirely correct. We have simply asked you to post code
> that shows poll()'s problems, which you have finally provided. Sort of.
>
> > Therefore,
> > I wrote some code that emulates the environment in which I
> > discovered the poll failure. Experts can decide whatever they
> > want about the inner workings of poll(). I supposed that if
> > `ps` showed that a task was sleeping in poll() then it must
> > be sleeping in poll().
>
> This we all agree on -- poll() sleeps. Duh. No argument there.
> poll_wait() doesn't and never has, which was your original assertion.
>
> But on to the code!
>
> > So, even it that's wrong, here is
> > irrefutable proof that there is a problem with polling events
> > getting lost on 2.6.0.
>
> Ahem, no, not so much. What you have here is proof that your user
> program is not getting control again withing 0.488ms of the interrupt
> happening. That does not mean poll() is loosing events.
>

Well that should mean the same thing in the final wash.

> You are definately seeing some significant latency -- 50 lost increments
> is ~25ms.
>
> What else is running when you perform this test? Can you repeat with a
> more recent kernel? Can you repeat in single user mode, with it being
> the only process present? With as few extra modules loaded as possible?
>

I would need to install a more recent kernel and I think I should. That
will mean some re-write of the module code, so I am told. I wrote the
module and another Engineer, Terry Skillman, wrote the test-code and the
Makefile. He also performed the tests. He says that it will not compile
on a newer 2.6.x so I would have to do more work if true. I will
have to put that off until Monday because I have to take a work-break.

Nothing else is running. Although there is a network board, there is no
network line from the hub to that machine when the tests are made.

> I still think your problem is not poll() -- if there were problems
> there, bug reports would be coming out of the woodwork.
> --
> Dave Dillow <[email protected]>
>

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-03 22:20:08

by Linus Torvalds

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Wed, 3 Mar 2004, Richard B. Johnson wrote:
>
> are being quashed by those who just like to argue. Therefore,
> I wrote some code that emulates the environment in which I
> discovered the poll failure.

No. I think you wrote some code that shows the bug you have.

Your "poll()" function IS BUGGY.

Look at this:

static size_t poll(struct file *fp, struct poll_table_struct *wait)
{
size_t poll_flag;
size_t flags;
DEB(printk(KERN_INFO"%s : poll() called\n", devname));
poll_wait(fp, &pwait, wait);
DEB(printk(KERN_INFO"%s : poll() returned\n", devname));
spin_lock_irqsave(&rtc_lock, flags);
poll_flag = global_poll_flag;
*** global_poll_flag = 0;
spin_unlock_irqrestore(&rtc_lock, flags);
return poll_flag;
}

you are clearing your own flag that says "there are events pending", so if
you call your "poll()" function twice, on the second time it will say
"there are no events pending".

You should clear the "events pending" flag only when you literally remove
the event (ie at "read()" time, not at "poll()" time). Because the
select() code _will_ call down to the "poll()" functions multiple times if
it gets woken up for any bogus reason.

See if that fixes anything.

It may well be that 2.6.x calls down to the low-level driver "poll()"
function more than it should. That would be a mis-feature, and worth
looking at, but I think you should try to fix your test first, since right
now the bug is questionable.

Linus

2004-03-03 22:40:45

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Wed, 3 Mar 2004, Linus Torvalds wrote:

>
>
> On Wed, 3 Mar 2004, Richard B. Johnson wrote:
> >
> > are being quashed by those who just like to argue. Therefore,
> > I wrote some code that emulates the environment in which I
> > discovered the poll failure.
>
> No. I think you wrote some code that shows the bug you have.
>
> Your "poll()" function IS BUGGY.
>
> Look at this:
>
> static size_t poll(struct file *fp, struct poll_table_struct *wait)
> {
> size_t poll_flag;
> size_t flags;
> DEB(printk(KERN_INFO"%s : poll() called\n", devname));
> poll_wait(fp, &pwait, wait);
> DEB(printk(KERN_INFO"%s : poll() returned\n", devname));
> spin_lock_irqsave(&rtc_lock, flags);
> poll_flag = global_poll_flag;
> *** global_poll_flag = 0;
> spin_unlock_irqrestore(&rtc_lock, flags);
> return poll_flag;
> }
>
> you are clearing your own flag that says "there are events pending", so if
> you call your "poll()" function twice, on the second time it will say
> "there are no events pending".
>

But I am clearing it under a spin-lock.

> You should clear the "events pending" flag only when you literally remove
> the event (ie at "read()" time, not at "poll()" time). Because the
> select() code _will_ call down to the "poll()" functions multiple times if
> it gets woken up for any bogus reason.

Oh wow. That is the BUG! I didn't know it could call in multiple times.

>
> See if that fixes anything.
>
> It may well be that 2.6.x calls down to the low-level driver "poll()"
> function more than it should. That would be a mis-feature, and worth
> looking at, but I think you should try to fix your test first, since right
> now the bug is questionable.
>
> Linus

And YES! If I clear the flag only after it is read. It "fixes" the
observed problem!

I don't know if it is a BUG or a FEATURE, but you put your
finger right on it! Thanks.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-03 22:46:54

by Linus Torvalds

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Wed, 3 Mar 2004, Linus Torvalds wrote:
>
> You should clear the "events pending" flag only when you literally remove
> the event (ie at "read()" time, not at "poll()" time). Because the
> select() code _will_ call down to the "poll()" functions multiple times if
> it gets woken up for any bogus reason.

Hmm.. The above is all still true and your poll() implementation is bad,
but looking at your test program, the problematic case really shouldn't
trigger (we should call poll() multiple times only when it returns zero).

To trigger that bug, you'd have to occasionally call poll() with the
POLLIN bit clear in the incoming events, which your test program doesn't
ever do.

Anyway, you should move the clearing to read(), but there may well be
something else going on too.

What's the frequency you are programming the thing to send interrupts at?

Linus

2004-03-03 23:03:39

by Richard B. Johnson

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Wed, 3 Mar 2004, Linus Torvalds wrote:

>
>
> On Wed, 3 Mar 2004, Linus Torvalds wrote:
> >
> > You should clear the "events pending" flag only when you literally remove
> > the event (ie at "read()" time, not at "poll()" time). Because the
> > select() code _will_ call down to the "poll()" functions multiple times if
> > it gets woken up for any bogus reason.
>
> Hmm.. The above is all still true and your poll() implementation is bad,
> but looking at your test program, the problematic case really shouldn't
> trigger (we should call poll() multiple times only when it returns zero).
>
> To trigger that bug, you'd have to occasionally call poll() with the
> POLLIN bit clear in the incoming events, which your test program doesn't
> ever do.
>

Well, when I removed the local poll_flag, using only the global one,
and cleared it to zero after the long long was fetched under the lock
(in the read routine), my observed problems go away.

static size_t poll(struct file *fp, struct poll_table_struct *wait)
{
DEB(printk(KERN_INFO"%s : poll() called\n", devname));
poll_wait(fp, &pwait, wait);
DEB(printk(KERN_INFO"%s : poll() returned\n", devname));
return global_poll_flag;
}

static int read(struct file *fp, char *buf, size_t count, loff_t *ppos)
{
long long tmp;
size_t flags;
spin_lock_irqsave(&rtc_lock, flags);
tmp = rtc_tick;
global_poll_flag = 0;
spin_unlock_irqrestore(&rtc_lock, flags);
if(copy_to_user(buf, &tmp, sizeof(tmp)))
return -EFAULT;
return sizeof(tmp);
}

> Anyway, you should move the clearing to read(), but there may well be
> something else going on too.
>
> What's the frequency you are programming the thing to send interrupts at?
>

2048 ticks/second, trivial to handle.

> Linus
> -

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-03-03 23:08:33

by Linus Torvalds

[permalink] [raw]

Subject: Re: poll() in 2.6 and beyond

On Wed, 3 Mar 2004, Richard B. Johnson wrote:
>
> And YES! If I clear the flag only after it is read. It "fixes" the
> observed problem!

Ok. That solves one problem, but it does seem to point to the fact that
2.6.x calls down to poll() more than 2.4.x does.

It really _is_ normal to have multiple calls to "poll()" for the same fd
as a result of a single "poll()" system call, but if we actually get a
positive hit (ie the ->poll() call returns a bit that we consider to be a
success by comparing it to what we _wanted_ to see), then we should have
stopped doing that.

And since your test program always sets "POLLIN", any ->poll() call that
has the POLLIN flag set should have ended up being the last one (since it
would have marked a "success").

So there's still something I don't understand, and that seems to differ
between 2.4.x and 2.6.x. Can anybody else see it?

Linus