2005-10-15 09:58:30

by Gabriele Brugnoni

[permalink] [raw]
Subject: interruptible_sleep_on, interrupts and device drivers

Hello,

I'm writing a device driver for a UART to be used to drive a RS485 line.
I would use the interruptible_sleep_on function to wait for receiver ready,
or for the transmitter to finish.
I've take a look at how other device driver use this function, and a big
question is born on my mind:

In several device driver I've seen that:

The code check for the availability of data.
If not available, a call the interruptible_sleep_on is made.
In the interrupt handler, when data becomes ready, a wakeup_interruptible
call is made.

Often the variables that indicates if data is available or not are tested
between spin_lock and unlock (or save_irq and restore_irq) so that the
interrupt handler cannot change the status during check.

But:
If the test is made with IRQ closed, and IRQ are then enabled after the test
but before the call to interruptible_sleep_on, what happen if the handler
break the procedure immediately before entering the interruptible_sleep_on
function ?
I beleave that the interrupt handler, calling the wakeup function, will not
wake our process, because is not in the waiting list. But at return from
IRQ handler, the process will continue execution calling the sleep
function, and nobody will wake it because the data is now already available.

I try to explain better with an example:

This may be a function that should wait that the UART send all chars.
When the interrupt for shift empty will be serviced, the
wakeup_interruptible function will wake all waiting processes in wait list.


10: void waitEndTX ( void )
11: {
12: unsigned long flags;
13:
14: //....
15:
16: spin_lock_irqsave(&mylock, flags);
17: if( tx_done ) {
18: spin_lock_irqrestore(&mylock, flags);
19: return;
20: }
21: spin_lock_irqrestore(&mylock, flags);
22: interruptible_sleep_on ( &uartwait );
23:}


This type of code is very used in a lot of device drivers, in kernel 2.6.x
(the char/specialix, for example).

OK, Now we suppose that the UART finish to send the character exactely when
the process has closed the IRQ and is about to the the tx_done variable.
(At line 17 for example).
The hardware will asserts the IRQ line, but the processor has the IRQ
closed, so the execution will continue in our process.
Now at line 21 our process enable the IRQ.
The CPU will immediately jumps to irq handler, breaking the execution at
line 22. Our handler will set the variable tx_done to one.
When the cpu control returns to our calling process, at line 21, the tx_done
has already tested and it's value was zero. The execution continue with the
call to interruptible_sleep_on function, that will put our process in the
uartwait queue, but nobody will wakeup it.

If this is true, a lot of device driver in linux kernel may have this
problem. This sound me very strange, so where I'm wrong ?


Thanks in advance.
Gabriele

(Note: I've tried to post this message with another program but I think
without success. I apologize if this message will appears twice.)


2005-10-15 10:05:42

by Arjan van de Ven

[permalink] [raw]
Subject: Re: interruptible_sleep_on, interrupts and device drivers

On Sat, 2005-10-15 at 12:00 +0200, Gabriele Brugnoni wrote:
> Hello,
>
> I'm writing a device driver for a UART to be used to drive a RS485 line.
> I would use the interruptible_sleep_on function to wait for receiver ready,
> or for the transmitter to finish.

don't.

interruptible_sleep_on() is a broken interface (see the comments in the
header) and should not be used in any new code (where "new" is "since
the year 2000" :)

Just use the wait_event() interfaces .... or just a simple semaphore
even if what you want to do is simple and performance isn't too
critical.


2005-10-15 10:27:31

by Gabriele Brugnoni

[permalink] [raw]
Subject: Re: interruptible_sleep_on, interrupts and device drivers

Arjan van de Ven wrote:

>
> don't.
>
> interruptible_sleep_on() is a broken interface (see the comments in the
> header) and should not be used in any new code (where "new" is "since
> the year 2000" :)
>
> Just use the wait_event() interfaces .... or just a simple semaphore
> even if what you want to do is simple and performance isn't too
> critical.
>

OK, i'll not use, but the kernel has a lot of device drivers using it, that
may present the problem explained in my message.

In my code i've try the following:

save_flags(flags); cli();
if( !rs.txdone ) {
if( arg < 0 ) arg = rs.ttimeout;
if( arg > 0 )
interruptible_sleep_on_timeout ( &rs.txwait, arg );
else
interruptible_sleep_on ( &rs.txwait );
}
restore_flags(flags);
return rs.txdone;

Before testing the flag, interrupt are closed and if the TX is not terminated,
the interruptible_sleep_on_timeout will be called with interrupt DISABLED.
This is not a problem, because the scheduler will switch to another process,
and the new process will enable again the IRQ. When the scheduler will give
back the control to my process, it will continue with interrupt disabled, and
will be reenabled before exit.
This seems to work very fine.

Calling the waiting function with IRQ enabled may expose the device driver to
the risk of infinite wait (until a signal, obvious).


Thanks
Gabriele

2005-10-15 10:36:40

by Arjan van de Ven

[permalink] [raw]
Subject: Re: interruptible_sleep_on, interrupts and device drivers

On Sat, 2005-10-15 at 12:29 +0200, Gabriele Brugnoni wrote:
> Arjan van de Ven wrote:
>
> >
> > don't.
> >
> > interruptible_sleep_on() is a broken interface (see the comments in the
> > header) and should not be used in any new code (where "new" is "since
> > the year 2000" :)
> >
> > Just use the wait_event() interfaces .... or just a simple semaphore
> > even if what you want to do is simple and performance isn't too
> > critical.
> >
>
> OK, i'll not use, but the kernel has a lot of device drivers using it, that
> may present the problem explained in my message.
>
> In my code i've try the following:
>
> save_flags(flags); cli();

this is broken code; cli() cannot and should not be used. (and isn't
even available on SMP kernels anymore)

> if( !rs.txdone ) {
> if( arg < 0 ) arg = rs.ttimeout;
> if( arg > 0 )
> interruptible_sleep_on_timeout ( &rs.txwait, arg );
> else
> interruptible_sleep_on ( &rs.txwait );
> }
> restore_flags(flags);

and this is missing a sti()

> return rs.txdone;


your code is just wrong. Yes I know many really old drivers do this, but
those are getting extinct fast..

2005-10-15 10:57:32

by Russell King

[permalink] [raw]
Subject: Re: interruptible_sleep_on, interrupts and device drivers

On Sat, Oct 15, 2005 at 12:36:29PM +0200, Arjan van de Ven wrote:
> On Sat, 2005-10-15 at 12:29 +0200, Gabriele Brugnoni wrote:
> > save_flags(flags); cli();
>
> this is broken code; cli() cannot and should not be used. (and isn't
> even available on SMP kernels anymore)
>
> > if( !rs.txdone ) {
> > if( arg < 0 ) arg = rs.ttimeout;
> > if( arg > 0 )
> > interruptible_sleep_on_timeout ( &rs.txwait, arg );
> > else
> > interruptible_sleep_on ( &rs.txwait );
> > }
> > restore_flags(flags);
>
> and this is missing a sti()

Err, no, that's wrong. sti() unconditionally enables interrupts - if
an sti() was placed here, it's pointless using save_flags() (which
saves the old interrupt enable state) and restore_flags() (which
restores the interrupt enable state).

Also remember that interruptible_sleep_* is only safe on UP machines
provided it's called with interrupts disabled.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-10-15 13:01:10

by Tushar Adeshara

[permalink] [raw]
Subject: Re: interruptible_sleep_on, interrupts and device drivers

On 10/15/05, Gabriele Brugnoni <[email protected]> wrote:
> Hello,
>
> I'm writing a device driver for a UART to be used to drive a RS485 line.
> I would use the interruptible_sleep_on function to wait for receiver ready,
> or for the transmitter to finish.
> I've take a look at how other device driver use this function, and a big
> question is born on my mind:
>
> In several device driver I've seen that:
>
> The code check for the availability of data.
> If not available, a call the interruptible_sleep_on is made.
> In the interrupt handler, when data becomes ready, a wakeup_interruptible
> call is made.
>
> Often the variables that indicates if data is available or not are tested
> between spin_lock and unlock (or save_irq and restore_irq) so that the
> interrupt handler cannot change the status during check.
>
> But:
> If the test is made with IRQ closed, and IRQ are then enabled after the test
> but before the call to interruptible_sleep_on, what happen if the handler
> break the procedure immediately before entering the interruptible_sleep_on
> function ?
> I beleave that the interrupt handler, calling the wakeup function, will not
> wake our process, because is not in the waiting list. But at return from
> IRQ handler, the process will continue execution calling the sleep
> function, and nobody will wake it because the data is now already available.
>
> I try to explain better with an example:
>
> This may be a function that should wait that the UART send all chars.
> When the interrupt for shift empty will be serviced, the
> wakeup_interruptible function will wake all waiting processes in wait list.
>
>
> 10: void waitEndTX ( void )
> 11: {
> 12: unsigned long flags;
> 13:
> 14: //....
> 15:
> 16: spin_lock_irqsave(&mylock, flags);
> 17: if( tx_done ) {
> 18: spin_lock_irqrestore(&mylock, flags);
> 19: return;
> 20: }
> 21: spin_lock_irqrestore(&mylock, flags);
> 22: interruptible_sleep_on ( &uartwait );
> 23:}
>
>
> This type of code is very used in a lot of device drivers, in kernel 2.6.x
> (the char/specialix, for example).
>
> OK, Now we suppose that the UART finish to send the character exactely when
> the process has closed the IRQ and is about to the the tx_done variable.
> (At line 17 for example).
> The hardware will asserts the IRQ line, but the processor has the IRQ
> closed, so the execution will continue in our process.
> Now at line 21 our process enable the IRQ.
> The CPU will immediately jumps to irq handler, breaking the execution at
> line 22. Our handler will set the variable tx_done to one.
> When the cpu control returns to our calling process, at line 21, the tx_done
> has already tested and it's value was zero. The execution continue with the
> call to interruptible_sleep_on function, that will put our process in the
> uartwait queue, but nobody will wakeup it.
>
> If this is true, a lot of device driver in linux kernel may have this
> problem. This sound me very strange, so where I'm wrong ?

That is the reason why wait_event and friends are introduced.
See LDD3 , Ch 10. Interrupt Handling. It contains a good example
of simple char driver which also takes care of missed Interrupt.

>
>
> Thanks in advance.
> Gabriele
>
> (Note: I've tried to post this message with another program but I think
> without success. I apologize if this message will appears twice.)
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>


--
Regards,
Tushar
--------------------
It's not a problem, it's an opportunity for improvement. Lets improve.

2005-10-15 20:23:14

by Robert Hancock

[permalink] [raw]
Subject: Re: interruptible_sleep_on, interrupts and device drivers

Gabriele Brugnoni wrote:
> But:
> If the test is made with IRQ closed, and IRQ are then enabled after the test
> but before the call to interruptible_sleep_on, what happen if the handler
> break the procedure immediately before entering the interruptible_sleep_on
> function ?
> I beleave that the interrupt handler, calling the wakeup function, will not
> wake our process, because is not in the waiting list. But at return from
> IRQ handler, the process will continue execution calling the sleep
> function, and nobody will wake it because the data is now already available.

This is why interruptible_sleep_on is deprecated and should not be used
anymore. The wait_event_interruptible, etc. functions avoid this race
since the condition is tested after the caller is put into the wait
queue, so if the condition is true you are guaranteed to be woken up.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/