The macro spin_event_timeout() takes a condition and timeout value
(in microseconds) as parameters. It spins until either the condition is true
or the timeout expires. It returns zero if the timeout expires first, non-zero
otherwise.
This primary purpose of this macro is to poll on a hardware register until a
status bit changes. The timeout ensures that the loop still terminates if the
bit doesn't change as expected. This macro makes it easier for driver
developers to perform this kind of operation properly.
Signed-off-by: Timur Tabi <[email protected]>
---
v4: removed cpu_relax (redundant), changed timeout to unsigned long
v3: eliminated secondary evaluation of condition, replaced jiffies with udelay
v2: added cpu_relax and time_before
include/linux/delay.h | 27 +++++++++++++++++++++++++++
1 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/include/linux/delay.h b/include/linux/delay.h
index fd832c6..4af6df6 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -51,4 +51,31 @@ static inline void ssleep(unsigned int seconds)
msleep(seconds * 1000);
}
+/**
+ * spin_event_timeout - spin until a condition gets true or a timeout elapses
+ * @condition: a C expression to evalate
+ * @timeout: timeout, in microseconds
+ *
+ * The process spins until the @condition evaluates to true (non-zero) or
+ * the @timeout elapses.
+ *
+ * This primary purpose of this macro is to poll on a hardware register
+ * until a status bit changes. The timeout ensures that the loop still
+ * terminates if the bit never changes.
+ *
+ * The return value is the number of microseconds left in the timeout, so if
+ * the return value is non-zero, then it means the condition is true.
+ *
+ * Short-circuit evaluation in the while-loop ensures that if the condition
+ * becomes true exactly when the timeout expires, non-zero will still be
+ * returned.
+ */
+#define spin_event_timeout(condition, timeout) \
+({ \
+ unsigned long __timeout = timeout; \
+ while (!(condition) && --__timeout) \
+ udelay(1); \
+ __timeout; \
+})
+
#endif /* defined(_LINUX_DELAY_H) */
--
1.6.1.3
> This primary purpose of this macro is to poll on a hardware register until a
> status bit changes. The timeout ensures that the loop still terminates if the
> bit doesn't change as expected. This macro makes it easier for driver
> developers to perform this kind of operation properly.
NAK this - on a lot of platforms 1uS is the wrong timescale. Also we
shouldn't be encouraging this kind of polling by making it very easy to
write.
It might be a useful internal macro for some freescale drivers but if so
it doesn't belong in the core headers
Alan
Alan Cox wrote:
> NAK this - on a lot of platforms 1uS is the wrong timescale. Also we
> shouldn't be encouraging this kind of polling by making it very easy to
> write.
Well, I can agree that the time scale might be wrong on some platforms.
The original version of spin_event_timeout() used jiffies, but some
people said that a jiffy is too long of a timescale, so I changed it to
udelay.
However, I disagree about the encouragement part. Polling a register
until a status bit changes is a common task that cannot be handled any
other way. If the status bit change does not generate an interrupt, but
the wait for the change takes a few microseconds, what else are you
going to do?
The way I see it, I'm just extending the idea behind cpu_relax(). Just
doing a search for cpu_relax shows dozens, maybe hundreds, of drivers
doing stuff like this:
while((inb(ioaddr+DAYNA_CARD_STATUS)&DAYNA_TX_READY)==0)
cpu_relax();
This code doesn't even have a timeout! In fact, I'd say that at least
90% of all uses of cpu_relax() are in a while loop reading some register
without a timeout.
Ironically, in the situations where there is a timeout, the drivers use
jiffies, not a delay.
Frankly, I just don't see how spin_event_timeout() is not an improvement
over the current code that drivers use.
--
Timur Tabi
Linux kernel developer at Freescale
On Tue, Mar 10, 2009 at 3:50 PM, Timur Tabi <[email protected]> wrote:
> Alan Cox wrote:
>
>> NAK this - on a lot of platforms 1uS is the wrong timescale. Also we
>> shouldn't be encouraging this kind of polling by making it very easy to
>> write.
>
> Well, I can agree that the time scale might be wrong on some platforms.
> ?The original version of spin_event_timeout() used jiffies, but some
> people said that a jiffy is too long of a timescale, so I changed it to
> udelay.
The correct timescale is rather application dependant - for some
accesses that cross clock domains it can be a requirement to wait for
a small number of core clock cycles (2 - 20) for a condition to become
true, for others, e.g. PIO, it is more appropriate to wait for a few
100 cycles.
Will Newton wrote:
> The correct timescale is rather application dependant - for some
> accesses that cross clock domains it can be a requirement to wait for
> a small number of core clock cycles (2 - 20) for a condition to become
> true, for others, e.g. PIO, it is more appropriate to wait for a few
> 100 cycles.
The timeout is only needed to avoid hangs in the driver. If the
response normally comes within 20 clocks, but you waited two
milliseconds until you gave up, that's not a bad thing. At least after
two milliseconds you've aborted the loop and returned an error.
--
Timur Tabi
Linux kernel developer at Freescale
On Tue, Mar 10, 2009 at 9:30 AM, Timur Tabi <[email protected]> wrote:
> The macro spin_event_timeout() takes a condition and timeout value
> (in microseconds) as parameters. ?It spins until either the condition is true
> or the timeout expires. ?It returns zero if the timeout expires first, non-zero
> otherwise.
>
> This primary purpose of this macro is to poll on a hardware register until a
> status bit changes. ?The timeout ensures that the loop still terminates if the
> bit doesn't change as expected. ?This macro makes it easier for driver
> developers to perform this kind of operation properly.
>
> Signed-off-by: Timur Tabi <[email protected]>
> ---
>
> v4: removed cpu_relax (redundant), changed timeout to unsigned long
>
> v3: eliminated secondary evaluation of condition, replaced jiffies with udelay
>
> v2: added cpu_relax and time_before
>
> ?include/linux/delay.h | ? 27 +++++++++++++++++++++++++++
> ?1 files changed, 27 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/delay.h b/include/linux/delay.h
> index fd832c6..4af6df6 100644
> --- a/include/linux/delay.h
> +++ b/include/linux/delay.h
> @@ -51,4 +51,31 @@ static inline void ssleep(unsigned int seconds)
> ? ? ? ?msleep(seconds * 1000);
> ?}
>
> +/**
> + * spin_event_timeout - spin until a condition gets true or a timeout elapses
> + * @condition: a C expression to evalate
> + * @timeout: timeout, in microseconds
> + *
> + * The process spins until the @condition evaluates to true (non-zero) or
> + * the @timeout elapses.
> + *
> + * This primary purpose of this macro is to poll on a hardware register
> + * until a status bit changes. ?The timeout ensures that the loop still
> + * terminates if the bit never changes.
> + *
> + * The return value is the number of microseconds left in the timeout, so if
> + * the return value is non-zero, then it means the condition is true.
> + *
> + * Short-circuit evaluation in the while-loop ensures that if the condition
> + * becomes true exactly when the timeout expires, non-zero will still be
> + * returned.
> + */
> +#define spin_event_timeout(condition, timeout) ? ? ? ? ? ? ? ? ? ? ? ? \
> +({ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? \
> + ? ? ? unsigned long __timeout = timeout; ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\
> + ? ? ? while (!(condition) && --__timeout) ? ? ? ? ? ? ? ? ? ? ? ? ? ? \
> + ? ? ? ? ? ? ? udelay(1); ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\
Do you really want to do a udelay() here? Or any other kind of delay
for that matter? When using a delay then, unless the condition is
immediately true, the code will burn a minimum of 1us of cycles, even
if the condition does become true right after the first read. If the
CPU is spinning anyway, then it should be doing using those cycles to
try and bail out as soon as possible.
I typically use something in the form of this pattern on powerpc code
when I absolutely have to busywait.
int end_ts = get_tbl() + some_delay; /* note that this is signed */
while (end_ts - get_tbl() > 0)
if (condition)
break;
This macro also looks like a tempting unbounded latency tarpit to fall
into when people start using it in critical regions, but that goes
for code using a non-timed-out busywait also.
g.
--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
Grant Likely wrote:
> I typically use something in the form of this pattern on powerpc code
> when I absolutely have to busywait.
> int end_ts = get_tbl() + some_delay; /* note that this is signed */
> while (end_ts - get_tbl() > 0)
> if (condition)
> break;
Funny you mention this, because I was just about to implement the same
thing. Of course, it will be a PowerPC-only function, but I suppose
that's its fate regardless.
> This macro also looks like a tempting unbounded latency tarpit to fall
> into when people start using it in critical regions, but that goes
> for code using a non-timed-out busywait also.
I can't fix everything!
--
Timur Tabi
Linux kernel developer at Freescale
On Tue, 2009-03-10 at 15:35 +0000, Alan Cox wrote:
> > This primary purpose of this macro is to poll on a hardware register until a
> > status bit changes. The timeout ensures that the loop still terminates if the
> > bit doesn't change as expected. This macro makes it easier for driver
> > developers to perform this kind of operation properly.
>
> NAK this - on a lot of platforms 1uS is the wrong timescale. Also we
> shouldn't be encouraging this kind of polling by making it very easy to
> write.
>
> It might be a useful internal macro for some freescale drivers but if so
> it doesn't belong in the core headers
I don't totally agree with your reasoning here Alan.
A simple fact of life is that drivers -will- do that sort spinning. They
don't always have a choice. Now do we want all drivers to do it
differently and get it wrong (such as not having timeouts etc...) or do
we provide a helper that has the added advantage of allowing us a lot
more easily to audit them ?
Hell, we can even make the helper warn if called with a too high timeout
value or that sort of thing...
I think it's all benefit to move that sort of cruft to a generic helper
like that in the long run.
Cheers,
Ben.
> A simple fact of life is that drivers -will- do that sort spinning. They
> don't always have a choice. Now do we want all drivers to do it
> differently and get it wrong (such as not having timeouts etc...) or do
> we provide a helper that has the added advantage of allowing us a lot
> more easily to audit them ?
Given the proposed helper isn't a sane default for x86 I think it needs a
good deal more work. It also hides details like that timing which is bad
sometimes.
> I think it's all benefit to move that sort of cruft to a generic helper
> like that in the long run.
Perhaps - but the helper needs to be right
On Tue, Mar 10, 2009 at 7:37 PM, Alan Cox <[email protected]> wrote:
> Given the proposed helper isn't a sane default for x86 I think it needs a
> good deal more work.
Alan, I'm happy to put whatever effort is requirement to make this
code satisfactory. I believe that this is a useful macro to have in
the kernel. I do appreciate the attention you are giving it.
> It also hides details like that timing which is bad
> sometimes.
Are you talking about the udelay() inside the loop? If so, I agree
that this is bad and have removed it in the PowerPC-specific version:
#define spin_event_timeout(condition, timeout) \
({ \
unsigned long __start = get_tbl(); \
unsigned long __loops = tb_ticks_per_usec * timeout; \
int __ret = 1; \
while (!(condition)) { \
if (tb_ticks_since(__start) > __loops) { \
__ret = 0; \
break; \
} \
cpu_relax(); \
} \
__ret; \
})
tb_ticks_since() is a front-end to get_tbl(), which is similar to the
rdtsc instruction. In this case, we're not adding arbitrary delays
into the loop, and we're not using jiffies, but we are
architecture-dependent.
--
Timur Tabi
Linux kernel developer at Freescale
> Are you talking about the udelay() inside the loop? If so, I agree
> that this is bad and have removed it in the PowerPC-specific version:
The behaviour you want there is system specific - 10uS is a minimum
politeness value for x86 PCI bus for example.
> rdtsc instruction. In this case, we're not adding arbitrary delays
> into the loop, and we're not using jiffies, but we are
> architecture-dependent.
and not useful
A macro of this form really needs to be able to look like
spin_until_timeout(readb(foo) & 0x80, 30 * HZ) {
udelay(10);
/* Maybe do other stuff */
}
to be more generally useful
Alan Cox wrote:
>> Are you talking about the udelay() inside the loop? If so, I agree
>> that this is bad and have removed it in the PowerPC-specific version:
>
> The behaviour you want there is system specific - 10uS is a minimum
> politeness value for x86 PCI bus for example.
So we need to allow for delays between successive rights? We can
provide that with a third parameter to the macro.
>> rdtsc instruction. In this case, we're not adding arbitrary delays
>> into the loop, and we're not using jiffies, but we are
>> architecture-dependent.
>
> and not useful
Is there an architecture-independent method for reading a timebase
register that's not jiffies?
> A macro of this form really needs to be able to look like
>
> spin_until_timeout(readb(foo) & 0x80, 30 * HZ) {
> udelay(10);
> /* Maybe do other stuff */
> }
>
> to be more generally useful
You mean something like this:
#define spin_until_timeout(condition, timeout) \
for (unsigned long __timeout = jiffies + (timeout); \
(!(condition) && time_after(jiffies, __timeout)); )
How do I return a value indicating whether a timeout occurred or
condition came true?
--
Timur Tabi
Linux kernel developer at Freescale
On Wed, 2009-03-11 at 13:18 -0500, Timur Tabi wrote:
> Alan Cox wrote:
> >> Are you talking about the udelay() inside the loop? If so, I agree
> >> that this is bad and have removed it in the PowerPC-specific version:
> >
> > The behaviour you want there is system specific - 10uS is a minimum
> > politeness value for x86 PCI bus for example.
>
> So we need to allow for delays between successive rights? We can
> provide that with a third parameter to the macro.
I prefer Alan's method of having the macro be followed by { and } so we
can add things in there. The delay between access will often be somewhat
platform or device specific, and some drivers might be able to do useful
things while spinning.
The other big advantage of that approach is that drivers that aren't in
an atomic section can use msleep() and allow the kernel to schedule on
that processor.
> Is there an architecture-independent method for reading a timebase
> register that's not jiffies?
Not really since there isn't an architecture independent "timebase
register" anyway. But Jiffies for the timeout value doesn't sound
necessarily -that- bad.
Cheers,
Ben.
On Wed, Mar 11, 2009 at 3:58 PM, Benjamin Herrenschmidt
<[email protected]> wrote:
> On Wed, 2009-03-11 at 13:18 -0500, Timur Tabi wrote:
>> Alan Cox wrote:
>> >> Are you talking about the udelay() inside the loop? ?If so, I agree
>> >> that this is bad and have removed it in the PowerPC-specific version:
>> >
>> > The behaviour you want there is system specific - 10uS is a minimum
>> > politeness value for x86 PCI bus for example.
>>
>> So we need to allow for delays between successive rights? ?We can
>> provide that with a third parameter to the macro.
>
> I prefer Alan's method of having the macro be followed by { and } so we
> can add things in there. The delay between access will often be somewhat
> platform or device specific, and some drivers might be able to do useful
> things while spinning.
>
> The other big advantage of that approach is that drivers that aren't in
> an atomic section can use msleep() and allow the kernel to schedule on
> that processor.
Ack! I totally agree.
g.
--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
On Wed, Mar 11, 2009 at 9:45 PM, Grant Likely <[email protected]> wrote:
>> The other big advantage of that approach is that drivers that aren't in
>> an atomic section can use msleep() and allow the kernel to schedule on
>> that processor.
>
> Ack! I totally agree.
I'm glad everyone agrees. I still don't know how to solve the
problem, though. I came up with this:
#define spin_until_timeout(condition, timeout) \
for (unsigned long __timeout = jiffies + (timeout); \
(!(condition) && time_after(jiffies, __timeout)); )
Now how do I modify this so that the caller knows whether the loop
terminated because of a timeout or the condition became true?
--
Timur Tabi
Linux kernel developer at Freescale
On Thu, Mar 12, 2009 at 9:54 AM, Timur Tabi <[email protected]> wrote:
> On Wed, Mar 11, 2009 at 9:45 PM, Grant Likely <[email protected]> wrote:
>
>>> The other big advantage of that approach is that drivers that aren't in
>>> an atomic section can use msleep() and allow the kernel to schedule on
>>> that processor.
>>
>> Ack! ?I totally agree.
>
> I'm glad everyone agrees. ?I still don't know how to solve the
> problem, though. ?I came up with this:
>
> #define spin_until_timeout(condition, timeout) ? ? ? ? ?\
> ? ? ? for (unsigned long __timeout = jiffies + (timeout); ? ? \
> ? ? ? ? ? ? ? (!(condition) && time_after(jiffies, __timeout)); )
>
> Now how do I modify this so that the caller knows whether the loop
> terminated because of a timeout or the condition became true?
How about this:
#define spin_until_timeout(condition, timeout, rc) \
for (unsigned long __timeout = jiffies + (timeout); \
(!(rc = (condition)) && time_after(jiffies, __timeout)); )
g.
--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
Grant Likely wrote:
> #define spin_until_timeout(condition, timeout, rc) \
> for (unsigned long __timeout = jiffies + (timeout); \
> (!(rc = (condition)) && time_after(jiffies, __timeout)); )
Ooo, that's good.
I'm still not crazy about using jiffies, since it doesn't get
incremented when interrupts are disabled, and I'd like this function to
work in those cases. How about get_cycles()? I know it's not supported
on all platforms, but I'm willing to live with that.
The other problem with get_cycles() is that there doesn't appear to be a
num_cycles_per_usec() function, so there's no way for me to scale the
count to a fixed time period.
--
Timur Tabi
Linux kernel developer at Freescale
On Thu, 2009-03-12 at 11:19 -0500, Timur Tabi wrote:
> Grant Likely wrote:
>
> > #define spin_until_timeout(condition, timeout, rc) \
> > for (unsigned long __timeout = jiffies + (timeout); \
> > (!(rc = (condition)) && time_after(jiffies, __timeout)); )
>
> Ooo, that's good.
>
> I'm still not crazy about using jiffies, since it doesn't get
> incremented when interrupts are disabled, and I'd like this function to
> work in those cases. How about get_cycles()? I know it's not supported
> on all platforms, but I'm willing to live with that.
>
> The other problem with get_cycles() is that there doesn't appear to be a
> num_cycles_per_usec() function, so there's no way for me to scale the
> count to a fixed time period.
sched_clock() does that, but:
- it falls back to jiffies on poor platforms
- it requires to be called with IRQs disabled
- it can basically jump any random way on funky hardware
then there is cpu_clock(int cpu):
- still falls back to jiffies on poor platforms
- is monotonic when used on the same cpu
- can drift up to a few jiffies when used between cpus
But something that seems to always work, is simply count loops and rely
on whatever delay is in the specified loop.
#define spin_until_timeout(condition, timeout, rc) \
for (unsigned long __timeout = 0; \
!(rc = (condition)) && __timeout < (timeout); \
__timeout++)
On Thu, Mar 12, 2009 at 11:50 AM, Peter Zijlstra <[email protected]> wrote:
> sched_clock() does that, but:
> - it falls back to jiffies on poor platforms
I think it's ok to fall back to jiffies where necessary, but these two
functions are too heavy-weight for my tastes.
> But something that seems to always work, is simply count loops and rely
> on whatever delay is in the specified loop.
>
> #define spin_until_timeout(condition, timeout, rc) \
> for (unsigned long __timeout = 0; \
> !(rc = (condition)) && __timeout < (timeout); \
> __timeout++)
But that's the thing - I don't want a required delay inside the loop.
I guess I'm going to have to think about this for a while. I'd like
to see something like cycles_per_usec() as a companion function to
get_cycles().
--
Timur Tabi
Linux kernel developer at Freescale
> But that's the thing - I don't want a required delay inside the loop.
>
> I guess I'm going to have to think about this for a while. I'd like
> to see something like cycles_per_usec() as a companion function to
> get_cycles().
I think that's where you're wrong :-)
Just require the delay inside the loop, it will make everything nicer.
There are also some good reasons to do that:
- The delay between "polls" of the register may have to be controlled,
for example some HW will choke if polled too fast
- If you aren't in an atomic section, you may want to use msleep() and
thus be schedule friendly
- It fixes all the problems mentioned earlier
Cheers,
Ben.
On Thu, Mar 12, 2009 at 9:03 PM, Benjamin Herrenschmidt
<[email protected]> wrote:
>
>> But that's the thing - I don't want a required delay inside the loop.
>>
>> I guess I'm going to have to think about this for a while. ?I'd like
>> to see something like cycles_per_usec() as a companion function to
>> get_cycles().
>
> I think that's where you're wrong :-)
And I disagree. :-)
> Just require the delay inside the loop, it will make everything nicer.
> There are also some good reasons to do that:
>
> ?- The delay between "polls" of the register may have to be controlled,
> for example some HW will choke if polled too fast
Yes, but not all HW does. There have been several times that I've
needed to spin on a register without any delay. Requiring the loop
block to include an explicit delay nullifies most of usefulness to me.
> ?- If you aren't in an atomic section, you may want to use msleep() and
> thus be schedule friendly
This I agree with, and I see a real use for. However, if the contents
of the block don't have any form of constant delay, then it is less
clear how long the loop is going to run for.
OTOH, for the kind of delays that I see myself using it for, it if
doesn't have a resolution in the range of timebase ticks, then it
probably isn't going to be useful for me. I certainly am not
interested in spinning for more than a handful of ticks if I can help
it.
> ?- It fixes all the problems mentioned earlier
On another note; I'd consider calling it loop_event_timeout() instead
of spin_event_timeout() since it would allow the block contents to
sleep, and hence it wouldn't be spinning anymore. :-)
g.
--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.