2010-07-28 19:33:18

by Patrick Pannuto

[permalink] [raw]
Subject: [PATCH v2 0/4] timer: Added usleep[_range] timer

After writing both documentation and a checkpatch rule explaining
why the usleep API should never be used, it occurred to me that
perhaps such an API should never be added :) - at least not in its
previous form.

This iteration is similar, with the notable difference that now
usleep has a "built-in slack" of 200%. This is analogous to msleep,
which has a built-in slack of 0.4% (since it relies on legacy timers,
which have a built-in slack of 0.4%). 200% slack is significantly
greater than 0.4%, but the scale of usleep is also significantly
different than that of msleep, and I believe 200% to be a sane
default.

It is my opinion that this interface will most often mirror what
developers actually intend - indeed some people who have begun
trying to use the API raised this point -, however, I would like
some input as it is possibly confusing that the API will "double
your sleep" by default.

The usleep_range API is still included, since it provides an
interface to override the "default slack" of 200% by providing
an explicit range, or to allow callers to specify an even larger
slack if possible.


This patch series is NOT based off of tip (read: it will conflict
with the previous usleep patch in -mm) since it is slighly
different, and Andrew Morton lamented to loss of the detailed
changelog info in the previous iteration - it is included here.

This series also includes the "full set", that is
1: Adds the usleep[_range] API
2: Adds timers-howto documentation
3: Checkpatch: prefer usleep over udelay
4: Checkpatch: warn about unexpectedly long msleep's


*** Changes in v2
* Add "default slack" to usleep
* Fix missing usec->nsec for delta in do_usleep_range
* Fix Documentation typos
* Better checkpatch regex's


2010-07-28 19:33:25

by Patrick Pannuto

[permalink] [raw]
Subject: [PATCH 4/4] Checkpatch: warn about unexpectedly long msleep's

As explained in Documentation/timers/delays.txt, msleep's
of < 20ms may sleep for as long as 20ms. Caller's of
msleep(1) or msleep(2), etc are likely not to expect this
quirky behavior - warn them.

Signed-off-by: Patrick Pannuto <[email protected]>
---
scripts/checkpatch.pl | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 892ae62..0ca2ea5 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2578,6 +2578,13 @@ sub process {
}
}

+# warn about unexpectedly long msleep's
+ if ($line =~ /\bmsleep\s*\((\d+)\);/) {
+ if ($1 < 20) {
+ WARN("msleep < 20ms can sleep for up to 20ms; see Documentation/timers/delays.txt\n" . $line);
+ }
+ }
+
# warn about #ifdefs in C files
# if ($line =~ /^.\s*\#\s*if(|n)def/ && ($realfile =~ /\.c$/)) {
# print "#ifdef in C files should be avoided\n";
--
1.7.2

2010-07-28 19:33:22

by Patrick Pannuto

[permalink] [raw]
Subject: [PATCH 1/4] timer: Added usleep[_range] timer

usleep[_range] are finer precision implementations of msleep
and are designed to be drop-in replacements for udelay where
a precise sleep / busy-wait is unnecessary. They also allow
an easy interface to specify slack when a precise (ish)
wakeup is unnecessary to help minimize wakeups

By default, usleep introduces a "slack" of twice the requested
duration. If callers require a more precise sleep (?), the
provided usleep_range call can be used to provide a tighter,
or much looser, bound.

*** INTRO ***

As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
precise enough for many drivers (yes, sleep precision is an unfair notion,
but consistently sleeping for ~an order of magnitude greater than requested
is worth fixing). This patch adds a usleep API so that udelay does not have
to be used. Obviously not every udelay can be replaced (those in atomic
contexts or being used for simple bitbanging come to mind), but there are
many, many examples of

mydriver_write(...)
/* Wait for hardware to latch */
udelay(100)

in various drivers where a busy-wait loop is neither beneficial nor
necessary, but msleep simply does not provide enough precision and people
are using a busy-wait loop instead.

*** CONCERNS FROM THE RFC ***

Why is udelay a problem / necessary? Most callers of udelay are in device/
driver initialization code, which is serial...

As I see it, there is only benefit to sleeping over a delay; the
notion of "refactoring" areas that use udelay was presented, but
I see usleep as the refactoring. Consider i2c, if the bus is busy,
you need to wait a bit (say 100us) before trying again, your
current options are:

* udelay(100)
* msleep(1) <-- As noted above, actually as high as ~20ms
on some platforms, so not really an option
* Manually set up an hrtimer to try again in 100us (which
is what usleep does anyway...)

People choose the udelay route because it is EASY; we need to
provide a better easy route.

Device / driver / boot code is *currently* serial, but every few
months someone makes noise about parallelizing boot, and IMHO, a
little forward-thinking now is one less thing to worry about
if/when that ever happens

udelay's could be preempted

Sure, but if udelay plans on looping 1000 times, and it gets
preempted on loop 200, whenever it's scheduled again, it is
going to do the next 800 loops.

Is the interruptible case needed?

Probably not, but I see usleep as a very logical parallel to msleep,
so it made sense to include the "full" API. Processors are getting
faster (albeit not as quickly as they are becoming more parallel),
so if someone wanted to be interruptible for a few usecs, why not
let them? If this is a contentious point, I'm happy to remove it.

*** OTHER THOUGHTS ***

I believe there is also value in exposing the usleep_range option; it gives
the scheduler a lot more flexibility and allows the programmer to express
his intent much more clearly; it's something I would hope future driver
writers will take advantage of.

To get the results in the NUMBERS section below, I literally s/udelay/usleep
the kernel tree; I had to go in and undo the changes to the USB drivers, but
everything else booted successfully; I find that extremely telling in and
of itself -- many people are using a delay API where a sleep will suit them
just fine.

*** SOME ATTEMPTS AT NUMBERS ***

It turns out that calculating quantifiable benefit on this is challenging,
so instead I will simply present the current state of things, and I hope
this to be sufficient:

How many udelay calls are there in 2.6.35-rc5?

udealy(ARG) >= | COUNT
1000 | 319
500 | 414
100 | 1146
20 | 1832

I am working on Android, so that is my focus for this. The following table
is a modified usleep that simply printk's the amount of time requested to
sleep; these tests were run on a kernel with udelay >= 20 --> usleep

"boot" is power-on to lock screen
"power collapse" is when the power button is pushed and the device suspends
"resume" is when the power button is pushed and the lock screen is displayed
(no touchscreen events or anything, just turning on the display)
"use device" is from the unlock swipe to clicking around a bit; there is no
sd card in this phone, so fail loading music, video, camera

ACTION | TOTAL NUMBER OF USLEEP CALLS | NET TIME (us)
boot | 22 | 1250
power-collapse | 9 | 1200
resume | 5 | 500
use device | 59 | 7700

The most interesting category to me is the "use device" field; 7700us of
busy-wait time that could be put towards better responsiveness, or at the
least less power usage.

Signed-off-by: Patrick Pannuto <[email protected]>
---
include/linux/delay.h | 6 ++++++
kernel/timer.c | 22 ++++++++++++++++++++++
2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/include/linux/delay.h b/include/linux/delay.h
index fd832c6..2538c95 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -45,6 +45,12 @@ extern unsigned long lpj_fine;
void calibrate_delay(void);
void msleep(unsigned int msecs);
unsigned long msleep_interruptible(unsigned int msecs);
+void usleep_range(unsigned long min, unsigned long max);
+
+static inline void usleep(unsigned long usecs)
+{
+ usleep_range(usecs, usecs * 2);
+}

static inline void ssleep(unsigned int seconds)
{
diff --git a/kernel/timer.c b/kernel/timer.c
index ee305c8..c2253dd 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1750,3 +1750,25 @@ unsigned long msleep_interruptible(unsigned int msecs)
}

EXPORT_SYMBOL(msleep_interruptible);
+
+static int __sched do_usleep_range(unsigned long min, unsigned long max)
+{
+ ktime_t kmin;
+ unsigned long delta;
+
+ kmin = ktime_set(0, min * NSEC_PER_USEC);
+ delta = (max - min) * NSEC_PER_USEC;
+ return schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
+}
+
+/**
+ * usleep_range - Drop in replacement for udelay where wakeup is flexible
+ * @min: Minimum time in usecs to sleep
+ * @max: Maximum time in usecs to sleep
+ */
+void usleep_range(unsigned long min, unsigned long max)
+{
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ do_usleep_range(min, max);
+}
+EXPORT_SYMBOL(usleep_range);
--
1.7.2

2010-07-28 19:34:01

by Patrick Pannuto

[permalink] [raw]
Subject: [PATCH 2/4] Documentation: Add timers/timers-howto.txt

This file seeks to explain the nuances in various delays;
many driver writers are not necessarily familiar with the
various kernel timers, their shortfalls, and quirks. When
faced with

ndelay, udelay, mdelay, usleep, usleep_range, msleep, and
msleep_interrubtible

the question "How do I just wait 1 ms for my hardware to
latch?" has the non-intuitive "best" answer:
usleep(1000)

This patch is followed by a series of checkpatch additions
that seek to help kernel hackers pick the best delay.

Signed-off-by: Patrick Pannuto <[email protected]>
---
Documentation/timers/timers-howto.txt | 106 +++++++++++++++++++++++++++++++++
1 files changed, 106 insertions(+), 0 deletions(-)
create mode 100644 Documentation/timers/timers-howto.txt

diff --git a/Documentation/timers/timers-howto.txt b/Documentation/timers/timers-howto.txt
new file mode 100644
index 0000000..ea10a3a
--- /dev/null
+++ b/Documentation/timers/timers-howto.txt
@@ -0,0 +1,106 @@
+delays - Information on the various kernel delay / sleep mechanisms
+-------------------------------------------------------------------
+
+This document seeks to answer the common question: "What is the
+RightWay (TM) to insert a delay?"
+
+This question is most often faced by driver writers who have to
+deal with hardware delays and who may not be the most intimately
+familiar with the inner workings of the Linux Kernel.
+
+
+Inserting Delays
+----------------
+
+The first, and most important, question you need to ask is "Is my
+code in an atomic context?" This should be followed closely by "Does
+it really need to delay in atomic context?" If so...
+
+ATOMIC CONTEXT:
+ You must use the *delay family of functions. These
+ functions use the jiffie estimation of clock speed
+ and will busy wait for enough loop cycles to achieve
+ the desired delay:
+
+ ndelay(unsigned long nsecs)
+ udelay(unsigned long usecs)
+ mdelay(unsgined long msecs)
+
+ udelay is the generally preferred API; ndelay-level
+ precision may not actually exist on many non-PC devices.
+
+ mdelay is macro wrapper around udelay, to account for
+ possible overflow when passing large arguments to udelay.
+ In general, use of mdelay is discouraged.
+
+NON-ATOMIC CONTEXT:
+ You should use the *sleep[_range] family of functions.
+ There are a few more options here, while any of them may
+ work correctly, using the "right" sleep function will
+ help the scheduler, power management, and just make your
+ driver better :)
+
+ -- Backed by busy-wait loop:
+ udelay(unsigned long usecs)
+ -- Backed by hrtimers:
+ usleep(unsigned long usecs)
+ usleep_range(unsigned long min, unsigned long max)
+ -- Backed by jiffies / legacy_timers
+ msleep(unsigned long msecs)
+ msleep_interruptible(unsigned long msecs)
+
+ Unlike the *delay family, the underlying mechanism
+ driving each of these calls varies, thus there are
+ quirks you should be aware of.
+
+
+ SLEEPING FOR "A FEW" USECS ( < ~10us? ):
+ * Use udelay
+
+ - Why not usleep?
+ On slower systems, (embedded, OR perhaps a speed-
+ stepped PC!) the overhead of setting up the hrtimers
+ for usleep *may* not be worth it. Such an evaluation
+ will obviously depend on your specific situation, but
+ it is something to be aware of.
+
+ SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms):
+ * Use usleep[_range]
+
+ - Why not msleep for (1ms - 20ms)?
+ Explained originally here:
+ http://lkml.org/lkml/2007/8/3/250
+ msleep(1~20) may not do what the caller intends, and
+ will often sleep longer (~20 ms actual sleep for any
+ value given in the 1~20ms range). In many cases this
+ is not the desired behavior.
+
+ - What is the difference between usleep and usleep_range?
+ Fundamentally nothing; usleep(unsigned long usecs)
+ simply calls usleep_range(usecs, usecs * 2). This
+ built-in slack is similar in spirit to the slack
+ now built-in to legacy timers (0.4%) - and thus
+ built-in to msleep -, although it is significantly
+ larger.
+
+ The reasoning for slack is that usleep is built upon
+ hrtimers, and mindless callers of usleep will actually
+ introduce many undesired interrupts. By allowing a
+ range, wakeups can be coalesced. Given the scale, a
+ 200% slack seems a reasonable default value.
+
+ If your application can afford it, use usleep_range
+ to extend the slack to an even larger window; or, if
+ you need a more precise sleep (?), you can use
+ usleep_range to restrict the slack to a tighter bound.
+
+ SLEEPING FOR LARGER MSECS ( 10ms+ )
+ * Use msleep or possibly msleep_interruptible
+
+ - What's the difference?
+ msleep sets the current task to TASK_UNINTERRUPTIBLE
+ whereas msleep_interruptible sets the current task to
+ TASK_INTERRUPTIBLE before scheduling the sleep. In
+ short, the difference is whether the sleep can be ended
+ early by a signal. In general, just use msleep unless
+ you know you have a need for the interruptible variant.
--
1.7.2

2010-07-28 19:33:57

by Patrick Pannuto

[permalink] [raw]
Subject: [PATCH 3/4] Checkpatch: prefer usleep over udelay

When possible, sleeping is (usually) better than delaying;
however, don't bother callers of udelay < 10us, as those
cases are generally not worth the switch to usleep

Signed-off-by: Patrick Pannuto <[email protected]>
---
scripts/checkpatch.pl | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index bd88f11..892ae62 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2570,6 +2570,14 @@ sub process {
}
}

+# prefer usleep over udelay
+ if (($line =~ /\budelay\s*\(\s*(\w+)\s*\)/ {
+ # ignore udelay's < 10, however
+ if (! (($1 =~ /(\d+)/) && ($1 < 10)) ) {
+ CHK("usleep is preferred over udelay; see Documentation/timers/delays.txt\n" . $line);
+ }
+ }
+
# warn about #ifdefs in C files
# if ($line =~ /^.\s*\#\s*if(|n)def/ && ($realfile =~ /\.c$/)) {
# print "#ifdef in C files should be avoided\n";
--
1.7.2

2010-07-28 20:24:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

On Wed, 28 Jul 2010 12:33:04 -0700
Patrick Pannuto <[email protected]> wrote:

> diff --git a/include/linux/delay.h b/include/linux/delay.h
> index fd832c6..2538c95 100644
> --- a/include/linux/delay.h
> +++ b/include/linux/delay.h
> @@ -45,6 +45,12 @@ extern unsigned long lpj_fine;
> void calibrate_delay(void);
> void msleep(unsigned int msecs);
> unsigned long msleep_interruptible(unsigned int msecs);
> +void usleep_range(unsigned long min, unsigned long max);
> +
> +static inline void usleep(unsigned long usecs)
> +{
> + usleep_range(usecs, usecs * 2);
> +}
>
> static inline void ssleep(unsigned int seconds)
> {
> diff --git a/kernel/timer.c b/kernel/timer.c
> index ee305c8..c2253dd 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1750,3 +1750,25 @@ unsigned long msleep_interruptible(unsigned int msecs)
> }
>
> EXPORT_SYMBOL(msleep_interruptible);
> +
> +static int __sched do_usleep_range(unsigned long min, unsigned long max)
> +{
> + ktime_t kmin;
> + unsigned long delta;
> +
> + kmin = ktime_set(0, min * NSEC_PER_USEC);
> + delta = (max - min) * NSEC_PER_USEC;
> + return schedule_hrtimeout_range(&kmin, delta, HRTIMER_MODE_REL);
> +}
> +
> +/**
> + * usleep_range - Drop in replacement for udelay where wakeup is flexible
> + * @min: Minimum time in usecs to sleep
> + * @max: Maximum time in usecs to sleep
> + */
> +void usleep_range(unsigned long min, unsigned long max)
> +{
> + __set_current_state(TASK_UNINTERRUPTIBLE);
> + do_usleep_range(min, max);
> +}
> +EXPORT_SYMBOL(usleep_range);

This is different from the patch I merged and I'm not seeing any
explanation for the change.

The implementation of usleep() looks odd. The longer we sleep, the
greater the possible inaccuracy. A code comment which explains the
thinking and which warns people about the implications is needed.

2010-07-28 20:24:41

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 3/4] Checkpatch: prefer usleep over udelay

On Wed, 28 Jul 2010 12:33:06 -0700
Patrick Pannuto <[email protected]> wrote:

> When possible, sleeping is (usually) better than delaying;
> however, don't bother callers of udelay < 10us, as those
> cases are generally not worth the switch to usleep
>
> Signed-off-by: Patrick Pannuto <[email protected]>
> ---
> scripts/checkpatch.pl | 8 ++++++++
> 1 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index bd88f11..892ae62 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2570,6 +2570,14 @@ sub process {
> }
> }
>
> +# prefer usleep over udelay
> + if (($line =~ /\budelay\s*\(\s*(\w+)\s*\)/ {
> + # ignore udelay's < 10, however
> + if (! (($1 =~ /(\d+)/) && ($1 < 10)) ) {
> + CHK("usleep is preferred over udelay; see Documentation/timers/delays.txt\n" . $line);
> + }
> + }

It'd be better to recommend usleep_range(), IMO. To make people aware
of what they're doing, and to think about it.

2010-07-28 20:25:34

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 4/4] Checkpatch: warn about unexpectedly long msleep's

On Wed, 28 Jul 2010 12:33:07 -0700
Patrick Pannuto <[email protected]> wrote:

> As explained in Documentation/timers/delays.txt, msleep's
> of < 20ms may sleep for as long as 20ms. Caller's of
> msleep(1) or msleep(2), etc are likely not to expect this
> quirky behavior - warn them.
>
> Signed-off-by: Patrick Pannuto <[email protected]>
> ---
> scripts/checkpatch.pl | 7 +++++++
> 1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 892ae62..0ca2ea5 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2578,6 +2578,13 @@ sub process {
> }
> }
>
> +# warn about unexpectedly long msleep's
> + if ($line =~ /\bmsleep\s*\((\d+)\);/) {
> + if ($1 < 20) {
> + WARN("msleep < 20ms can sleep for up to 20ms; see Documentation/timers/delays.txt\n" . $line);

Should be timers-howto.txt?

2010-07-28 20:48:15

by Patrick Pannuto

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

> This is different from the patch I merged and I'm not seeing any
> explanation for the change.
>
> The implementation of usleep() looks odd. The longer we sleep, the
> greater the possible inaccuracy. A code comment which explains the
> thinking and which warns people about the implications is needed.

Yes it is different; the explanation was in the cover message. I should
probably include a copy of the explanation in the commit message as
well? It was becoming a very long commit message...

// FROM COVER MESSAGE:
This iteration is similar, with the notable difference that now
usleep has a "built-in slack" of 200%. This is analogous to msleep,
which has a built-in slack of 0.4% (since it relies on legacy timers,
which have a built-in slack of 0.4%). 200% slack is significantly
greater than 0.4%, but the scale of usleep is also significantly
different than that of msleep, and I believe 200% to be a sane
default.

It is my opinion that this interface will most often mirror what
developers actually intend - indeed some people who have begun
trying to use the API raised this point -, however, I would like
some input as it is possibly confusing that the API will "double
your sleep" by default.

The usleep_range API is still included, since it provides an
interface to override the "default slack" of 200% by providing
an explicit range, or to allow callers to specify an even larger
slack if possible.

The problem that was raised by a few people trying to use usleep here
was that the API as written was very awkward -- there was never really
a good reason to use "usleep" as it was written. The intention was
to make usleep a usable / sensible API; the obvious alternative I see
is to drop the usleep function entirely and only provide usleep_range -
which would probably fit well in your request for callers to think
about what they are doing, if providing a somewhat awkward API.

The complaint was something to the effect of:

"Well, I understand that I should probably give a range, but I have
no idea what a good range would be. I really just want it to sleep
for a little bit, but I probably shouldn't trigger an extra interrupt.
Given the limitations, what's the point of even having a usleep call
at all?"


Thoughts?

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

2010-07-28 20:48:37

by Patrick Pannuto

[permalink] [raw]
Subject: Re: [PATCH 4/4] Checkpatch: warn about unexpectedly long msleep's

>> +# warn about unexpectedly long msleep's
>> + if ($line =~ /\bmsleep\s*\((\d+)\);/) {
>> + if ($1 < 20) {
>> + WARN("msleep < 20ms can sleep for up to 20ms; see Documentation/timers/delays.txt\n" . $line);
>
> Should be timers-howto.txt?
>
>

Yes, will fix

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

2010-07-28 20:59:19

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

On Wed, 28 Jul 2010 13:47:46 -0700
Patrick Pannuto <[email protected]> wrote:

> > This is different from the patch I merged and I'm not seeing any
> > explanation for the change.
> >
> > The implementation of usleep() looks odd. The longer we sleep, the
> > greater the possible inaccuracy. A code comment which explains the
> > thinking and which warns people about the implications is needed.

I wanna code comment!

> Yes it is different; the explanation was in the cover message. I should
> probably include a copy of the explanation in the commit message as
> well? It was becoming a very long commit message...
>
> // FROM COVER MESSAGE:
> This iteration is similar, with the notable difference that now
> usleep has a "built-in slack" of 200%. This is analogous to msleep,
> which has a built-in slack of 0.4% (since it relies on legacy timers,
> which have a built-in slack of 0.4%). 200% slack is significantly
> greater than 0.4%, but the scale of usleep is also significantly
> different than that of msleep, and I believe 200% to be a sane
> default.
>
> It is my opinion that this interface will most often mirror what
> developers actually intend - indeed some people who have begun
> trying to use the API raised this point -, however, I would like
> some input as it is possibly confusing that the API will "double
> your sleep" by default.
>
> The usleep_range API is still included, since it provides an
> interface to override the "default slack" of 200% by providing
> an explicit range, or to allow callers to specify an even larger
> slack if possible.
>
> The problem that was raised by a few people trying to use usleep here
> was that the API as written was very awkward -- there was never really
> a good reason to use "usleep" as it was written. The intention was
> to make usleep a usable / sensible API; the obvious alternative I see
> is to drop the usleep function entirely and only provide usleep_range -
> which would probably fit well in your request for callers to think
> about what they are doing, if providing a somewhat awkward API.
>
> The complaint was something to the effect of:
>
> "Well, I understand that I should probably give a range, but I have
> no idea what a good range would be. I really just want it to sleep
> for a little bit, but I probably shouldn't trigger an extra interrupt.
> Given the limitations, what's the point of even having a usleep call
> at all?"
>
>
> Thoughts?

My main concern is that someone will type usleep(50) and won't realise
that it goes and sleeps for 100 usecs and their code gets slow as a
result. This sort of thing takes *years* to discover and fix. If we'd
forced them to type usleep_range() instead, it would never have happened.



Another question: what is the typical overhead of a usleep()? IOW, at
what delay value does it make more sense to use udelay()? Another way
of asking that would be "how long does a usleep(1) take"? If it
reliably consumes 2us CPU time then we shouldn't do it.

But it's not just CPU time, is it? A smart udelay() should put the CPU
into a lower power state, so a udelay(3) might consume less energy than
a usleep(2), because the usleep() does much more work in schedule() and
friends?

2010-07-28 21:04:50

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

On 7/28/2010 1:58 PM, Andrew Morton wrote:
>
> My main concern is that someone will type usleep(50) and won't realise
> that it goes and sleeps for 100 usecs and their code gets slow as a
> result. This sort of thing takes *years* to discover and fix. If we'd
> forced them to type usleep_range() instead, it would never have happened.
>
>
>
> Another question: what is the typical overhead of a usleep()? IOW, at
> what delay value does it make more sense to use udelay()? Another way
> of asking that would be "how long does a usleep(1) take"? If it
> reliably consumes 2us CPU time then we shouldn't do it.
>
> But it's not just CPU time, is it? A smart udelay() should put the CPU
> into a lower power state, so a udelay(3) might consume less energy than
> a usleep(2), because the usleep() does much more work in schedule() and
> friends?
>

for very low values of udelay() you're likely right.... but we could and
should catch that inside usleep imo and fall back to a udelay
it'll likely be 10 usec or so where we'd cut off.

now there is no such thing as a "low power udelay", not really anyway....

but the opposite is true; the cpu idle code will effectively do the
equivalent of udelay() if you're asking for a very short delay, so
short that any power saving thing isn't giong to be worth it. ( +
hitting scheduler overhead

2010-07-28 21:05:41

by Patrick Pannuto

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

On 07/28/2010 01:58 PM, Andrew Morton wrote:
> On Wed, 28 Jul 2010 13:47:46 -0700
> Patrick Pannuto <[email protected]> wrote:
>
>>> This is different from the patch I merged and I'm not seeing any
>>> explanation for the change.
>>>
>>> The implementation of usleep() looks odd. The longer we sleep, the
>>> greater the possible inaccuracy. A code comment which explains the
>>> thinking and which warns people about the implications is needed.
>
> I wanna code comment!
>

I understand -- will do (if this even survives, which is unlikely)

> My main concern is that someone will type usleep(50) and won't realise
> that it goes and sleeps for 100 usecs and their code gets slow as a
> result. This sort of thing takes *years* to discover and fix. If we'd
> forced them to type usleep_range() instead, it would never have happened.

In that case, it would push me in the direction of only providing
usleep_range, and thus forcing people to think about it that way;
leave slack decisions to people who know what tolerances are acceptable.

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

2010-07-28 21:11:16

by Patrick Pannuto

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

On 07/28/2010 02:04 PM, Arjan van de Ven wrote:
> On 7/28/2010 1:58 PM, Andrew Morton wrote:
>>
>> My main concern is that someone will type usleep(50) and won't realise
>> that it goes and sleeps for 100 usecs and their code gets slow as a
>> result. This sort of thing takes *years* to discover and fix. If we'd
>> forced them to type usleep_range() instead, it would never have happened.
>>
>>
>>
>> Another question: what is the typical overhead of a usleep()? IOW, at
>> what delay value does it make more sense to use udelay()? Another way
>> of asking that would be "how long does a usleep(1) take"? If it
>> reliably consumes 2us CPU time then we shouldn't do it.
>>
>> But it's not just CPU time, is it? A smart udelay() should put the CPU
>> into a lower power state, so a udelay(3) might consume less energy than
>> a usleep(2), because the usleep() does much more work in schedule() and
>> friends?
>>
>
> for very low values of udelay() you're likely right.... but we could and
> should catch that inside usleep imo and fall back to a udelay
> it'll likely be 10 usec or so where we'd cut off.
>

You're saying:

usleep(usecs) {
if (usecs <= 10) /* or some other cutoff */
return udelay (usecs)
...
}

ish?

> now there is no such thing as a "low power udelay", not really anyway....
>
> but the opposite is true; the cpu idle code will effectively do the
> equivalent of udelay() if you're asking for a very short delay, so
> short that any power saving thing isn't giong to be worth it. ( +
> hitting scheduler overhead
>
>

I think the cpu idle code covers you in the case where there's nothing
else to do, but if schedule tries to let something else run and then
is almost immediately interrupted, this is probably a net loss / BadThing.

I have no idea what an appropriate cutoff for this would be, I found two dated
(2007) papers discussing the overhead of a context switch:

http://www.cs.rochester.edu/u/cli/research/switch.pdf
IBM eServer, dual 2.0GHz Pentium Xeon; 512 KB L2, cache line 128B
Linux 2.6.17, RHEL 9, gcc 3.2.2 (-O0)
3.8 us / context switch

http://delivery.acm.org/10.1145/1290000/1281703/a3-david.pdf
ARMv5, ARM926EJ-S on an OMAP1610 (set to 120MHz clock)
Linux 2.6.20-rc5-omap1
48 us / context switch

but nothing more recent on a quick search.


Any thoughts on how to determine an appropriate cutoff?

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

2010-07-28 21:23:10

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

On Wed, 28 Jul 2010 14:04:47 -0700
Arjan van de Ven <[email protected]> wrote:

> On 7/28/2010 1:58 PM, Andrew Morton wrote:
> >
> > My main concern is that someone will type usleep(50) and won't realise
> > that it goes and sleeps for 100 usecs and their code gets slow as a
> > result. This sort of thing takes *years* to discover and fix. If we'd
> > forced them to type usleep_range() instead, it would never have happened.
> >
> >
> >
> > Another question: what is the typical overhead of a usleep()? IOW, at
> > what delay value does it make more sense to use udelay()? Another way
> > of asking that would be "how long does a usleep(1) take"? If it
> > reliably consumes 2us CPU time then we shouldn't do it.
> >
> > But it's not just CPU time, is it? A smart udelay() should put the CPU
> > into a lower power state, so a udelay(3) might consume less energy than
> > a usleep(2), because the usleep() does much more work in schedule() and
> > friends?
> >
>
> for very low values of udelay() you're likely right.... but we could and
> should catch that inside usleep imo and fall back to a udelay
> it'll likely be 10 usec or so where we'd cut off.
>
> now there is no such thing as a "low power udelay", not really anyway....

Yup. I can't find any arch which tries to do anything fancy.

x86's rep_nop() tries to save a bit of juice, doesn't it? Should we be
using that?

Because we use udelay() in many places - it wouldn't surprise me if
some people's machines were consuming significant amounts of
time/energy in there, if they have suitably broken hardware or drivers.

> but the opposite is true; the cpu idle code will effectively do the
> equivalent of udelay() if you're asking for a very short delay, so
> short that any power saving thing isn't giong to be worth it. ( +
> hitting scheduler overhead

hm, point.

2010-07-28 21:23:53

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

On Wed, 28 Jul 2010 14:05:39 -0700
Patrick Pannuto <[email protected]> wrote:

> On 07/28/2010 01:58 PM, Andrew Morton wrote:
> > On Wed, 28 Jul 2010 13:47:46 -0700
> > Patrick Pannuto <[email protected]> wrote:
> >
> >>> This is different from the patch I merged and I'm not seeing any
> >>> explanation for the change.
> >>>
> >>> The implementation of usleep() looks odd. The longer we sleep, the
> >>> greater the possible inaccuracy. A code comment which explains the
> >>> thinking and which warns people about the implications is needed.
> >
> > I wanna code comment!
> >
>
> I understand -- will do (if this even survives, which is unlikely)
>
> > My main concern is that someone will type usleep(50) and won't realise
> > that it goes and sleeps for 100 usecs and their code gets slow as a
> > result. This sort of thing takes *years* to discover and fix. If we'd
> > forced them to type usleep_range() instead, it would never have happened.
>
> In that case, it would push me in the direction of only providing
> usleep_range, and thus forcing people to think about it that way;
> leave slack decisions to people who know what tolerances are acceptable.

Well, I _think_ that would be a good approach. I'm 45%/55% on that one
and would be interested in other opinions ;)

2010-07-28 21:25:42

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

On 7/28/2010 2:22 PM, Andrew Morton wrote:
> On Wed, 28 Jul 2010 14:04:47 -0700
> Arjan van de Ven<[email protected]> wrote:
>
>
>> On 7/28/2010 1:58 PM, Andrew Morton wrote:
>>
>>> My main concern is that someone will type usleep(50) and won't realise
>>> that it goes and sleeps for 100 usecs and their code gets slow as a
>>> result. This sort of thing takes *years* to discover and fix. If we'd
>>> forced them to type usleep_range() instead, it would never have happened.
>>>
>>>
>>>
>>> Another question: what is the typical overhead of a usleep()? IOW, at
>>> what delay value does it make more sense to use udelay()? Another way
>>> of asking that would be "how long does a usleep(1) take"? If it
>>> reliably consumes 2us CPU time then we shouldn't do it.
>>>
>>> But it's not just CPU time, is it? A smart udelay() should put the CPU
>>> into a lower power state, so a udelay(3) might consume less energy than
>>> a usleep(2), because the usleep() does much more work in schedule() and
>>> friends?
>>>
>>>
>> for very low values of udelay() you're likely right.... but we could and
>> should catch that inside usleep imo and fall back to a udelay
>> it'll likely be 10 usec or so where we'd cut off.
>>
>> now there is no such thing as a "low power udelay", not really anyway....
>>
> Yup. I can't find any arch which tries to do anything fancy.
>
> x86's rep_nop() tries to save a bit of juice, doesn't it? Should we be
> using that?
>

it doesn't save juice so much as it is a "yield to my hyperthreading
brother"
(there is some power saved as well potentially...)

afaik we already use this in udelay() (cpu_relax is rep_nop after all)

> Because we use udelay() in many places - it wouldn't surprise me if
> some people's machines were consuming significant amounts of
> time/energy in there, if they have suitably broken hardware or drivers.
>

the only real place I've seen it used (based on profiles) is in libata
in the intel piix sata driver
(the non-AHCI one)... and those are completly wrong, Alan Cox had
patches to fix it but those somehow went nowhere

2010-07-28 21:26:38

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH 1/4] timer: Added usleep[_range] timer

On 7/28/2010 2:23 PM, Andrew Morton wrote

, it would never have happened.

>> In that case, it would pus
>> h me in the direction of only providing
>> usleep_range, and thus forcing people to think about it that way;
>> leave slack decisions to people who know what tolerances are acceptable.
>>
> Well, I _think_ that would be a good approach. I'm 45%/55% on that one
> and would be interested in other opinions ;)
>
>

for new interfaces I'd really like a "range only" option... make people
think.
And passing in a 0 is easy.

2010-08-04 07:08:15

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] timer: Added usleep[_range] timer

On Wed 2010-07-28 12:33:03, Patrick Pannuto wrote:
> After writing both documentation and a checkpatch rule explaining
> why the usleep API should never be used, it occurred to me that
> perhaps such an API should never be added :) - at least not in its
> previous form.
>
> This iteration is similar, with the notable difference that now
> usleep has a "built-in slack" of 200%. This is analogous to msleep,
> which has a built-in slack of 0.4% (since it relies on legacy timers,
> which have a built-in slack of 0.4%). 200% slack is significantly
> greater than 0.4%, but the scale of usleep is also significantly
> different than that of msleep, and I believe 200% to be a sane
> default.

So, I do msleep(1 second) and it will delay for 3 seconds? Thats
excessive, and will be annoying/plain to see with just eyes. Better
select reasonable default (1%?) and let people who care switch to
msleep_range...
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html