LinuxLists.cc - [PATCH] msleep() with hrtimers

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi,

On Fri, 3 Aug 2007, Jonathan Corbet wrote:

> Most comments last time were favorable. The one dissenter was Roman,
> who worries about the overhead of using hrtimers for this operation; my
> understanding is that he would rather see a really_msleep() function for
> those who actually want millisecond resolution. I'm not sure how to
> characterize what the cost could be, but it can only be buried by the
> fact that every call sleeps for some number of milliseconds. On my
> system, the several hundred total msleep() calls can't cause any real
> overhead, and almost all happen at initialization time.

The main point is still that these are two _different_ APIs for different
usages, so I still prefer to add a hrsleep() instead.

bye, Roman

2007-08-03 19:48:08

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Fri, 2007-08-03 at 21:19 +0200, Roman Zippel wrote:
> Hi,
>
> On Fri, 3 Aug 2007, Jonathan Corbet wrote:
>
> > Most comments last time were favorable. The one dissenter was Roman,
> > who worries about the overhead of using hrtimers for this operation; my
> > understanding is that he would rather see a really_msleep() function for
> > those who actually want millisecond resolution. I'm not sure how to
> > characterize what the cost could be, but it can only be buried by the
> > fact that every call sleeps for some number of milliseconds. On my
> > system, the several hundred total msleep() calls can't cause any real
> > overhead, and almost all happen at initialization time.
>
> The main point is still that these are two _different_ APIs for different
> usages, so I still prefer to add a hrsleep() instead.

I would actually prefer it the other way around; call the
not-so-accurate one "msleep_approx()" or somesuch, to make it explicit
that the sleep is only approximate...

2007-08-03 19:58:28

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi,

On Fri, 3 Aug 2007, Arjan van de Ven wrote:

> On Fri, 2007-08-03 at 21:19 +0200, Roman Zippel wrote:
> > Hi,
> >
> > On Fri, 3 Aug 2007, Jonathan Corbet wrote:
> >
> > > Most comments last time were favorable. The one dissenter was Roman,
> > > who worries about the overhead of using hrtimers for this operation; my
> > > understanding is that he would rather see a really_msleep() function for
> > > those who actually want millisecond resolution. I'm not sure how to
> > > characterize what the cost could be, but it can only be buried by the
> > > fact that every call sleeps for some number of milliseconds. On my
> > > system, the several hundred total msleep() calls can't cause any real
> > > overhead, and almost all happen at initialization time.
> >
> > The main point is still that these are two _different_ APIs for different
> > usages, so I still prefer to add a hrsleep() instead.
>
>
> I would actually prefer it the other way around; call the
> not-so-accurate one "msleep_approx()" or somesuch, to make it explicit
> that the sleep is only approximate...

Actually the hrsleep() function would allow for submillisecond sleeps,
which might be what some of the 450 users really want and they only use
msleep(1) because it's the next best thing.
A hrsleep() function is really what makes most sense from an API
perspective.

bye, Roman

2007-08-03 23:55:15

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

>
> Actually the hrsleep() function would allow for submillisecond sleeps,
> which might be what some of the 450 users really want and they only use
> msleep(1) because it's the next best thing.
> A hrsleep() function is really what makes most sense from an API
> perspective.

I respectfully disagree. The power of msleep is that the unit of sleep
time is in the name; so in your proposal it would be hr_msleep or
somesuch. I much rather do the opposite in that case; make the "short"
name be the best implementation of the requested behavior, and have
qualifiers for allowing exceptions to that... least surprise and all
that.
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2007-08-04 03:00:16

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi,

On Fri, 3 Aug 2007, Arjan van de Ven wrote:

> > Actually the hrsleep() function would allow for submillisecond sleeps,
> > which might be what some of the 450 users really want and they only use
> > msleep(1) because it's the next best thing.
> > A hrsleep() function is really what makes most sense from an API
> > perspective.
>
> I respectfully disagree. The power of msleep is that the unit of sleep
> time is in the name; so in your proposal it would be hr_msleep or
> somesuch. I much rather do the opposite in that case; make the "short"
> name be the best implementation of the requested behavior, and have
> qualifiers for allowing exceptions to that... least surprise and all
> that.

hr_msleep makes no sense. Why should we tie this interface to millisecond
resolution?
Your suggested msleep_approx makes not much sense to me either, since
neither interface guarantees anything and may "approximate" the sleep
(and if the user is surprised by that something else already went wrong).
If you don't like the hrsleep name, we can also call it nanosleep and so
match what we already do for userspace.

bye, Roman

2007-08-04 19:20:43

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Sat, 2007-08-04 at 05:00 +0200, Roman Zippel wrote:
> Hi,
>
> On Fri, 3 Aug 2007, Arjan van de Ven wrote:
>
> > > Actually the hrsleep() function would allow for submillisecond sleeps,
> > > which might be what some of the 450 users really want and they only use
> > > msleep(1) because it's the next best thing.
> > > A hrsleep() function is really what makes most sense from an API
> > > perspective.
> >
> > I respectfully disagree. The power of msleep is that the unit of sleep
> > time is in the name; so in your proposal it would be hr_msleep or
> > somesuch. I much rather do the opposite in that case; make the "short"
> > name be the best implementation of the requested behavior, and have
> > qualifiers for allowing exceptions to that... least surprise and all
> > that.
>
> hr_msleep makes no sense. Why should we tie this interface to millisecond
> resolution?

because a lot of parts of the kernel think and work in milliseconds,
it's logical and USEFUL to at least provide an interface that works on
milliseconds.

> Your suggested msleep_approx makes not much sense to me either, since
> neither interface guarantees anything and may "approximate" the sleep
> (and if the user is surprised by that something else already went wrong).

an interface should try to map to the implementation that provides the
best implementation quality of the requested thing in general. That's
the hrtimers based msleep(). For cases where you're ok to compromise on
this behavior, you can add to the api.. it's the logical thing to do to
get the least surprises and the best normal behavior.

> If you don't like the hrsleep name, we can also call it nanosleep and so
> match what we already do for userspace.

having a nanosleep *in addition* to msleep (or maybe nsleep() and
usleep() to have consistent naming) sounds reasonable to me.

Do you have something against hrtimer use in general? From your emails
on this msleep topic it sort of seems you do....
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2007-08-06 00:09:14

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi,

On Sat, 4 Aug 2007, Arjan van de Ven wrote:

> > hr_msleep makes no sense. Why should we tie this interface to millisecond
> > resolution?
>
> because a lot of parts of the kernel think and work in milliseconds,
> it's logical and USEFUL to at least provide an interface that works on
> milliseconds.

If the millisecond resolution is enough for these users, that means the
current msleep will work fine for them.

> > Your suggested msleep_approx makes not much sense to me either, since
> > neither interface guarantees anything and may "approximate" the sleep
> > (and if the user is surprised by that something else already went wrong).
>
> an interface should try to map to the implementation that provides the
> best implementation quality of the requested thing in general. That's
> the hrtimers based msleep().

This generalization is simply not true. First it requires the
HIGH_RES_TIMERS option to be enabled to really make a real difference.
Second a hrtimers based msleep has a higher setup cost, which can't be
completely ignored. "Best" is a subjective term here and can't be that
easily generalized to all current users.

> > If you don't like the hrsleep name, we can also call it nanosleep and so
> > match what we already do for userspace.
>
> having a nanosleep *in addition* to msleep (or maybe nsleep() and
> usleep() to have consistent naming) sounds reasonable to me.

We only need one sleep implementation of both and msleep is a fine name
for the current implementation - not only does it describe the unit, but
it also describe the best resolution one can expect from it.

> Do you have something against hrtimer use in general? From your emails
> on this msleep topic it sort of seems you do....

I can give the question back, what do you have against simple timers, that
you want to make them as awkward as possible to use?
hrtimer have a higher usage cost depending on the clock source, so simply
using them only because they are the new cool kid in town doesn't make
sense. It may not be that critical for a simple sleep implementation, but
that only means we should keep the API as simple as possible, that means
one low resolution, cheap msleep and one high resolution nanosleep is
enough. Why do you insist on making more complex than necessary?

bye, Roman

2007-08-06 00:44:29

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

> > because a lot of parts of the kernel think and work in milliseconds,
> > it's logical and USEFUL to at least provide an interface that works on
> > milliseconds.
>
> If the millisecond resolution is enough for these users, that means the
> current msleep will work fine for them.

except that you get a 20ms minimum, and 10ms increment.

> This generalization is simply not true. First it requires the
> HIGH_RES_TIMERS option to be enabled to really make a real difference.

so? you provide the best possible for the config options selected...

> > > If you don't like the hrsleep name, we can also call it nanosleep and so
> > > match what we already do for userspace.
> >
> > having a nanosleep *in addition* to msleep (or maybe nsleep() and
> > usleep() to have consistent naming) sounds reasonable to me.
>
> We only need one sleep implementation of both and msleep is a fine name
> for the current implementation - not only does it describe the unit, but
> it also describe the best resolution one can expect from it.

that's... combining 2 independent things into one. That's not a really
good idea.

> I can give the question back, what do you have against simple timers, that
> you want to make them as awkward as possible to use?

msleep() isn't about timers. The timer type used is an implementation
detail behind the interface!!!!

Timers are course resolution that is highly HZ-value dependent. For
cases where you want a finer resolution, the kernel now has a way to
provide that functionality... so why not use the quality of service this
provides..

> hrtimer have a higher usage cost depending on the clock source, so simply
> using them only because they are the new cool kid in town doesn't make
> sense.

no but they DO provide a much better quality of the api implementation;
instead of a 20ms timeout you get really close to what you asked for!

There have been drivers that did if (HZ<1000) mdelay(x); else msleep(x);
Yes that's horrible, but it's a clear sign that this matters.

> It may not be that critical for a simple sleep implementation, but
> that only means we should keep the API as simple as possible, that means
> one low resolution, cheap msleep and one high resolution nanosleep is
> enough. Why do you insist on making more complex than necessary?

ehm it was you who insisted on adding complexity to this; the initial
proposal was to just replace the msleep() implementation with one that
has a more gentle behavior (you ask for 1ms you get 1ms, not 20ms)

What really is your problem with that? "It may be more expensive on some
hardware?"

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2007-08-06 01:03:37

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi,

On Sun, 5 Aug 2007, Arjan van de Ven wrote:

> Timers are course resolution that is highly HZ-value dependent. For
> cases where you want a finer resolution, the kernel now has a way to
> provide that functionality... so why not use the quality of service this
> provides..

We're going in circles here. We have two different timer APIs for a
reason, only because hrtimer provide better resolution, doesn't
automatically make them the better generic timer.
There's no problem to provide a high resolution sleep, but there is also
no reason to mess with msleep, don't fix what ain't broken...

bye, Roman

2007-08-06 05:41:27

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Mon, 2007-08-06 at 03:03 +0200, Roman Zippel wrote:

> There's no problem to provide a high resolution sleep, but there is also
> no reason to mess with msleep, don't fix what ain't broken...

John Corbet provided the patch because he had a problem with the current
msleep... in that it didn't provide as good a common case as he
wanted... so I think your statement is wrong ;)

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2007-08-06 10:03:19

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi,

On Sun, 5 Aug 2007, Arjan van de Ven wrote:

> > There's no problem to provide a high resolution sleep, but there is also
> > no reason to mess with msleep, don't fix what ain't broken...
>
> John Corbet provided the patch because he had a problem with the current
> msleep... in that it didn't provide as good a common case as he
> wanted... so I think your statement is wrong ;)

Only under the assumptation, that msleep _must_ be "fixed" for all other
current users too.
Give users a choice to use msleep or nanosleep, how do you know what's
"best" for them?

bye, Roman

2007-08-06 10:20:26

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On 8/6/07, Roman Zippel <[email protected]> wrote:
> Hi,
>
> On Sun, 5 Aug 2007, Arjan van de Ven wrote:
>
> > > There's no problem to provide a high resolution sleep, but there is also
> > > no reason to mess with msleep, don't fix what ain't broken...
> >
> > John Corbet provided the patch because he had a problem with the current
> > msleep... in that it didn't provide as good a common case as he
> > wanted... so I think your statement is wrong ;)
>
> Only under the assumptation, that msleep _must_ be "fixed" for all other
> current users too.
> Give users a choice to use msleep or nanosleep, how do you know what's
> "best" for them?
>

You mean to say, the granularity of msleep is in mS with a tolerance
of +/- "n" mS whereas nanosleep would have the tolerance in nS ?
(ignoring all the discussions about hrtimers and their internal
design)

I guess many people are/were confused on the aspect that, a msleep(1)
meant sleep for a 1mS and nothing more. Well, this would explain, some
of my hair raising incidents, if i understood you correctly.

Since it is such a confusion, maybe it needs to be documented some
place, that people don't fall into the same trap.

Manu

2007-08-06 15:55:25

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Mon, 2007-08-06 at 12:03 +0200, Roman Zippel wrote:
> Hi,
>
> On Sun, 5 Aug 2007, Arjan van de Ven wrote:
>
> > > There's no problem to provide a high resolution sleep, but there is also
> > > no reason to mess with msleep, don't fix what ain't broken...
> >
> > John Corbet provided the patch because he had a problem with the current
> > msleep... in that it didn't provide as good a common case as he
> > wanted... so I think your statement is wrong ;)
>
> Only under the assumptation, that msleep _must_ be "fixed" for all other
> current users too.
> Give users a choice to use msleep or nanosleep, how do you know what's
> "best" for them?

do you have any actual technical objections, or do you just hate
hrtimers in general?

I really don't see what you hate so much about making the msleep()
implementation provide a more precise (typical sleep time of 1msec
rather than 20msec) behavior than the current one. Trying to distract
that by proposing a very different API (working on a totally different
time unit, while a lot of kernel users are using miliseconds; don't get
me wrong, a usleep() and nsleep() might be useful if there's users that
want to sleep in such times) is just trying to distract the issue.

So, let me ask a direct question: What do you think is specifically
wrong about changing the msleep() implementation as is done here? The
behavior is clearly an improvement, so what is your objection on the
flipside?

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2007-08-07 10:40:55

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi Arjan,

On 8/6/07, Arjan van de Ven <[email protected]> wrote:
> On Mon, 2007-08-06 at 12:03 +0200, Roman Zippel wrote:
> > Hi,
> >
> > On Sun, 5 Aug 2007, Arjan van de Ven wrote:
> >
> > > > There's no problem to provide a high resolution sleep, but there is also
> > > > no reason to mess with msleep, don't fix what ain't broken...
> > >
> > > John Corbet provided the patch because he had a problem with the current
> > > msleep... in that it didn't provide as good a common case as he
> > > wanted... so I think your statement is wrong ;)
> >
> > Only under the assumptation, that msleep _must_ be "fixed" for all other
> > current users too.
> > Give users a choice to use msleep or nanosleep, how do you know what's
> > "best" for them?
>
> do you have any actual technical objections, or do you just hate
> hrtimers in general?
>
> I really don't see what you hate so much about making the msleep()
> implementation provide a more precise (typical sleep time of 1msec
> rather than 20msec) behavior than the current one. Trying to distract
> that by proposing a very different API (working on a totally different
> time unit, while a lot of kernel users are using miliseconds; don't get
> me wrong, a usleep() and nsleep() might be useful if there's users that
> want to sleep in such times) is just trying to distract the issue.
>
> So, let me ask a direct question: What do you think is specifically
> wrong about changing the msleep() implementation as is done here? The
> behavior is clearly an improvement, so what is your objection on the
> flipside?
>

I think there could be a flipside based on what Roman said, but it is
my guess only.
currently wherever msleep is used now it sleeps with an ambiguous amount.

Eventhough the ambiguous sleep period is not appreciable, fixing
msleep might have a bad side effect. ie, drivers which would be
sleeping for a specified period would sleep less, just as compared to
the more precise sleep.

I think, it would be hard to fix, such breakages. Maybe a newer sleep
method would be much better, such that it doesn't cause any breakage.

2007-08-07 12:45:04

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi,

On Mon, 6 Aug 2007, Arjan van de Ven wrote:

> So, let me ask a direct question: What do you think is specifically
> wrong about changing the msleep() implementation as is done here? The
> behavior is clearly an improvement, so what is your objection on the
> flipside?

Again, we have two different timer APIs for a reason - as long as you
ignore this, we won't get much further...
As long as resolution is not an issue, simple timer are generally often
the better choice as they are cheaper. By extension it also makes sense to
offer a sleep based on simple timer.
hrtimer allows for finer resolution, so it makes more sense to offer a
hrtimer based sleep which is not limited to milliseconds, once we have
this sleep, you'll also get the better behaviour you're asking for, which
removes the urgent need having to "fix" msleep.

Contrary to what you think I don't hate hrtimer, they're quite useful, but
they have their use cases as simple timer have. What I want is that people
_think_ before start using them, as in general hrtimer come with a higher
base cost, so one should _think_ before thoughtlessly using them only
because they're the cool new thing. I don't care much about drivers, but
in generic code (which msleep is part of) I'm going to look closer and
I'll continue to ask, whether it really needs to use hrtimer.
In this case I simply see no reason to force hrtimer on msleep, if users
can as well use nanosleep to get the same behaviour.

bye, Roman

2007-08-07 19:40:46

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Fri, 03 Aug 2007 12:37:12 -0600
Jonathan Corbet <[email protected]> wrote:

>
> Here's the second (and probably final) posting of the msleep() with
> hrtimers patch. The problem being addressed here is that the current
> msleep() will stop for a minimum of two jiffies, meaning that, on a
> HZ=100 system, msleep(1) delays for for about 20ms. In a driver with
> one such delay for each of 150 or so register setting operations, the
> extra time adds up to a few seconds.
>
> This patch addresses the situation by using hrtimers. On tickless
> systems with working timers, msleep(1) now sleeps for 1ms, even with
> HZ=100.
>
> Most comments last time were favorable. The one dissenter was Roman,
> who worries about the overhead of using hrtimers for this operation; my
> understanding is that he would rather see a really_msleep() function for
> those who actually want millisecond resolution. I'm not sure how to
> characterize what the cost could be, but it can only be buried by the
> fact that every call sleeps for some number of milliseconds. On my
> system, the several hundred total msleep() calls can't cause any real
> overhead, and almost all happen at initialization time.

I'd be surprised if there was significant overhead - the maximum frequency
at which msleep() can be called is 1000Hz. We'd need an awful lot of
overhead for that to cause problems, surely?

<thinks he's missing something again>

2007-08-07 23:16:36

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi,

On Tue, 7 Aug 2007, Andrew Morton wrote:

> I'd be surprised if there was significant overhead - the maximum frequency
> at which msleep() can be called is 1000Hz. We'd need an awful lot of
> overhead for that to cause problems, surely?
>
> <thinks he's missing something again>

_Anybody_ has yet to answer what's wrong with adding a nanosleep() and
using that instead.

bye, Roman

2007-08-07 23:31:18

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Wed, 8 Aug 2007 01:16:49 +0200 (CEST)
Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Tue, 7 Aug 2007, Andrew Morton wrote:
>
> > I'd be surprised if there was significant overhead - the maximum frequency
> > at which msleep() can be called is 1000Hz. We'd need an awful lot of
> > overhead for that to cause problems, surely?
> >
> > <thinks he's missing something again>
>
> _Anybody_ has yet to answer what's wrong with adding a nanosleep() and
> using that instead.
>

You mean that the implementation could be simplified if msleep() were to
simply call do_nanosleep()?

That would work, although a bit of refactoring would be needed so that we
could implement the TASK_UNINTERRUPTIBLE msleep() that way.

2007-08-08 03:46:48

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Tue, 7 Aug 2007, Andrew Morton wrote:

> On Wed, 8 Aug 2007 01:16:49 +0200 (CEST)
> Roman Zippel <[email protected]> wrote:
>
> > Hi,
> >
> > On Tue, 7 Aug 2007, Andrew Morton wrote:
> >
> > > I'd be surprised if there was significant overhead - the maximum frequency
> > > at which msleep() can be called is 1000Hz. We'd need an awful lot of
> > > overhead for that to cause problems, surely?
> > >
> > > <thinks he's missing something again>
> >
> > _Anybody_ has yet to answer what's wrong with adding a nanosleep() and
> > using that instead.
> >
>
> You mean that the implementation could be simplified if msleep() were to
> simply call do_nanosleep()?

The current msleep is fine and doesn't need any "fixing".
Not all the world is i386, _please_ keep hrtimer usage where it's
required. Simple timer should be given preference unless the higher
resolution is really needed, which is not the case here.

> That would work, although a bit of refactoring would be needed so that we
> could implement the TASK_UNINTERRUPTIBLE msleep() that way.

The function is not that big, so below is a nanosleep implementation based
on Jonathan's patch. This will user give a choice, so there is no need to
force all users to use hrtimer for a simple sleep.
Please apply this patch instead.

bye, Roman

Signed-off-by: Roman Zippel <[email protected]>

---
include/linux/delay.h | 5 +++++
kernel/timer.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)

Index: linux-2.6/include/linux/delay.h
===================================================================
--- linux-2.6.orig/include/linux/delay.h
+++ linux-2.6/include/linux/delay.h
@@ -9,6 +9,7 @@

extern unsigned long loops_per_jiffy;

+#include <linux/ktime.h>
#include <asm/delay.h>

/*
@@ -36,6 +37,10 @@ extern unsigned long loops_per_jiffy;
#endif

void calibrate_delay(void);
+
+void nanosleep(ktime_t time);
+int nanosleep_interruptible(ktime_t *time);
+
void msleep(unsigned int msecs);
unsigned long msleep_interruptible(unsigned int msecs);

Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -1377,3 +1377,52 @@ unsigned long msleep_interruptible(unsig
}

EXPORT_SYMBOL(msleep_interruptible);
+
+static int do_nanosleep(ktime_t *time, int sigs)
+{
+ enum hrtimer_mode mode = HRTIMER_MODE_REL;
+ int state = sigs ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+ struct hrtimer_sleeper sleeper;
+
+ hrtimer_init_sleeper(&sleeper, current);
+ hrtimer_init(&sleeper.timer, CLOCK_MONOTONIC, mode);
+ sleeper.timer.expires = *time;
+
+ do {
+ set_current_state(state);
+ hrtimer_start(&sleeper.timer, sleeper.timer.expires, mode);
+ if (sleeper.task)
+ schedule();
+ hrtimer_cancel(&sleeper.timer);
+ mode = HRTIMER_MODE_ABS;
+ if (!sleeper.task)
+ return 1;
+ } while (!sigs || !signal_pending(current));
+
+ *time = sleeper.timer.expires;
+ return 0;
+}
+
+/**
+ * nanosleep - sleep safely even with waitqueue interruptions
+ * @time: Time to sleep for
+ */
+void nanosleep(ktime_t time)
+{
+ do_nanosleep(&time, 0);
+}
+EXPORT_SYMBOL(nanosleep);
+
+/**
+ * nanosleep_interruptible - sleep waiting for signals
+ * @time: Time to sleep for
+ */
+int nanosleep_interruptible(ktime_t *time)
+{
+ if (do_nanosleep(time, 1))
+ return 1;
+
+ *time = ktime_sub(*time, ktime_get());
+ return 0;
+}
+EXPORT_SYMBOL(nanosleep_interruptible);

2007-08-08 04:15:45

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Wed, 8 Aug 2007 05:47:02 +0200 (CEST) Roman Zippel <[email protected]> wrote:

>
>
> On Tue, 7 Aug 2007, Andrew Morton wrote:
>
> > On Wed, 8 Aug 2007 01:16:49 +0200 (CEST)
> > Roman Zippel <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > On Tue, 7 Aug 2007, Andrew Morton wrote:
> > >
> > > > I'd be surprised if there was significant overhead - the maximum frequency
> > > > at which msleep() can be called is 1000Hz. We'd need an awful lot of
> > > > overhead for that to cause problems, surely?
> > > >
> > > > <thinks he's missing something again>
> > >
> > > _Anybody_ has yet to answer what's wrong with adding a nanosleep() and
> > > using that instead.
> > >
> >
> > You mean that the implementation could be simplified if msleep() were to
> > simply call do_nanosleep()?
>
> The current msleep is fine and doesn't need any "fixing".
> Not all the world is i386, _please_ keep hrtimer usage where it's
> required. Simple timer should be given preference unless the higher
> resolution is really needed, which is not the case here.

Hang on. Having msleep(1) sleep for 20 milliseconds is really awful
behaviour. Possibly worse is the fact that with other configs, it will
delay for eight milliseconds. Or two. That's an order of magnitude of
unpredictability which can actually cause driver breakage.

Fixing that *bug* is a good thing. I see no reason why we should "keep
hrtimer usage where it is required"? The implementation details are hidden
from the caller.

It seems to be a sensible change to me and I fail to understand the
objection.

> so below is a nanosleep implementation based
> on Jonathan's patch. This will user give a choice, so there is no need to
> force all users to use hrtimer for a simple sleep.

But apart from needlessly fattening the kernel API, that leaves us in the
current situation where an unknown number of the msleep() callers actually
care that they are calling a function which by a huge margin fails to do
what they are asking it to do. It will take a long time to hunt down all
the problematic callsites and migrate them to nanosleep().

2007-08-08 04:17:17

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Tue, 2007-08-07 at 14:45 +0200, Roman Zippel wrote:
> Hi,
>
> On Mon, 6 Aug 2007, Arjan van de Ven wrote:
>
> > So, let me ask a direct question: What do you think is specifically
> > wrong about changing the msleep() implementation as is done here? The
> > behavior is clearly an improvement, so what is your objection on the
> > flipside?
>
> Again, we have two different timer APIs for a reason

yes because we have different usage patterns for timers. (exact/course
or expiring/not-typically-expiring; I know you have some other opinion
here than other people).

For this case it's relatively simple imo: The existing implementation
has a *typical* behavior which is 100% to 2000% worse than what the user
of the API asks for. And that is totally unneeded to be so crappy; it
can be much more exact easily as shown by this patch.

You keep claiming that hrtimers are so incredibly expensive; but for
msleep()... which is mostly called during driver init ... I really don't
buy that it's really expensive. We're not doing this a gazilion times
per second obviously...

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2007-08-08 11:01:20

by Andi Kleen

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Andrew Morton <[email protected]> writes:
>
> I'd be surprised if there was significant overhead - the maximum frequency
> at which msleep() can be called is 1000Hz. We'd need an awful lot of
> overhead for that to cause problems, surely?

The bigger risk is probably to break drivers that rely on msleep(1)
really being msleep( very long depending on HZ )

But the only way to find out is to try it.

-Andi

2007-08-08 11:09:23

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On 08 Aug 2007 13:55:28 +0200, Andi Kleen <[email protected]> wrote:
> Andrew Morton <[email protected]> writes:
> >
> > I'd be surprised if there was significant overhead - the maximum frequency
> > at which msleep() can be called is 1000Hz. We'd need an awful lot of
> > overhead for that to cause problems, surely?
>
> The bigger risk is probably to break drivers that rely on msleep(1)
> really being msleep( very long depending on HZ )
>
> But the only way to find out is to try it.

Well, i am quite sure a lot of drivers will be broken. But well .. if
it is for the better, guess, never mind ...

Manu

2007-08-08 11:52:29

by Andi Kleen

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Wed, Aug 08, 2007 at 03:09:14PM +0400, Manu Abraham wrote:
> On 08 Aug 2007 13:55:28 +0200, Andi Kleen <[email protected]> wrote:
> > Andrew Morton <[email protected]> writes:
> > >
> > > I'd be surprised if there was significant overhead - the maximum frequency
> > > at which msleep() can be called is 1000Hz. We'd need an awful lot of
> > > overhead for that to cause problems, surely?
> >
> > The bigger risk is probably to break drivers that rely on msleep(1)
> > really being msleep( very long depending on HZ )
> >
> > But the only way to find out is to try it.
>
> Well, i am quite sure a lot of drivers will be broken. But well .. if

If you know of specific examples you should list them so that
they can be fixed proactively.

-Andi

2007-08-08 11:59:39

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On 8/8/07, Andi Kleen <[email protected]> wrote:
> On Wed, Aug 08, 2007 at 03:09:14PM +0400, Manu Abraham wrote:
> > On 08 Aug 2007 13:55:28 +0200, Andi Kleen <[email protected]> wrote:
> > > Andrew Morton <[email protected]> writes:
> > > >
> > > > I'd be surprised if there was significant overhead - the maximum frequency
> > > > at which msleep() can be called is 1000Hz. We'd need an awful lot of
> > > > overhead for that to cause problems, surely?
> > >
> > > The bigger risk is probably to break drivers that rely on msleep(1)
> > > really being msleep( very long depending on HZ )
> > >
> > > But the only way to find out is to try it.
> >
> > Well, i am quite sure a lot of drivers will be broken. But well .. if
>
> If you know of specific examples you should list them so that
> they can be fixed proactively.

A bit hard to state, which all since quite some are RE'd. Well you can
hear the cries when it is broken. :)

Manu

2007-08-09 07:17:52

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On Fri, 03 Aug 2007 12:37:12 -0600 Jonathan Corbet <[email protected]> wrote:

> Here's the second (and probably final) posting of the msleep() with
> hrtimers patch. The problem being addressed here is that the current
> msleep() will stop for a minimum of two jiffies, meaning that, on a
> HZ=100 system, msleep(1) delays for for about 20ms. In a driver with
> one such delay for each of 150 or so register setting operations, the
> extra time adds up to a few seconds.
>
> This patch addresses the situation by using hrtimers. On tickless
> systems with working timers, msleep(1) now sleeps for 1ms, even with
> HZ=100.
>
> Most comments last time were favorable. The one dissenter was Roman,
> who worries about the overhead of using hrtimers for this operation; my
> understanding is that he would rather see a really_msleep() function for
> those who actually want millisecond resolution. I'm not sure how to
> characterize what the cost could be, but it can only be buried by the
> fact that every call sleeps for some number of milliseconds. On my
> system, the several hundred total msleep() calls can't cause any real
> overhead, and almost all happen at initialization time.
>
> I still think it would be useful for msleep() to do what it says it does
> and not vastly oversleep with small arguments. A quick grep turns up
> 450 msleep(1) calls in the current mainline. Andrew, if you agree, can
> you drop this into -mm? If not, I guess I'll let it go.
>
> jon
>
> ---
>
> Use hrtimers so that msleep() sleeps for the requested time
>
> Current msleep() snoozes for at least two jiffies, causing msleep(1) to
> sleep for at least 20ms on HZ=100 systems. Using hrtimers allows
> msleep() to sleep for something much closer to the requested time.
>
> Signed-off-by: Jonathan Corbet <[email protected]>
>
> --- 2.6.23-rc1/kernel/timer.c.orig 2007-08-02 13:45:20.000000000 -0600
> +++ 2.6.23-rc1/kernel/timer.c 2007-08-03 12:34:48.000000000 -0600
> @@ -1349,18 +1349,43 @@ void __init init_timers(void)
> open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
> }
>
> +
> +
> +
> +static void do_msleep(unsigned int msecs, struct hrtimer_sleeper *sleeper,
> + int sigs)
> +{
> + enum hrtimer_mode mode = HRTIMER_MODE_REL;
> + int state = sigs ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +
> + /*
> + * This is really just a reworked and simplified version
> + * of do_nanosleep().
> + */
> + hrtimer_init(&sleeper->timer, CLOCK_MONOTONIC, mode);
> + sleeper->timer.expires = ktime_set(0, msecs*NSEC_PER_MSEC);
> + hrtimer_init_sleeper(sleeper, current);
> +
> + do {
> + set_current_state(state);
> + hrtimer_start(&sleeper->timer, sleeper->timer.expires, mode);
> + if (sleeper->task)
> + schedule();
> + hrtimer_cancel(&sleeper->timer);
> + mode = HRTIMER_MODE_ABS;
> + } while (sleeper->task && !(sigs && signal_pending(current)));
> +}
> +
> /**
> * msleep - sleep safely even with waitqueue interruptions
> * @msecs: Time in milliseconds to sleep for
> */
> void msleep(unsigned int msecs)
> {
> - unsigned long timeout = msecs_to_jiffies(msecs) + 1;
> + struct hrtimer_sleeper sleeper;
>
> - while (timeout)
> - timeout = schedule_timeout_uninterruptible(timeout);
> + do_msleep(msecs, &sleeper, 0);
> }
> -
> EXPORT_SYMBOL(msleep);
>
> /**
> @@ -1369,11 +1394,15 @@ EXPORT_SYMBOL(msleep);
> */
> unsigned long msleep_interruptible(unsigned int msecs)
> {
> - unsigned long timeout = msecs_to_jiffies(msecs) + 1;
> + struct hrtimer_sleeper sleeper;
> + ktime_t left;
>
> - while (timeout && !signal_pending(current))
> - timeout = schedule_timeout_interruptible(timeout);
> - return jiffies_to_msecs(timeout);
> -}
> + do_msleep(msecs, &sleeper, 1);
>
> + if (!sleeper.task)
> + return 0;
> + left = ktime_sub(sleeper.timer.expires,
> + sleeper.timer.base->get_time());
> + return max(((long) ktime_to_ns(left))/NSEC_PER_MSEC, 1L);
> +}
> EXPORT_SYMBOL(msleep_interruptible);

This failed the Vaio test. I guess it triggered a USB bug.

With this patch applied, when I hotplug my wireless mouse, the little LED
on the mouse comes on for a second or so then goes out and no pointy clicky
for me.

It says:

[ 152.481522] usb 1-1: new low speed USB device using uhci_hcd and address 2

Without this patch applied, I get

[ 195.935445] usb 2-1: new low speed USB device using uhci_hcd and address 2
[ 196.116183] usb 2-1: configuration #1 chosen from 1 choice
[ 196.198362] input: Microsoft Microsoft Wireless Optical Mouse 1.00 as /class/input/input7
[ 196.223724] input: USB HID v1.11 Mouse [Microsoft Microsoft Wireless Optical Mouse 1.00] on usb-0000:00:1d.1-1
[ 196.224570] usb 2-1: new device found, idVendor=045e, idProduct=00e1
[ 196.224579] usb 2-1: new device strings: Mfr=1, Product=2, SerialNumber=0
[ 196.224585] usb 2-1: Product: Microsoft Wireless Optical Mouse 1.00
[ 196.224590] usb 2-1: Manufacturer: Microsoft

and lots of pointy clickiness.

I would assume that there is some msleep() in USB which is too short, and
the present wild rounding-up which msleep() does covered up the
incorrectly-chosen sleep duration.

I'm using HZ=250 (http://userweb.kernel.org/~akpm/config-sony.txt) and it
could well be that the mouse would fail just by going to HZ=1000, but I
didn't bother testing that.

Could one of the USB developers please suggest which msleep()(s) I should
start looking at?

Thanks.

2007-08-09 15:08:20

by Alan Stern

[permalink] [raw]

Subject: Re: [linux-usb-devel] [PATCH] msleep() with hrtimers

On Thu, 9 Aug 2007, Andrew Morton wrote:

> This failed the Vaio test. I guess it triggered a USB bug.
>
> With this patch applied, when I hotplug my wireless mouse, the little LED
> on the mouse comes on for a second or so then goes out and no pointy clicky
> for me.
>
> It says:
>
> [ 152.481522] usb 1-1: new low speed USB device using uhci_hcd and address 2
>
>
> Without this patch applied, I get
>
> [ 195.935445] usb 2-1: new low speed USB device using uhci_hcd and address 2
> [ 196.116183] usb 2-1: configuration #1 chosen from 1 choice
> [ 196.198362] input: Microsoft Microsoft Wireless Optical Mouse 1.00 as /class/input/input7
> [ 196.223724] input: USB HID v1.11 Mouse [Microsoft Microsoft Wireless Optical Mouse 1.00] on usb-0000:00:1d.1-1
> [ 196.224570] usb 2-1: new device found, idVendor=045e, idProduct=00e1
> [ 196.224579] usb 2-1: new device strings: Mfr=1, Product=2, SerialNumber=0
> [ 196.224585] usb 2-1: Product: Microsoft Wireless Optical Mouse 1.00
> [ 196.224590] usb 2-1: Manufacturer: Microsoft
>
> and lots of pointy clickiness.
>
> I would assume that there is some msleep() in USB which is too short, and
> the present wild rounding-up which msleep() does covered up the
> incorrectly-chosen sleep duration.
>
> I'm using HZ=250 (http://userweb.kernel.org/~akpm/config-sony.txt) and it
> could well be that the mouse would fail just by going to HZ=1000, but I
> didn't bother testing that.
>
> Could one of the USB developers please suggest which msleep()(s) I should
> start looking at?

It could be one of the calls in drivers/usb/core/hub.c, however almost
all of them are at least 10 ms already. There's a 0-ms msleep near the
start of hub_port_wait_reset() which might cause problems.

It will be easier to pin down the culprit if you turn on
CONFIG_USB_DEBUG.

Alan Stern

2007-08-09 19:31:40

by Denys Vlasenko

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On 8/8/07, Arjan van de Ven <[email protected]> wrote:
> You keep claiming that hrtimers are so incredibly expensive; but for
> msleep()... which is mostly called during driver init ... I really don't
> buy that it's really expensive. We're not doing this a gazilion times
> per second obviously...

Yes. Optimizing delay or sleep functions for speed is a contradiction
of terms. IIRC we still optimize udelay for speed, not code size...
Read it again folks:

We optimize udelay for speed

How fast your udelay do you want to be today?

Oh well.
--
vda

2007-08-09 20:02:01

by Denys Vlasenko

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

On 8/9/07, Denis Vlasenko <[email protected]> wrote:
> On 8/8/07, Arjan van de Ven <[email protected]> wrote:
> > You keep claiming that hrtimers are so incredibly expensive; but for
> > msleep()... which is mostly called during driver init ... I really don't
> > buy that it's really expensive. We're not doing this a gazilion times
> > per second obviously...
>
> Yes. Optimizing delay or sleep functions for speed is a contradiction
> of terms. IIRC we still optimize udelay for speed, not code size...
> Read it again folks:
>
> We optimize udelay for speed
>
> How fast your udelay do you want to be today?

Just checked. i386 and x86-64 seems to be sane - udelay() and ndelay()
are not inlined.

Several arches are still frantically try to make udelay faster. Many have
the same comment:

/*
* Use only for very small delays ( < 1 msec). Should probably use a
* lookup table, really, as the multiplications take much too long with
* short delays. This is a "reasonable" implementation, though (and the
* first constant multiplications gets optimized away if the delay is
* a constant)
*/

and thus seem to be a cut-n-paste code.

BTW, almost all arched have __const_udelay(N) which obviously
does not delay for N usecs:

#define udelay(n) (__builtin_constant_p(n) ? \
((n) > 20000 ? __bad_udelay() : __const_udelay((n) * 0x10c7ul)) : \
__udelay(n))

Bad name.

Are patches which de-inline udelay and do s/__const_udelay/__const_delay/g
be accepted?

Arches with udelay's still inlined are below. mips is especially big.
frv has totally bogus ndelay().

include/asm-ppc/delay.h
extern __inline__ void __udelay(unsigned int x)
{
unsigned int loops;
__asm__("mulhwu %0,%1,%2" : "=r" (loops) :
"r" (x), "r" (loops_per_jiffy * 226));
__delay(loops);
}

include/asm-parisc/delay.h
static __inline__ void __udelay(unsigned long usecs) {
__cr16_delay(usecs * ((unsigned long)boot_cpu_data.cpu_hz / 1000000UL));
}

include/asm-mips/delay.h
static inline void __udelay(unsigned long usecs, unsigned long lpj)
{
unsigned long lo;

/*
* The rates of 128 is rounded wrongly by the catchall case
* for 64-bit. Excessive precission? Probably ...
*/
#if defined(CONFIG_64BIT) && (HZ == 128)
usecs *= 0x0008637bd05af6c7UL; /* 2**64 / (1000000 / HZ) */
#elif defined(CONFIG_64BIT)
usecs *= (0x8000000000000000UL / (500000 / HZ));
#else /* 32-bit junk follows here */
usecs *= (unsigned long) (((0x8000000000000000ULL / (500000 / HZ)) +
0x80000000ULL) >> 32);
#endif

if (sizeof(long) == 4)
__asm__("multu\t%2, %3"
: "=h" (usecs), "=l" (lo)
: "r" (usecs), "r" (lpj)
: GCC_REG_ACCUM);
else if (sizeof(long) == 8)
__asm__("dmultu\t%2, %3"
: "=h" (usecs), "=l" (lo)
: "r" (usecs), "r" (lpj)
: GCC_REG_ACCUM);

__delay(usecs);
}

include/asm-m68k/delay.h
static inline void __udelay(unsigned long usecs)
{
__const_udelay(usecs * 4295); /* 2**32 / 1000000 */
}

include/asm-h8300/delay.h
static inline void udelay(unsigned long usecs)
{
usecs *= 4295; /* 2**32 / 1000000 */
usecs /= (loops_per_jiffy*HZ);
if (usecs)
__delay(usecs);
}

include/asm-frv/delay.h
static inline void udelay(unsigned long usecs)
{
__delay(usecs * __delay_loops_MHz);
}
#define ndelay(n) udelay((n) * 5)

include/asm-xtensa/delay.h
static __inline__ void udelay (unsigned long usecs)
{
unsigned long start = xtensa_get_ccount();
unsigned long cycles = usecs * (loops_per_jiffy / (1000000UL / HZ));

/* Note: all variables are unsigned (can wrap around)! */
while (((unsigned long)xtensa_get_ccount()) - start < cycles)
;
}

include/asm-v850/delay.h
static inline void udelay(unsigned long usecs)
{
register unsigned long full_loops, part_loops;
full_loops = ((usecs * HZ) / 1000000) * loops_per_jiffy;
usecs %= (1000000 / HZ);
part_loops = (usecs * HZ * loops_per_jiffy) / 1000000;
__delay(full_loops + part_loops);
}

include/asm-cris/delay.h
static inline void udelay(unsigned long usecs)
{
__delay(usecs * loops_per_usec);
}

include/asm-blackfin/delay.h
static inline void udelay(unsigned long usecs)
{
extern unsigned long loops_per_jiffy;
__delay(usecs * loops_per_jiffy / (1000000 / HZ));
}
--
vda

2007-08-09 22:31:18

[permalink] [raw]

Subject: Re: [PATCH] msleep() with hrtimers

Hi,

On Tue, 7 Aug 2007, Andrew Morton wrote:

> > The current msleep is fine and doesn't need any "fixing".
> > Not all the world is i386, _please_ keep hrtimer usage where it's
> > required. Simple timer should be given preference unless the higher
> > resolution is really needed, which is not the case here.
>
> Hang on. Having msleep(1) sleep for 20 milliseconds is really awful
> behaviour. Possibly worse is the fact that with other configs, it will
> delay for eight milliseconds. Or two. That's an order of magnitude of
> unpredictability which can actually cause driver breakage.
>
> Fixing that *bug* is a good thing. I see no reason why we should "keep
> hrtimer usage where it is required"? The implementation details are hidden
> from the caller.

This is not a bug. You have to keep in mind that for hrtimer to make a
real difference HIGH_RES_TIMERS has to be enabled, OTOH if HZ is already
set to 1000, it doesn't make much difference.
The sleep was and will be only a minimum time, expecting something
different is actually a bug.

> > so below is a nanosleep implementation based
> > on Jonathan's patch. This will user give a choice, so there is no need to
> > force all users to use hrtimer for a simple sleep.
>
> But apart from needlessly fattening the kernel API, that leaves us in the
> current situation where an unknown number of the msleep() callers actually
> care that they are calling a function which by a huge margin fails to do
> what they are asking it to do. It will take a long time to hunt down all
> the problematic callsites and migrate them to nanosleep().

As I tried to say before this is foremost an API issue. Introducing
nanosleep() makes it clear that this user will benefit from enabling
HIGH_RES_TIMERS, whereas msleep() says that resolution is not that
important and thus it will work fine without HIGH_RES_TIMERS and/or
HZ_1000.

bye, Roman

2007-11-28 10:30:31