2016-12-15 04:27:32

by Nicholas Mc Guire

[permalink] [raw]
Subject: [PATCH] drm/i915: use udelay for very short delays

Even on fast systems a 2 microsecond delay is most likely more efficient
as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
change this to a udelay(2).

Signed-off-by: Nicholas Mc Guire <[email protected]>
---

Problem found by coccinelle:

Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)

Patch is against 4.9.0 (localversion-next is next-20161214)

drivers/gpu/drm/i915/intel_dsi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
index 5b72c50..19fe86b 100644
--- a/drivers/gpu/drm/i915/intel_dsi.c
+++ b/drivers/gpu/drm/i915/intel_dsi.c
@@ -379,7 +379,7 @@ static void bxt_dsi_device_ready(struct intel_encoder *encoder)
val &= ~ULPS_STATE_MASK;
val |= (ULPS_STATE_ENTER | DEVICE_READY);
I915_WRITE(MIPI_DEVICE_READY(port), val);
- usleep_range(2, 3);
+ udelay(2);

/* 3. Exit ULPS */
val = I915_READ(MIPI_DEVICE_READY(port));
--
2.1.4


2016-12-15 09:09:00

by Jani Nikula

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: use udelay for very short delays

On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> Even on fast systems a 2 microsecond delay is most likely more efficient
> as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> change this to a udelay(2).

Similar concerns as in [1]. We don't need the accuracy of udelay() here,
so this boils down to which is the better use of CPU. We could probably
relax the max delay more if that was helpful. But I'm not immediately
sold on "is most likely more efficient" which sounds like a gut feeling.

I'm sorry it's not clear in my other reply that I do appreciate
addressing incorrect/silly use of usleep_range(); I'm just not (yet)
convinced udelay() is the answer.

BR,
Jani.


[1] http://lkml.kernel.org/r/[email protected]


>
> Signed-off-by: Nicholas Mc Guire <[email protected]>
> ---
>
> Problem found by coccinelle:
>
> Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)
>
> Patch is against 4.9.0 (localversion-next is next-20161214)
>
> drivers/gpu/drm/i915/intel_dsi.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
> index 5b72c50..19fe86b 100644
> --- a/drivers/gpu/drm/i915/intel_dsi.c
> +++ b/drivers/gpu/drm/i915/intel_dsi.c
> @@ -379,7 +379,7 @@ static void bxt_dsi_device_ready(struct intel_encoder *encoder)
> val &= ~ULPS_STATE_MASK;
> val |= (ULPS_STATE_ENTER | DEVICE_READY);
> I915_WRITE(MIPI_DEVICE_READY(port), val);
> - usleep_range(2, 3);
> + udelay(2);
>
> /* 3. Exit ULPS */
> val = I915_READ(MIPI_DEVICE_READY(port));

--
Jani Nikula, Intel Open Source Technology Center

2016-12-15 09:25:27

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH] drm/i915: use udelay for very short delays

On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > change this to a udelay(2).
>
> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> so this boils down to which is the better use of CPU. We could probably
> relax the max delay more if that was helpful. But I'm not immediately
> sold on "is most likely more efficient" which sounds like a gut feeling.
>
> I'm sorry it's not clear in my other reply that I do appreciate
> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> convinced udelay() is the answer.

So one reason why we unconditionally use *sleep variants is the
might_sleep check. Because in the past people have used udelay and mdelay,
those delays had to be increased a lot because hw, and at the same time
someone added users of these functions to our irq helper, resulting in irq
handling times measures in multiple ms. That's not good.

So until someone can demonstrate that there's a real benefit (which let's
be honest, for modeset code, will never be the case) I very highly prefer
to use *sleep* functions. They prevent some silly things from happening by
accident. Micro-optimizing modeset code and hampering maitainability in
the process is not good.
-Daniel

>
> BR,
> Jani.
>
>
> [1] http://lkml.kernel.org/r/[email protected]
>
>
> >
> > Signed-off-by: Nicholas Mc Guire <[email protected]>
> > ---
> >
> > Problem found by coccinelle:
> >
> > Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)
> >
> > Patch is against 4.9.0 (localversion-next is next-20161214)
> >
> > drivers/gpu/drm/i915/intel_dsi.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
> > index 5b72c50..19fe86b 100644
> > --- a/drivers/gpu/drm/i915/intel_dsi.c
> > +++ b/drivers/gpu/drm/i915/intel_dsi.c
> > @@ -379,7 +379,7 @@ static void bxt_dsi_device_ready(struct intel_encoder *encoder)
> > val &= ~ULPS_STATE_MASK;
> > val |= (ULPS_STATE_ENTER | DEVICE_READY);
> > I915_WRITE(MIPI_DEVICE_READY(port), val);
> > - usleep_range(2, 3);
> > + udelay(2);
> >
> > /* 3. Exit ULPS */
> > val = I915_READ(MIPI_DEVICE_READY(port));
>
> --
> Jani Nikula, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2016-12-15 09:28:07

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH] drm/i915: use udelay for very short delays

On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
> On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> > > Even on fast systems a 2 microsecond delay is most likely more efficient
> > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > > change this to a udelay(2).
> >
> > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > so this boils down to which is the better use of CPU. We could probably
> > relax the max delay more if that was helpful. But I'm not immediately
> > sold on "is most likely more efficient" which sounds like a gut feeling.
> >
> > I'm sorry it's not clear in my other reply that I do appreciate
> > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > convinced udelay() is the answer.
>
> So one reason why we unconditionally use *sleep variants is the
> might_sleep check. Because in the past people have used udelay and mdelay,
> those delays had to be increased a lot because hw, and at the same time
> someone added users of these functions to our irq helper, resulting in irq
> handling times measures in multiple ms. That's not good.
>
> So until someone can demonstrate that there's a real benefit (which let's
> be honest, for modeset code, will never be the case) I very highly prefer
> to use *sleep* functions. They prevent some silly things from happening by
> accident. Micro-optimizing modeset code and hampering maitainability in
> the process is not good.

Also, the entire premise seems backwards: usleep_range is inefficient for
certain parameter ranges and better replaced with udelay. That makes
sense.

But why exactly do we not fix udelay_range then, but instead do a cocci
job crawling through all the thousands of callers? Just fix usleep(_range)
to use udelay for very small values (and still keep the might_sleep check
ofc) if that's more efficient!
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2016-12-15 09:29:12

by Nicholas Mc Guire

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: use udelay for very short delays

On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > change this to a udelay(2).
>
> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> so this boils down to which is the better use of CPU. We could probably
> relax the max delay more if that was helpful. But I'm not immediately
> sold on "is most likely more efficient" which sounds like a gut feeling.
>
> I'm sorry it's not clear in my other reply that I do appreciate
> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> convinced udelay() is the answer.

if the delay is not critical and all that is needed
is an assurance that it is greater than X us then
usleep_range is fine with a relaxed limit.
So from what you wrote my patch proposal is wrong - the
udelay() is not the way to got.
My intent is to get rid of very small usleep_range() cases
so if usleep_range(20,50) causes no issues with this driver
and does not induce any performance penalty then that would
be the way to go I think.

thx!
hofrat

2016-12-15 09:55:58

by Jani Nikula

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: use udelay for very short delays

On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
>> > Even on fast systems a 2 microsecond delay is most likely more efficient
>> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> > change this to a udelay(2).
>>
>> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> so this boils down to which is the better use of CPU. We could probably
>> relax the max delay more if that was helpful. But I'm not immediately
>> sold on "is most likely more efficient" which sounds like a gut feeling.
>>
>> I'm sorry it's not clear in my other reply that I do appreciate
>> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> convinced udelay() is the answer.
>
> if the delay is not critical and all that is needed
> is an assurance that it is greater than X us then
> usleep_range is fine with a relaxed limit.
> So from what you wrote my patch proposal is wrong - the
> udelay() is not the way to got.
> My intent is to get rid of very small usleep_range() cases
> so if usleep_range(20,50) causes no issues with this driver
> and does not induce any performance penalty then that would
> be the way to go I think.

Okay, so I looked at the code, and I looked at our spec, and I looked at
the MIPI D-PHY spec, and I cried a little.

Long story short, I think usleep_range(10, 50) will be fine.


BR,
Jani.


--
Jani Nikula, Intel Open Source Technology Center

2016-12-15 10:12:21

by Ville Syrjälä

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: use udelay for very short delays

On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> >> > change this to a udelay(2).
> >>
> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> >> so this boils down to which is the better use of CPU. We could probably
> >> relax the max delay more if that was helpful. But I'm not immediately
> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> >>
> >> I'm sorry it's not clear in my other reply that I do appreciate
> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> >> convinced udelay() is the answer.
> >
> > if the delay is not critical and all that is needed
> > is an assurance that it is greater than X us then
> > usleep_range is fine with a relaxed limit.
> > So from what you wrote my patch proposal is wrong - the
> > udelay() is not the way to got.
> > My intent is to get rid of very small usleep_range() cases
> > so if usleep_range(20,50) causes no issues with this driver
> > and does not induce any performance penalty then that would
> > be the way to go I think.
>
> Okay, so I looked at the code, and I looked at our spec, and I looked at
> the MIPI D-PHY spec, and I cried a little.
>
> Long story short, I think usleep_range(10, 50) will be fine.

Note that I really want to see a comment next to every delay like this
documenting the actual hardware requirement, if the delay used by the
code doesn't 100% match it.

--
Ville Syrj?l?
Intel OTC

2016-12-15 10:20:58

by Jani Nikula

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: use udelay for very short delays

On Thu, 15 Dec 2016, Ville Syrjälä <[email protected]> wrote:
> On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
>> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
>> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
>> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
>> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> >> > change this to a udelay(2).
>> >>
>> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> >> so this boils down to which is the better use of CPU. We could probably
>> >> relax the max delay more if that was helpful. But I'm not immediately
>> >> sold on "is most likely more efficient" which sounds like a gut feeling.
>> >>
>> >> I'm sorry it's not clear in my other reply that I do appreciate
>> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> >> convinced udelay() is the answer.
>> >
>> > if the delay is not critical and all that is needed
>> > is an assurance that it is greater than X us then
>> > usleep_range is fine with a relaxed limit.
>> > So from what you wrote my patch proposal is wrong - the
>> > udelay() is not the way to got.
>> > My intent is to get rid of very small usleep_range() cases
>> > so if usleep_range(20,50) causes no issues with this driver
>> > and does not induce any performance penalty then that would
>> > be the way to go I think.
>>
>> Okay, so I looked at the code, and I looked at our spec, and I looked at
>> the MIPI D-PHY spec, and I cried a little.
>>
>> Long story short, I think usleep_range(10, 50) will be fine.
>
> Note that I really want to see a comment next to every delay like this
> documenting the actual hardware requirement, if the delay used by the
> code doesn't 100% match it.

Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
view wrt D-PHY, and our code doesn't even match the spec. Hence the
tears. Want to propose a wording for the comment so we can apply this
change, without going for a full rewrite of the sequence?


BR,
Jani.

--
Jani Nikula, Intel Open Source Technology Center

2016-12-15 10:58:27

by Nicholas Mc Guire

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH] drm/i915: use udelay for very short delays

On Thu, Dec 15, 2016 at 10:27:59AM +0100, Daniel Vetter wrote:
> On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > > On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> > > > Even on fast systems a 2 microsecond delay is most likely more efficient
> > > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > > > change this to a udelay(2).
> > >
> > > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > > so this boils down to which is the better use of CPU. We could probably
> > > relax the max delay more if that was helpful. But I'm not immediately
> > > sold on "is most likely more efficient" which sounds like a gut feeling.
> > >
> > > I'm sorry it's not clear in my other reply that I do appreciate
> > > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > > convinced udelay() is the answer.
> >
> > So one reason why we unconditionally use *sleep variants is the
> > might_sleep check. Because in the past people have used udelay and mdelay,
> > those delays had to be increased a lot because hw, and at the same time
> > someone added users of these functions to our irq helper, resulting in irq
> > handling times measures in multiple ms. That's not good.
> >
> > So until someone can demonstrate that there's a real benefit (which let's
> > be honest, for modeset code, will never be the case) I very highly prefer
> > to use *sleep* functions. They prevent some silly things from happening by
> > accident. Micro-optimizing modeset code and hampering maitainability in
> > the process is not good.
>
> Also, the entire premise seems backwards: usleep_range is inefficient for
> certain parameter ranges and better replaced with udelay. That makes
> sense.
>
> But why exactly do we not fix udelay_range then, but instead do a cocci
> job crawling through all the thousands of callers? Just fix usleep(_range)
> to use udelay for very small values (and still keep the might_sleep check
> ofc) if that's more efficient!

its actually not thousands more like a few dozen:

usleep_range(min,max) in linux-stable 4.9.0

1648 calls total
1488 pass numeric values only (90.29%)
27 min below 10us (1.81%)
40 min above 10ms (2.68%)
min out of spec 4.50%
76 preprocessor constants (4.61%)
1 min below 10us (1.31%)
8 min above 10ms (10.52%)
min out of spec 11.84%
85 expressions (5.15%)
1(0) min below 10us (1.50%)*
6(2) min above 10ms (7.50%)*
min out of spec 9.0%
Errors:
23 where min==max (1.39%)
0 where max < min (0.00%)

Total:
Bugs: 6.48%-10.70%*
Crit: 3.09%-3.15%* (min < 10 and min==max and max < min)
Detectable by coccinelle:
Bugs: 74/103 (71.8%)
Crit: 50/52 (96.1%)

*based on estimats as runtime values not known.


there is no usleep() as noted in Documentation/timers/timers-howto.txt
- Why not usleep?
On slower systems, (embedded, OR perhaps a speed-
stepped PC!) the overhead of setting up the hrtimers
for usleep *may* not be worth it. Such an evaluation
will obviously depend on your specific situation, but
it is something to be aware of.

and it suggests to use different API for different ranges - sounds sane
to me and seems to cover the needs of drivers.

thx!
hofrat

2016-12-15 10:58:26

by Nicholas Mc Guire

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: use udelay for very short delays

On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Ville Syrj?l? <[email protected]> wrote:
> > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> >> >> > change this to a udelay(2).
> >> >>
> >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> >> >> so this boils down to which is the better use of CPU. We could probably
> >> >> relax the max delay more if that was helpful. But I'm not immediately
> >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> >> >>
> >> >> I'm sorry it's not clear in my other reply that I do appreciate
> >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> >> >> convinced udelay() is the answer.
> >> >
> >> > if the delay is not critical and all that is needed
> >> > is an assurance that it is greater than X us then
> >> > usleep_range is fine with a relaxed limit.
> >> > So from what you wrote my patch proposal is wrong - the
> >> > udelay() is not the way to got.
> >> > My intent is to get rid of very small usleep_range() cases
> >> > so if usleep_range(20,50) causes no issues with this driver
> >> > and does not induce any performance penalty then that would
> >> > be the way to go I think.
> >>
> >> Okay, so I looked at the code, and I looked at our spec, and I looked at
> >> the MIPI D-PHY spec, and I cried a little.
> >>
> >> Long story short, I think usleep_range(10, 50) will be fine.
> >
> > Note that I really want to see a comment next to every delay like this
> > documenting the actual hardware requirement, if the delay used by the
> > code doesn't 100% match it.
>
> Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
> view wrt D-PHY, and our code doesn't even match the spec. Hence the
> tears. Want to propose a wording for the comment so we can apply this
> change, without going for a full rewrite of the sequence?
>
is that suitable or am I overdoing it ?

- usleep_range(2, 3);
+ /* delay for at least 2us - relaxed to 10-50 to allow
+ * hrtimer subsystem to optimize uncritical timer handling
+ */
+ usleep_range(10, 50);

thx!
hofrat

2016-12-15 11:39:17

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [PATCH] drm/i915: use udelay for very short delays

On Thu, Dec 15, 2016 at 11:51 AM, Nicholas Mc Guire <[email protected]> wrote:
> On Thu, Dec 15, 2016 at 10:27:59AM +0100, Daniel Vetter wrote:
>> On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
>> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> > > On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
>> > > > Even on fast systems a 2 microsecond delay is most likely more efficient
>> > > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> > > > change this to a udelay(2).
>> > >
>> > > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> > > so this boils down to which is the better use of CPU. We could probably
>> > > relax the max delay more if that was helpful. But I'm not immediately
>> > > sold on "is most likely more efficient" which sounds like a gut feeling.
>> > >
>> > > I'm sorry it's not clear in my other reply that I do appreciate
>> > > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> > > convinced udelay() is the answer.
>> >
>> > So one reason why we unconditionally use *sleep variants is the
>> > might_sleep check. Because in the past people have used udelay and mdelay,
>> > those delays had to be increased a lot because hw, and at the same time
>> > someone added users of these functions to our irq helper, resulting in irq
>> > handling times measures in multiple ms. That's not good.
>> >
>> > So until someone can demonstrate that there's a real benefit (which let's
>> > be honest, for modeset code, will never be the case) I very highly prefer
>> > to use *sleep* functions. They prevent some silly things from happening by
>> > accident. Micro-optimizing modeset code and hampering maitainability in
>> > the process is not good.
>>
>> Also, the entire premise seems backwards: usleep_range is inefficient for
>> certain parameter ranges and better replaced with udelay. That makes
>> sense.
>>
>> But why exactly do we not fix udelay_range then, but instead do a cocci
>> job crawling through all the thousands of callers? Just fix usleep(_range)
>> to use udelay for very small values (and still keep the might_sleep check
>> ofc) if that's more efficient!
>
> its actually not thousands more like a few dozen:

There's 1k + usleep* calls you need to audit and teach people how to
exactly use it.

> usleep_range(min,max) in linux-stable 4.9.0
>
> 1648 calls total
> 1488 pass numeric values only (90.29%)
> 27 min below 10us (1.81%)
> 40 min above 10ms (2.68%)
> min out of spec 4.50%
> 76 preprocessor constants (4.61%)
> 1 min below 10us (1.31%)
> 8 min above 10ms (10.52%)
> min out of spec 11.84%
> 85 expressions (5.15%)
> 1(0) min below 10us (1.50%)*
> 6(2) min above 10ms (7.50%)*
> min out of spec 9.0%
> Errors:
> 23 where min==max (1.39%)
> 0 where max < min (0.00%)
>
> Total:
> Bugs: 6.48%-10.70%*
> Crit: 3.09%-3.15%* (min < 10 and min==max and max < min)
> Detectable by coccinelle:
> Bugs: 74/103 (71.8%)
> Crit: 50/52 (96.1%)
>
> *based on estimats as runtime values not known.
>
>
> there is no usleep() as noted in Documentation/timers/timers-howto.txt
> - Why not usleep?
> On slower systems, (embedded, OR perhaps a speed-
> stepped PC!) the overhead of setting up the hrtimers
> for usleep *may* not be worth it. Such an evaluation
> will obviously depend on your specific situation, but
> it is something to be aware of.
>
> and it suggests to use different API for different ranges - sounds sane
> to me and seems to cover the needs of drivers.

git grep usleep seems to disagree, at least I see a bunch of usleep()
calls. And per Rusty's api usability scale the ultimate fucntion is
dwim(). It just feels to me like pushing such trade-off decisions to
drivers is bad design because a) the tradeoffs depend upon the cpu
you're running on b) driver writers are pretty good at getting such
tradeoffs wrong in general. Aiming for a more dwim()-like api for this
here makes sense, instead of an eternal fight to correct drivers that
get it wrong all the time.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2016-12-15 11:54:56

by Ville Syrjälä

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: use udelay for very short delays

On Thu, Dec 15, 2016 at 10:34:00AM +0000, Nicholas Mc Guire wrote:
> On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
> > On Thu, 15 Dec 2016, Ville Syrj?l? <[email protected]> wrote:
> > > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> > >> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> > >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> > >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > >> >> > change this to a udelay(2).
> > >> >>
> > >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > >> >> so this boils down to which is the better use of CPU. We could probably
> > >> >> relax the max delay more if that was helpful. But I'm not immediately
> > >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> > >> >>
> > >> >> I'm sorry it's not clear in my other reply that I do appreciate
> > >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > >> >> convinced udelay() is the answer.
> > >> >
> > >> > if the delay is not critical and all that is needed
> > >> > is an assurance that it is greater than X us then
> > >> > usleep_range is fine with a relaxed limit.
> > >> > So from what you wrote my patch proposal is wrong - the
> > >> > udelay() is not the way to got.
> > >> > My intent is to get rid of very small usleep_range() cases
> > >> > so if usleep_range(20,50) causes no issues with this driver
> > >> > and does not induce any performance penalty then that would
> > >> > be the way to go I think.
> > >>
> > >> Okay, so I looked at the code, and I looked at our spec, and I looked at
> > >> the MIPI D-PHY spec, and I cried a little.
> > >>
> > >> Long story short, I think usleep_range(10, 50) will be fine.
> > >
> > > Note that I really want to see a comment next to every delay like this
> > > documenting the actual hardware requirement, if the delay used by the
> > > code doesn't 100% match it.
> >
> > Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
> > view wrt D-PHY, and our code doesn't even match the spec. Hence the
> > tears. Want to propose a wording for the comment so we can apply this
> > change, without going for a full rewrite of the sequence?
> >
> is that suitable or am I overdoing it ?
>
> - usleep_range(2, 3);
> + /* delay for at least 2us - relaxed to 10-50 to allow
> + * hrtimer subsystem to optimize uncritical timer handling
> + */

That's entirely too verbose IMO, and the reason for using usleep_range()
is pretty obvious without spelling it out.

All we really want to know is what the spec says is the minimum
acceptable delay.

> + usleep_range(10, 50);
>
> thx!
> hofrat

--
Ville Syrj?l?
Intel OTC

2016-12-15 12:00:55

by Jani Nikula

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: use udelay for very short delays

On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
> On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
>> On Thu, 15 Dec 2016, Ville Syrjälä <[email protected]> wrote:
>> > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
>> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
>> >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <[email protected]> wrote:
>> >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
>> >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> >> >> > change this to a udelay(2).
>> >> >>
>> >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> >> >> so this boils down to which is the better use of CPU. We could probably
>> >> >> relax the max delay more if that was helpful. But I'm not immediately
>> >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
>> >> >>
>> >> >> I'm sorry it's not clear in my other reply that I do appreciate
>> >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> >> >> convinced udelay() is the answer.
>> >> >
>> >> > if the delay is not critical and all that is needed
>> >> > is an assurance that it is greater than X us then
>> >> > usleep_range is fine with a relaxed limit.
>> >> > So from what you wrote my patch proposal is wrong - the
>> >> > udelay() is not the way to got.
>> >> > My intent is to get rid of very small usleep_range() cases
>> >> > so if usleep_range(20,50) causes no issues with this driver
>> >> > and does not induce any performance penalty then that would
>> >> > be the way to go I think.
>> >>
>> >> Okay, so I looked at the code, and I looked at our spec, and I looked at
>> >> the MIPI D-PHY spec, and I cried a little.
>> >>
>> >> Long story short, I think usleep_range(10, 50) will be fine.
>> >
>> > Note that I really want to see a comment next to every delay like this
>> > documenting the actual hardware requirement, if the delay used by the
>> > code doesn't 100% match it.
>>
>> Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
>> view wrt D-PHY, and our code doesn't even match the spec. Hence the
>> tears. Want to propose a wording for the comment so we can apply this
>> change, without going for a full rewrite of the sequence?
>>
> is that suitable or am I overdoing it ?
>
> - usleep_range(2, 3);
> + /* delay for at least 2us - relaxed to 10-50 to allow
> + * hrtimer subsystem to optimize uncritical timer handling
> + */
> + usleep_range(10, 50);

I'm fine with that. Or maybe just make it "relaxed to allow" without the
values.

Jani.


>
> thx!
> hofrat

--
Jani Nikula, Intel Open Source Technology Center