2012-02-23 17:02:01

by Stephan Bärwolf

[permalink] [raw]
Subject: responsiveness: newer kernels causing lagging and blocking

Under various conditions linux since 2.6.39-rc1 laggs and blocks enormously the whole system.
(For example while starting "winecfg" (on a thinkpad X220) and parallel moving the
mousecursor you can observe a periodic blocking for some seconds)

After bisecting a little while, commit "4819d2e4310796c4e9eef674499af9b9caf36b5a"
(" drm: Retry i2c transfer of EDID block after failure ") seems to be responsible.

Because function "drm_do_probe_ddc_edid" loops trying "i2c_transfer" it consumes a lot
of time during errors. Reverting or changing "retries" from 5 to 1 extremly minimizes the
problem to "not perceptible".
It seems the locking within "i2c_transfer" slows everything down.
So maybe it is possible to yield() before calling it?


regards Stephan


2012-02-23 17:12:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: responsiveness: newer kernels causing lagging and blocking

On Thu, Feb 23, 2012 at 8:30 AM, Stephan B?rwolf
<[email protected]> wrote:
> Under various conditions linux since 2.6.39-rc1 laggs and blocks enormously the whole system.
> (For example while starting "winecfg" (on a thinkpad X220) and parallel moving the
> mousecursor you can observe a periodic blocking for some seconds)
>
> After bisecting a little while, commit "4819d2e4310796c4e9eef674499af9b9caf36b5a"
> (" drm: Retry i2c transfer of EDID block after failure ") seems to be responsible.
>
> Because function "drm_do_probe_ddc_edid" loops trying "i2c_transfer" it consumes a lot
> of time during errors. Reverting or changing "retries" from 5 to 1 extremly minimizes the
> problem to "not perceptible".
> It seems the locking within "i2c_transfer" slows everything down.
> So maybe it is possible to yield() before calling it?

Ugh. The whole i2c thing is a mess. Most of the i2c drivers seem to
busy-loop using 'usleep()' too, so not only do they take a long time,
they take a long time while using CPU and being unresponsive in
general. So it's not just locking, I suspect.

I'm not surprised that it gets to the point that you can notice the
unresponsiveness. At some point I made a bug-report about the i915
driver using 7% of CPU (yes, really) just doing i2c all the time on an
Apple Mac Mini, because it just couldn't get happy with the results,
and the i915 driver would re-start it every five seconds or something
like that.

Your case sounds even worse - at least on that Mac Mini it didn't
cause all that noticeable hickups (possibly because it was
multi-core).

And looking at the code - not only does drm_do_probe_ddc_edid() have a
retry loop, the *callers* sometimes call that thing from two loops
deep. There's the block count, and for some reason there's that "i =
0..3" retry loop around it in drm_do_get_edid() that seems to be
*another* retry loop.

So if I read it right, drm_do_get_edid() actually retries *20* times
for the base block, and then potentially does that for each block.

I wonder if that double retry is really intentional at all.

But yeah, if nothing else, let's *please* add a

if (need_resched()) schedule();

at the top of drm_do_probe_ddc_edid().

I'm assuming you don't have PREEMPT enabled? Does that hide the problem?

Linus

2012-02-23 17:23:03

by Chris Wilson

[permalink] [raw]
Subject: Re: responsiveness: newer kernels causing lagging and blocking

On Thu, 23 Feb 2012 17:30:41 +0100, Stephan Bärwolf <[email protected]> wrote:
> Under various conditions linux since 2.6.39-rc1 laggs and blocks enormously the whole system.
> (For example while starting "winecfg" (on a thinkpad X220) and parallel moving the
> mousecursor you can observe a periodic blocking for some seconds)
>
> After bisecting a little while, commit "4819d2e4310796c4e9eef674499af9b9caf36b5a"
> (" drm: Retry i2c transfer of EDID block after failure ") seems to be responsible.
>
> Because function "drm_do_probe_ddc_edid" loops trying "i2c_transfer" it consumes a lot
> of time during errors. Reverting or changing "retries" from 5 to 1 extremly minimizes the
> problem to "not perceptible".
> It seems the locking within "i2c_transfer" slows everything down.
> So maybe it is possible to yield() before calling it?

As you can tell the issue is that we repeatedly attempt to do an
expensive retrieval of the EDID across a bit-banging i2c bus and keep
doing so in spite of failure. Intel is especially obnoxious in this
regard as we attempt to probe every connector described by the VBT, and
every non-existent connector results in a busy-spin until the query
times out. There are at least two ways we can mitigate this, the first
is to use GMBUS which is not as processor intensive as GPIO and can
detect the non-existent controller much quicker. The GMBUS
implementation needs some refinement to be a proper i2c citizen before
we can enable it again. The second is to break the loop for fatal errors,
which is addressed by

commit 9292f37e1f5c79400254dca46f83313488093825
Author: Eugeni Dodonov <[email protected]>
Date: Thu Jan 5 09:34:28 2012 -0200

drm: give up on edid retries when i2c bus is not responding

This allows to avoid talking to a non-responding bus repeatedly until we
finally timeout after 15 attempts. We can do this by catching the -ENXIO
error, provided by i2c_algo_bit:bit_doAddress call.

Within the bit_doAddress we already try 3 times to get the edid data, so
if the routine tells us that bus is not responding, it is mostly pointless
to keep re-trying those attempts over and over again until we reach final
number of retries.

This change should fix https://bugs.freedesktop.org/show_bug.cgi?id=41059
and improve overall edid detection timing by 10-30% in most cases, and by
a much larger margin in case of phantom outputs (up to 30x in one worst
case).

Timing results for i915-powered machines for 'time xrandr' command:
Machine 1: from 0.840s to 0.290s
Machine 2: from 0.315s to 0.280s
Machine 3: from +/- 4s to 0.184s

Timing results for HD5770 with 'time xrandr' command:
Machine 4: from 3.210s to 1.060s

Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Keith Packard <[email protected]>
Tested-by: Sean Finney <[email protected]>
Tested-by: Soren Hansen <[email protected]>
Tested-by: Hernando Torque <[email protected]>
Tested-by: Mike Lothian <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41059
Signed-off-by: Eugeni Dodonov <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

in airlied/drm-next.

One mystery that has never been resolved is just why wine causes a flood
of xrandr/connection probes, and then there is still the underlying issue
that probing blocks the device and causes noticeable latency.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2012-02-23 17:29:05

by Stephan Bärwolf

[permalink] [raw]
Subject: Re: responsiveness: newer kernels causing lagging and blocking


> I'm assuming you don't have PREEMPT enabled? Does that hide the problem?
>
> Linus
>

It is a PREEMPT kernel with 1000Hz (maximum) NOHZ (tickless).

regards Stephan

2012-02-23 17:29:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: responsiveness: newer kernels causing lagging and blocking

On Thu, Feb 23, 2012 at 9:21 AM, Chris Wilson <[email protected]> wrote:
>
> One mystery that has never been resolved is just why wine causes a flood
> of xrandr/connection probes, and then there is still the underlying issue
> that probing blocks the device and causes noticeable latency.

Is there any chance that the EDID code would just have a "if I probed
in the last second or two, let's not do it again and instead just
return the cached data from last time" thing to just protect against
some odd interrupt flood (or a user that asks for edid information
explicitly like xrandr).

Linus

2012-02-23 17:31:26

by Dave Airlie

[permalink] [raw]
Subject: Re: responsiveness: newer kernels causing lagging and blocking

On Thu, Feb 23, 2012 at 5:29 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Feb 23, 2012 at 9:21 AM, Chris Wilson <[email protected]> wrote:
>>
>> One mystery that has never been resolved is just why wine causes a flood
>> of xrandr/connection probes, and then there is still the underlying issue
>> that probing blocks the device and causes noticeable latency.
>
> Is there any chance that the EDID code would just have a "if I probed
> in the last second or two, let's not do it again and instead just
> return the cached data from last time" thing to just protect against
> some odd interrupt flood (or a user that asks for edid information
> explicitly like xrandr).

X already does that, if you can the randr interfaces it doesn't cause
a probe unless you ask for it.

wine must be doing something silly or just needs to call a different interface.

Dave.
>
> ? ? ? ? ? ? ? ? ? Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at ?http://www.tux.org/lkml/

2012-02-23 18:15:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: responsiveness: newer kernels causing lagging and blocking

On Thu, Feb 23, 2012 at 9:21 AM, Chris Wilson <[email protected]> wrote:
>
> The second is to break the loop for fatal errors, which is addressed by
>
> commit 9292f37e1f5c79400254dca46f83313488093825
> Author: Eugeni Dodonov <[email protected]>
> Date: ? Thu Jan 5 09:34:28 2012 -0200
>
> ? ?drm: give up on edid retries when i2c bus is not responding

So I tried cherry-picking this one commit on top of current -git on
one of those nasty Mac Mini's.

I'm sad to report that I don't get anywhere *close* to the reported
performance numbers. Are there other commits involved that I should
have tried?

> ? ?Timing results for i915-powered machines for 'time xrandr' command:
> ? ?Machine 1: from 0.840s to 0.290s
> ? ?Machine 2: from 0.315s to 0.280s
> ? ?Machine 3: from +/- 4s to 0.184s
>
> ? ?Timing results for HD5770 with 'time xrandr' command:
> ? ?Machine 4: from 3.210s to 1.060s

On that Mac Mini, I get 0.611s before, and 0.553s after. So it does
seem to improve, but not nearly enough.

Part of the problem seems to be the bogus "TV1 unknown connection",
which just isn't true (and probably due to some insane Apple
connection setup). So I suspect something just doesn't get a proper
EDID, and times out or fails, and falls back to some bogus default. I
get

TV1 unknown connection (normal left inverted right x axis y axis)
1024x768 60.0
800x600 60.3
640x480 59.9

and I'm pretty sure there's no TV attached unless it is hiding under
the carpet somehow.

"perf record -a" shows that it's spending the time in udelay() as
expected (30% from intel_wait_for_pipe_off(), 30% from sclhi(), 30%
from bit_xfer() and 10% from acknak()). Which is what you'd expect
with that CPU-polling EDID thing.

So adding some scheduling might be a good idea regardless of that commit.

Linus

2012-02-23 20:15:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] drm: Reduce the number of retries whilst reading EDIDs

On Thu, Feb 23, 2012 at 11:52 AM, Chris Wilson <[email protected]> wrote:
>
> i2c retries if sees an EGAIN, drm_do_probe_ddc_edid retries until it
> gets a result and *then* drm_do_get_edid retries until it gets a result
> it is happy with. All in all, that is a lot of processor intensive
> looping in cases where we do not expect and cannot get valid data - for
> example on Intel with disconnected hardware we will busy-spin until we
> hit the i2c timeout. This is then repeated for every connector when
> querying the current status of outputs.

Sadly, this doesn't seem to make any difference to my case. My xrandr
stays at 0.555s even with this patch.

So apparently the Apple Mac Mini issue is not about retries. But maybe
this fixes the problem Stephan B?rwolf reported? The Apple Mac Mini
is known for doing things oddly, so ..

You didn't include Stephan on the cc for that patch, though.

Btw, clearly X does *not* cache the EDID results, at least not for
this case. So the explicit xrandr example is probably pretty close to
what wine does. Maybe the proper fix is to just make X.org force
caching when clients do this (because it's definitely X that does the
drm_mode_getconnector() thing - xrandr itself spends zero time on
this, it just does an X request and waits for the result).

The fact that EDID takes half a second to get is not a problem per se.
It's a problem only when X does it over and over again because some
client does something unexpected.

Linus

2012-02-23 20:36:21

by Chris Wilson

[permalink] [raw]
Subject: [PATCH] drm: Reduce the number of retries whilst reading EDIDs

i2c retries if sees an EGAIN, drm_do_probe_ddc_edid retries until it
gets a result and *then* drm_do_get_edid retries until it gets a result
it is happy with. All in all, that is a lot of processor intensive
looping in cases where we do not expect and cannot get valid data - for
example on Intel with disconnected hardware we will busy-spin until we
hit the i2c timeout. This is then repeated for every connector when
querying the current status of outputs.

So to improve the situation, we can trim the number of retries for
reading the base block and to check for a reschedule before proceeding
so that we do not hog the machine whilst probing outputs. (Though since
we will be likely blocking the graphics device, the user is still going
to notice the latency.)

Reported-by: Linus Torvalds <[email protected]>
Reported-by: Stephan Bärwolf <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Cc: Eugeni Dodonov <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Alex Deucher <[email protected]>
---
drivers/gpu/drm/drm_edid.c | 30 +++++++++++++++++++++++-------
1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index ece03fc..4409cd4 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -35,6 +35,9 @@
#include "drm_edid.h"
#include "drm_edid_modes.h"

+#define EDID_PROBE_RETRIES 1
+#define EDID_READ_RETRIES 5
+
#define version_greater(edid, maj, min) \
(((edid)->version > (maj)) || \
((edid)->version == (maj) && (edid)->revision > (min)))
@@ -239,11 +242,12 @@ EXPORT_SYMBOL(drm_edid_is_valid);
* Try to fetch EDID information by calling i2c driver function.
*/
static int
-drm_do_probe_ddc_edid(struct i2c_adapter *adapter, unsigned char *buf,
- int block, int len)
+drm_get_edid_block(struct i2c_adapter *adapter,
+ unsigned char *buf, int block, int len,
+ int retries)
{
unsigned char start = block * EDID_LENGTH;
- int ret, retries = 5;
+ int ret;

/* The core i2c driver will automatically retry the transfer if the
* adapter reports EAGAIN. However, we find that bit-banging transfers
@@ -293,7 +297,19 @@ drm_do_get_edid(struct drm_connector *connector, struct i2c_adapter *adapter)

/* base block fetch */
for (i = 0; i < 4; i++) {
- if (drm_do_probe_ddc_edid(adapter, block, 0, EDID_LENGTH))
+ /* EDID transfer may be quite processor intensive and last
+ * a long time. For example, when waiting for a timeout on
+ * a non-existent connector whilst using bit-banging. As a
+ * result we can end up hogging the machine, so give someone
+ * else the chance to run first. But when we have started a
+ * transfer don't interrupt until finished.
+ */
+ if (need_resched())
+ schedule();
+
+ if (drm_get_edid_block(adapter,
+ block, 0, EDID_LENGTH,
+ EDID_PROBE_RETRIES))
goto out;
if (drm_edid_block_valid(block))
break;
@@ -316,9 +332,9 @@ drm_do_get_edid(struct drm_connector *connector, struct i2c_adapter *adapter)

for (j = 1; j <= block[0x7e]; j++) {
for (i = 0; i < 4; i++) {
- if (drm_do_probe_ddc_edid(adapter,
+ if (drm_get_edid_block(adapter,
block + (valid_extensions + 1) * EDID_LENGTH,
- j, EDID_LENGTH))
+ j, EDID_LENGTH, EDID_READ_RETRIES))
goto out;
if (drm_edid_block_valid(block + (valid_extensions + 1) * EDID_LENGTH)) {
valid_extensions++;
@@ -362,7 +378,7 @@ drm_probe_ddc(struct i2c_adapter *adapter)
{
unsigned char out;

- return (drm_do_probe_ddc_edid(adapter, &out, 0, 1) == 0);
+ return drm_get_edid_block(adapter, &out, 0, 1, EDID_PROBE_RETRIES) == 0;
}

/**
--
1.7.9.1

2012-02-23 20:36:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] drm: Reduce the number of retries whilst reading EDIDs

On Thu, Feb 23, 2012 at 12:15 PM, Linus Torvalds
<[email protected]> wrote:
>
> Sadly, this doesn't seem to make any difference to my case. My xrandr
> stays at 0.555s even with this patch.

Btw, profiling with call chains seems to say that it all comes from
intel_sdvo_get_analog_edid() (about half from intel_sdvo_get_modes()
and half from intel_sdvo_tmds_sink_detect()).

All called through drm_ioctl() -> drm_mode_getconnector() ->
drm_helper_probe_single_connector_modes().

Which I guess isn't anything interesting, but that
intel_sdvo_get_analog_edid() thing seems to be very much a Mac mini
hack. There's a comment about that in the sources too:

/*
* Mac mini hack. On this device, the DVI-I connector shares one DDC
* link between analog and digital outputs. So, if the regular SDVO
* DDC fails, check to see if the analog output is disconnected, in
* which case we'll look there for the digital DDC data.
*/

and maybe that mac mini hack ends up interacting badly with something
else? I'll happily test patches if people have any ideas.

Linus

2012-02-23 21:33:47

by Eugeni Dodonov

[permalink] [raw]
Subject: Re: [PATCH] drm: Reduce the number of retries whilst reading EDIDs

On 02/23/2012 06:15 PM, Linus Torvalds wrote:
> On Thu, Feb 23, 2012 at 11:52 AM, Chris Wilson<[email protected]> wrote:
>>
>> i2c retries if sees an EGAIN, drm_do_probe_ddc_edid retries until it
>> gets a result and *then* drm_do_get_edid retries until it gets a result
>> it is happy with. All in all, that is a lot of processor intensive
>> looping in cases where we do not expect and cannot get valid data - for
>> example on Intel with disconnected hardware we will busy-spin until we
>> hit the i2c timeout. This is then repeated for every connector when
>> querying the current status of outputs.
>
> Sadly, this doesn't seem to make any difference to my case. My xrandr
> stays at 0.555s even with this patch.

Perhaps a stupid question, but does you tree has
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-next&id=9292f37e1f5c79400254dca46f83313488093825
from Dave's drm-next?

If it has, it would be the 1st time that I see xrandr take longer than
.5s with that patch on an Intel GPU. We even added a check for this into
intel-gpu-tools to warn us if any machine takes that long, and none had
hit it so far. So if this is the case here, there is something Mac
Mini-specific indeed to investigate.

-Eugeni

2012-02-23 21:40:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] drm: Reduce the number of retries whilst reading EDIDs

On Thu, Feb 23, 2012 at 1:36 PM, Eugeni Dodonov
<[email protected]> wrote:
>
> Perhaps a stupid question, but does you tree has
> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-next&id=9292f37e1f5c79400254dca46f83313488093825
> from Dave's drm-next?
>
> If it has, it would be the 1st time that I see xrandr take longer than .5s
> with that patch on an Intel GPU. We even added a check for this into
> intel-gpu-tools to warn us if any machine takes that long, and none had hit
> it so far. So if this is the case here, there is something Mac Mini-specific
> indeed to investigate.

Yup, the tree I tested was current -git with the two commits mentioned
in this thread:

drm: Reduce the number of retries whilst reading EDIDs
drm: give up on edid retries when i2c bus is not responding

applied separately on top of it.

Linus

2012-02-24 15:17:46

by Adam Jackson

[permalink] [raw]
Subject: Re: [PATCH] drm: Reduce the number of retries whilst reading EDIDs

On Thu, 2012-02-23 at 12:15 -0800, Linus Torvalds wrote:

> Btw, clearly X does *not* cache the EDID results, at least not for
> this case. So the explicit xrandr example is probably pretty close to
> what wine does. Maybe the proper fix is to just make X.org force
> caching when clients do this (because it's definitely X that does the
> drm_mode_getconnector() thing - xrandr itself spends zero time on
> this, it just does an X request and waits for the result).

RANDR exposes two requests here for a reason. We used to only have
RRGetScreenResources which always pulled data afresh. We added
RRGetScreenResourcesCurrent to get the cached version. If you want
xrandr(1) to use the latter say xrandr --current. Arguably xrandr
should do things the other way around and require you to say --reprobe
or something.

Even then, the kernel should cache if it can. And any competent
hardware has plug interrupts on DP and DVI, so we really should get this
right.

- ajax


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part