2010-08-23 17:01:51

by Jonathan Corbet

Subject: i915: 2.6.36-rc2 hoses my Intel display

So I decided to fire up -rc2 today to see what would happen...the
results are best described by the attached images. Something is
clearly scrambled between my hardware and the i915 driver. Display with X
is hosed, but things go weird before X gets a chance to run (it is worth
noting that the initial output from the kernel is legible).

FWIW, my hardware is:

00:02.1 Display controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)
Subsystem: Dell OptiPlex 755
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Region 0: Memory at fea80000 (32-bit, non-prefetchable) [size=512K]
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

What else can I provide to help track this one down?

Thanks,

jon


Attachments:
(No filename) (1.00 kB)
console.jpg (147.97 kB)
x.jpg (56.12 kB)

2010-08-23 21:17:12

by Jonathan Corbet

Subject: Re: [now bisected] i915: 2.6.36-rc2 hoses my Intel display

On Mon, 23 Aug 2010 11:01:45 -0600
Jonathan Corbet <[email protected]> wrote:

> So I decided to fire up -rc2 today to see what would happen...the
> results are best described by the attached images. Something is
> clearly scrambled between my hardware and the i915 driver. Display with X
> is hosed, but things go weird before X gets a chance to run (it is worth
> noting that the initial output from the kernel is legible).

I went ahead and bisected the problem, which was added between -rc1 and
-rc2. The end result is this:

32aad86fe88e7323d4fc5e9e423abcee0d55a03d is the first bad commit
commit 32aad86fe88e7323d4fc5e9e423abcee0d55a03d
Author: Chris Wilson <[email protected]>
Date: Wed Aug 4 13:50:25 2010 +0100

drm/i915/sdvo: Propagate errors from reading/writing control bus.

Signed-off-by: Chris Wilson <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>

I don't know the driver or the hardware and can't begin to guess what
went wrong in that patch, but, hopefully, the information is useful to
somebody. Please let me know if there's anything else I can do to help
track this down.

Thanks,

jon

2010-08-23 22:37:05

by Chris Wilson

Subject: Re: [now bisected] i915: 2.6.36-rc2 hoses my Intel display

On Mon, 23 Aug 2010 15:17:08 -0600, Jonathan Corbet <[email protected]> wrote:
> I went ahead and bisected the problem, which was added between -rc1 and
> -rc2. The end result is this:

Taking the patch at face value, the cause should be a mistake in error
handling. So the first step would be to identify which i2c_transfer()
failed.

diff --git a/drivers/gpu/drm/i915/intel_sdvo.c b/drivers/gpu/drm/i915/intel_sdvo.c
index 093e914..6afc7cf 100644
--- a/drivers/gpu/drm/i915/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/intel_sdvo.c
@@ -269,7 +269,7 @@ static bool intel_sdvo_read_byte(struct intel_sdvo *intel_sdvo, u8 addr, u8 *ch)
 		return true;
 	}
 
-	DRM_DEBUG_KMS("i2c transfer returned %d\n", ret);
+	WARN(1, "i2c transfer failed, ret=%d\n", ret);
 	return false;
 }
 
@@ -284,8 +284,13 @@ static bool intel_sdvo_write_byte(struct intel_sdvo *intel_sdvo, int addr, u8 ch
 			.buf = out_buf,
 		}
 	};
+	int ret;
+
+	if ((ret = i2c_transfer(intel_sdvo->base.i2c_bus, msgs, 1)) == 1)
+		return true;
 
-	return i2c_transfer(intel_sdvo->base.i2c_bus, msgs, 1) == 1;
+	WARN(1, "i2c transfer failed, ret=%d\n", ret);
+	return false;
 }
 
 #define SDVO_CMD_NAME_ENTRY(cmd) {cmd, #cmd}

--
Chris Wilson, Intel Open Source Technology Centre
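
The debug patch above turns each failed i2c_transfer() into a WARN. In the kernel, i2c_transfer() returns a negative errno on failure and otherwise the number of messages it executed, so a transfer only counts as successful when the return value equals the number of messages queued. The following is a minimal userspace sketch of that check, not kernel code: `checked_xfer()` and the `fake_*` transfer functions are hypothetical stand-ins, and reporting a short transfer as -EIO is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for i2c_transfer(): returns the number of
 * messages executed, or a negative errno on failure. */
typedef int (*xfer_fn)(int nmsgs);

static int fake_ok(int nmsgs)    { return nmsgs; }                 /* all sent */
static int fake_short(int nmsgs) { return nmsgs - 1; }             /* one lost */
static int fake_nak(int nmsgs)   { (void)nmsgs; return -ENXIO; }   /* no ACK */

/*
 * Returns true only if every message was executed. On failure, *err
 * receives a negative errno (a short transfer is reported as -EIO, an
 * assumption of this sketch) and the raw return value is logged, much
 * as the WARN() in the debug patch does.
 */
static bool checked_xfer(xfer_fn xfer, int nmsgs, int *err)
{
	int ret;

	assert(nmsgs > 0);
	ret = xfer(nmsgs);
	if (ret == nmsgs) {
		*err = 0;
		return true;
	}

	*err = ret < 0 ? ret : -EIO;
	fprintf(stderr, "i2c transfer failed, ret=%d\n", ret);
	return false;
}
```

A caller can hand `err` further up instead of collapsing it into a bool; that is the general idea named in the bisected commit's subject line, though this sketch makes no claim about how that commit implemented it.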

2010-08-23 23:32:28

by Jonathan Corbet

Subject: Re: [now bisected] i915: 2.6.36-rc2 hoses my Intel display

On Mon, 23 Aug 2010 23:36:55 +0100
Chris Wilson <[email protected]> wrote:

> Taking the patch at face value, the cause should be a mistake in error
> handling. So the first step would be to identify which i2c_transfer()
> failed.

OK, I tried it, but neither warning triggers.

Don't know if it helps or not, but I tried booting with
drm.debug=0x05. The result was truly vast amounts of stuff like this:

Aug 23 17:20:59 bike kernel: m:drm_ioctl], pid=2032, cmd=0x6458
nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x645
m:drm_ioctl], pid=2032, cmd=0x6458 m:drm_ioctl], pid=2032, cmd=0x6458,
m:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458
nm:drm_ioctl], pid=2032, cmd=0x6458 m:drm_ioctl], pid=2032, cmd=0x6458
nm:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458,
nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458
nm:drm_ioctl], pid=2032, cmd=0x6458 m:drm_ioctl], pid=2032, cmd=0x6458
nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x6458
m:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458,
nm:drm_ioctl], pid=2032, cmd=0x6458 m:drm_ioctl], pid=2032, cmd=0x6458
nm:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458,
nm:drm_ioctl], pid=2032, cmd=0x6458, m:drm_ioctl], pid=2032, cmd=0x6458,
nm:drm_ioctl], pid=2032, cmd=0x6458 nm:drm_ioctl], pid=2032, cmd=0x6458
nm:drm_ioctl], pid=2032, c

The above is one line from the system log; I took the liberty of wrapping
it for readability.

jon

2010-08-23 23:38:19

by Chris Wilson

Subject: Re: [now bisected] i915: 2.6.36-rc2 hoses my Intel display

On Mon, 23 Aug 2010 17:32:25 -0600, Jonathan Corbet <[email protected]> wrote:
> On Mon, 23 Aug 2010 23:36:55 +0100
> Chris Wilson <[email protected]> wrote:
>
> > Taking the patch at face value, the cause should be a mistake in error
> > handling. So the first step would be to identify which i2c_transfer()
> > failed.
>
> OK, I tried it, but neither warning triggers.

Sigh, that sounds like I screwed the patch up instead.
Thanks.

> Don't know if it helps or not, but I tried booting with
> drm.debug=0x05. The result was truly vast amounts of stuff like this:
>
> Aug 23 17:20:59 bike kernel: m:drm_ioctl], pid=2032, cmd=0x6458
> nm:drm_ioctl], pid=2032, cmd=0x6458, nm:drm_ioctl], pid=2032, cmd=0x645
[snip]
>
> The above is one line from the system log; I took the liberty of wrapping
> it for readability.

Hmm, probably bailing out of the ioctl before hitting the newline.
drm.debug=0x4 should print the right information for this bug.

--
Chris Wilson, Intel Open Source Technology Centre
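
The drm.debug parameter is a bitmask of debug categories. The sketch below decodes a mask using three category bits assumed here to match include/drm/drmP.h in kernels of this era (DRM_UT_CORE = 0x01, DRM_UT_DRIVER = 0x02, DRM_UT_KMS = 0x04); check those values against the tree you are running. Under that assumption, 0x05 enables core messages, which would presumably account for the flood of drm_ioctl trace lines, while 0x4 keeps only the modesetting category. The `describe_drm_debug()` helper is hypothetical, and any bit outside the three is simply reported as OTHER.

```c
#include <assert.h>
#include <string.h>

/* Category bits, assumed to match drmP.h of the 2.6.36 era. */
#define DRM_UT_CORE   0x01  /* core DRM, e.g. the drm_ioctl trace */
#define DRM_UT_DRIVER 0x02  /* driver-specific messages */
#define DRM_UT_KMS    0x04  /* modesetting (watermarks, SDVO, ...) */

/* Appends name to buf, separated by '|', never overrunning len bytes. */
static void append_name(char *buf, size_t len, const char *name)
{
	if (buf[0] != '\0')
		strncat(buf, "|", len - strlen(buf) - 1);
	strncat(buf, name, len - strlen(buf) - 1);
}

/* Writes the categories enabled by mask into buf, e.g. "CORE|KMS". */
static void describe_drm_debug(unsigned int mask, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} cats[] = {
		{ DRM_UT_CORE, "CORE" },
		{ DRM_UT_DRIVER, "DRIVER" },
		{ DRM_UT_KMS, "KMS" },
	};
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(cats) / sizeof(cats[0]); i++)
		if (mask & cats[i].bit)
			append_name(buf, len, cats[i].name);

	/* Anything not covered by the three known bits. */
	if (mask & ~(unsigned int)(DRM_UT_CORE | DRM_UT_DRIVER | DRM_UT_KMS))
		append_name(buf, len, "OTHER");
}
```

For the masks used in this thread, 0x05 decodes to CORE|KMS, 0x4 to KMS, and 0xe to DRIVER|KMS plus one bit outside the three assumed categories.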

2010-08-23 23:46:45

by Jonathan Corbet

Subject: Re: [now bisected] i915: 2.6.36-rc2 hoses my Intel display

On Tue, 24 Aug 2010 00:37:52 +0100
Chris Wilson <[email protected]> wrote:

> drm.debug=0x4 should print the right information for this bug.

That doesn't seem to give me any output at all.

One thing I noticed, though, is that I occasionally get something like:

Aug 23 17:43:14 bike kernel: [ 142.920185] [drm:intel_calculate_wm] *ERROR* Insufficient FIFO for plane, expect flickering: entries required = 51, available = 28.

They seem to come in threes, for whatever that's worth.

Thanks,

jon

2010-08-23 23:56:02

by Chris Wilson

Subject: Re: [now bisected] i915: 2.6.36-rc2 hoses my Intel display

On Mon, 23 Aug 2010 17:46:41 -0600, Jonathan Corbet <[email protected]> wrote:
> On Tue, 24 Aug 2010 00:37:52 +0100
> Chris Wilson <[email protected]> wrote:
>
> > drm.debug=0x4 should print the right information for this bug.
>
> That doesn't seem to give me any output at all.
>
> One thing I noticed, though, is that I occasionally get something like:
>
> Aug 23 17:43:14 bike kernel: [ 142.920185] [drm:intel_calculate_wm] *ERROR* Insufficient FIFO for plane, expect flickering: entries required = 51, available = 28.
>
> They seem to come in threes, for whatever that's worth.

In threes. Hmm, one for primary, cursor and self-refresh. drm.debug=0xe
would be interesting to see what the pixel clock is.

Can you grab one before the bad commit and one after? If there is a change
that may help pin-point the mistake. Or indicate further problems...

--
Chris Wilson, Intel Open Source Technology Centre

2010-08-24 13:16:31

by Jonathan Corbet

Subject: Re: [now bisected] i915: 2.6.36-rc2 hoses my Intel display

On Tue, 24 Aug 2010 00:55:54 +0100
Chris Wilson <[email protected]> wrote:

> In threes. Hmm, one for primary, cursor and self-refresh. drm.debug=0xe
> would be interesting to see what the pixel clock is.
>
> Can you grab one before the bad commit and one after? If there is a change
> that may help pin-point the mistake. Or indicate further problems...

OK, three files attached; drm.good is from 2.6.35, drm.bad is from
2.6.36-rc2. I also stripped the times and did a diff, in case that's
useful.

If you'd like output from right around the bad commit, say the word;
that will take a bit of building time (I didn't keep all those bisect
kernels around) but I can do it.

Thanks,

jon


Attachments:
(No filename) (695.00 B)
drm.good (32.13 kB)
drm.bad (33.89 kB)
drm.diff (20.48 kB)

2010-08-24 13:38:04

by Chris Wilson

Subject: Re: [now bisected] i915: 2.6.36-rc2 hoses my Intel display

On Tue, 24 Aug 2010 07:16:26 -0600, Jonathan Corbet <[email protected]> wrote:
> On Tue, 24 Aug 2010 00:55:54 +0100
> Chris Wilson <[email protected]> wrote:
>
> > In threes. Hmm, one for primary, cursor and self-refresh. drm.debug=0xe
> > would be interesting to see what the pixel clock is.
> >
> > Can you grab one before the bad commit and one after? If there is a change
> > that may help pin-point the mistake. Or indicate further problems...
>
> OK, three files attached; drm.good is from 2.6.35, drm.bad is from
> 2.6.36-rc2. I also stripped the times and did a diff, in case that's
> useful.

[snip]

> -[drm:intel_calculate_wm], FIFO entries required for mode: 48
> -[drm:intel_calculate_wm], FIFO watermark level: -22
> +[drm:intel_calculate_wm], FIFO entries required for mode: 49
> +[drm:intel_calculate_wm], FIFO watermark level: -23
> +*ERROR* Insufficient FIFO for plane, expect flickering: entries required = 51, available = 28.
> [drm:intel_calculate_wm], FIFO entries required for mode: 0
> [drm:intel_calculate_wm], FIFO watermark level: 29
> [drm:i9xx_update_wm], FIFO watermarks - A: 1, B: 29
> -[drm:i9xx_update_wm], self-refresh entries: 60
> -[drm:i9xx_update_wm], Setting FIFO watermarks - A: 1, B: 29, C: 2, SR 35
> -[drm:i915_get_vblank_counter], trying to get vblank count for disabled pipe 1
> +[drm:i9xx_update_wm], self-refresh entries: 120
> +[drm:i9xx_update_wm], Setting FIFO watermarks - A: 1, B: 29, C: 2, SR 1

I'm going to focus on this, since this could account for the on-screen
corruption. Here we suddenly double the computed minimal FIFO size for
self-refresh and, due to a separate bug, program a minimal low watermark.

That should be addressed with
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=drm-testing&id=30c127264ef9729bcef1d9901718f9a8a47be6a4
However, that patch isn't quite ready yet, since Jesse pointed out that
some chipsets do indeed want a high watermark instead of the low watermark
used, at least for gen3+.

The question, though, is why that bad commit would cause a doubling of the
self-refresh entries. Thanks for the diff; I now know that I need to look
more closely at the mode fixup for SDVO.

--
Chris Wilson, Intel Open Source Technology Centre

2010-08-26 19:23:47

by Maciej Rutecki

Subject: Re: i915: 2.6.36-rc2 hoses my Intel display

On Monday, 23 August 2010 at 19:01:45 Jonathan Corbet wrote:
> So I decided to fire up -rc2 today to see what would happen...the
> results are best described by the attached images. Something is
> clearly scrambled between my hardware and the i915 driver. Display with X
> is hosed, but things go weird before X gets a chance to run (it is worth
> noting that the initial output from the kernel is legible).
>
> FWIW, my hardware is:
>
> 00:02.1 Display controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)
> Subsystem: Dell OptiPlex 755
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Region 0: Memory at fea80000 (32-bit, non-prefetchable) [size=512K]
> Capabilities: [d0] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>
> What else can I provide to help track this one down?

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=17151
for your bug report. Please add your address to the CC list there, thanks!

--
Maciej Rutecki
http://www.maciek.unixy.pl