2013-08-21 10:35:12

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Wed, Aug 21, 2013 at 11:21 AM, Stephen Rothwell <[email protected]> wrote:
> Hi all,
>
> There will be no linux-next trees on Aug 23 or 26.
>
> Changes since 20130820:
>
> New tree: aio-direct
>
> Removed tree: xilinx (at maintainer's request)
>
> The xfs tree still had its build failure for which I reverted a commit.
>
> The trivial tree gained conflicts against the crypto, net-next and
> wireless trees.
>
> The aio tree gained conflicts against the aio-direct tree.
>
> The akpm-current tree gained conflicts against the modules and aio-direct
> trees.
>
> ----------------------------------------------------------------------------
>

Hi,

I still see this issue with next-20130821 and with "Linux v3.11-rc6 plus
drm-intel-nightly on top".
Are there any new developments on this?
Any patches?

Currently, I have two workarounds:

[1] Revert this commit:

commit 5456fe3882812aba251886e36fe55bfefb8e8829
"drm/i915: Allocate LLC ringbuffers from stolen"

[2] Stop and start lightdm session manually

...switch to a VT and do a stop/start lightdm.
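
A minimal sketch of the two workarounds above, assuming a git checkout of the affected kernel tree and an Ubuntu system of that era where lightdm is controlled through `service`. The function names and the `DRY_RUN` switch are hypothetical helpers for illustration; only `git revert` and `service lightdm stop/start` are the actual commands.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Workaround [1]: revert the suspect commit (run inside the kernel git tree,
# then rebuild and boot the resulting kernel).
revert_llc_ringbuffer_commit() {
    run git revert --no-edit 5456fe3882812aba251886e36fe55bfefb8e8829
}

# Workaround [2]: restart the display manager from a text VT (needs root).
restart_lightdm() {
    run service lightdm stop
    run service lightdm start
}
```

Setting `DRY_RUN=1` before calling either function only prints the commands, which is a safe way to review them first.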

Any help appreciated.

BTW, I switched my Linux/X graphics stack to Mesa v9.2-rc1 and
Intel-DDX v2.21.14-55-ged40a7c (LibDRM is still v2.4.46).

Thanks.

Regards,
- Sedat -

P.S.: The suspend/resume problem that I saw in parallel on earlier
-next releases is gone.


2013-08-21 13:44:34

by Daniel Vetter

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Wed, Aug 21, 2013 at 12:35:08PM +0200, Sedat Dilek wrote:
> On Wed, Aug 21, 2013 at 11:21 AM, Stephen Rothwell <[email protected]> wrote:
> > Hi all,
> >
> > There will be no linux-next trees on Aug 23 or 26.
> >
> > Changes since 20130820:
> >
> > New tree: aio-direct
> >
> > Removed tree: xilinx (at maintainer's request)
> >
> > The xfs tree still had its build failure for which I reverted a commit.
> >
> > The trivial tree gained conflicts against the crypto, net-next and
> > wireless trees.
> >
> > The aio tree gained conflicts against the aio-direct tree.
> >
> > The akpm-current tree gained conflicts against the modules and aio-direct
> > trees.
> >
> > ----------------------------------------------------------------------------
> >
>
> Hi,
>
> I still have this issue with next-20130821 and "Linux v3.11-rc6 plus
> drm-intel-nightly on top"
> Any new development on this?
> Patches?

Tbh I'm at a loss as to what we could try above and beyond what Chris has
already tried out.

> Currently, I have two workarounds:
>
> [1] Revert this commit:
>
> commit 5456fe3882812aba251886e36fe55bfefb8e8829
> "drm/i915: Allocate LLC ringbuffers from stolen"

Since there is a rather decent chance that the next testing cycle, which
I'll do this Friday, will be the last chunk of features for 3.12, I'll
probably drop the above patch from my queue and we can try again in 3.13.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-08-21 18:11:31

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Wed, Aug 21, 2013 at 3:44 PM, Daniel Vetter <[email protected]> wrote:
> On Wed, Aug 21, 2013 at 12:35:08PM +0200, Sedat Dilek wrote:
>> On Wed, Aug 21, 2013 at 11:21 AM, Stephen Rothwell <[email protected]> wrote:
>> > Hi all,
>> >
>> > There will be no linux-next trees on Aug 23 or 26.
>> >
>> > Changes since 20130820:
>> >
>> > New tree: aio-direct
>> >
>> > Removed tree: xilinx (at maintainer's request)
>> >
>> > The xfs tree still had its build failure for which I reverted a commit.
>> >
>> > The trivial tree gained conflicts against the crypto, net-next and
>> > wireless trees.
>> >
>> > The aio tree gained conflicts against the aio-direct tree.
>> >
>> > The akpm-current tree gained conflicts against the modules and aio-direct
>> > trees.
>> >
>> > ----------------------------------------------------------------------------
>> >
>>
>> Hi,
>>
>> I still have this issue with next-20130821 and "Linux v3.11-rc6 plus
>> drm-intel-nightly on top"
>> Any new development on this?
>> Patches?
>
> Tbh I'm at a loss what we could try above&beyond what Chris has already
> tried out.
>
>> Currently, I have two workarounds:
>>
>> [1] Revert this commit:
>>
>> commit 5456fe3882812aba251886e36fe55bfefb8e8829
>> "drm/i915: Allocate LLC ringbuffers from stolen"
>
> Since with a rather decent chance the next testing cycle I'll do this
> friday will be the last chunk of features for 3.12 I'll probably drop the
> above patch from my queue and we can try again in 3.13.
>

Inspired by [1] I have switched from UXA to SNA...
...and applied "[PATCH] drm/i915: Cleaning up the relocate entry
function" on top of next-20130821...
...and can NOT see the screen corruptions anymore.

Can you explain that?

- Sedat -

[1] http://cynic.cc/blog//posts/sna_acceleration_vs_uxa/
[2] http://lists.freedesktop.org/archives/intel-gfx/2013-August/032182.html
[3] https://patchwork.kernel.org/patch/2847846/


Attachments:
xorg.conf (122.00 B)
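
The attached xorg.conf is not reproduced in the archive. As a minimal sketch of what such a file typically contains when selecting the acceleration method for the intel DDX, assuming the usual `Option "AccelMethod"` setting; the function name and the `Identifier` string are hypothetical:

```shell
# Print a minimal xorg.conf fragment that selects the given acceleration
# method ("sna" or "uxa") for the intel driver. Defaults to "sna".
# The Identifier string is arbitrary.
print_intel_xorg_conf() {
    method="${1:-sna}"
    cat <<EOF
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "AccelMethod" "$method"
EndSection
EOF
}
```

For example, `print_intel_xorg_conf sna | sudo tee /etc/X11/xorg.conf` followed by a restart of the X server would switch to SNA; passing `uxa` switches back.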

2013-08-21 21:20:05

by Daniel Vetter

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Wed, Aug 21, 2013 at 08:11:27PM +0200, Sedat Dilek wrote:
> On Wed, Aug 21, 2013 at 3:44 PM, Daniel Vetter <[email protected]> wrote:
> > On Wed, Aug 21, 2013 at 12:35:08PM +0200, Sedat Dilek wrote:
> >> On Wed, Aug 21, 2013 at 11:21 AM, Stephen Rothwell <[email protected]> wrote:
> >> > Hi all,
> >> >
> >> > There will be no linux-next trees on Aug 23 or 26.
> >> >
> >> > Changes since 20130820:
> >> >
> >> > New tree: aio-direct
> >> >
> >> > Removed tree: xilinx (at maintainer's request)
> >> >
> >> > The xfs tree still had its build failure for which I reverted a commit.
> >> >
> >> > The trivial tree gained conflicts against the crypto, net-next and
> >> > wireless trees.
> >> >
> >> > The aio tree gained conflicts against the aio-direct tree.
> >> >
> >> > The akpm-current tree gained conflicts against the modules and aio-direct
> >> > trees.
> >> >
> >> > ----------------------------------------------------------------------------
> >> >
> >>
> >> Hi,
> >>
> >> I still have this issue with next-20130821 and "Linux v3.11-rc6 plus
> >> drm-intel-nightly on top"
> >> Any new development on this?
> >> Patches?
> >
> > Tbh I'm at a loss what we could try above&beyond what Chris has already
> > tried out.
> >
> >> Currently, I have two workarounds:
> >>
> >> [1] Revert this commit:
> >>
> >> commit 5456fe3882812aba251886e36fe55bfefb8e8829
> >> "drm/i915: Allocate LLC ringbuffers from stolen"
> >
> > Since with a rather decent chance the next testing cycle I'll do this
> > friday will be the last chunk of features for 3.12 I'll probably drop the
> > above patch from my queue and we can try again in 3.13.
> >
>
> Inspired by [1] I have switched from UXA to SNA...
> ...and applied "[PATCH] drm/i915: Cleaning up the relocate entry
> function" on top of next-20130821...
> ...and can NOT see the screen corruptions anymore.
>
> Can you explain that?

If the relocate cleanup patch [1] is indeed required, then I can't explain
this at all. Can you please double-check that this is really it, and that
it's not the uxa->sna switch?

Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-08-22 06:32:51

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Wed, Aug 21, 2013 at 11:20 PM, Daniel Vetter <[email protected]> wrote:
> On Wed, Aug 21, 2013 at 08:11:27PM +0200, Sedat Dilek wrote:
>> On Wed, Aug 21, 2013 at 3:44 PM, Daniel Vetter <[email protected]> wrote:
>> > On Wed, Aug 21, 2013 at 12:35:08PM +0200, Sedat Dilek wrote:
>> >> On Wed, Aug 21, 2013 at 11:21 AM, Stephen Rothwell <[email protected]> wrote:
>> >> > Hi all,
>> >> >
>> >> > There will be no linux-next trees on Aug 23 or 26.
>> >> >
>> >> > Changes since 20130820:
>> >> >
>> >> > New tree: aio-direct
>> >> >
>> >> > Removed tree: xilinx (at maintainer's request)
>> >> >
>> >> > The xfs tree still had its build failure for which I reverted a commit.
>> >> >
>> >> > The trivial tree gained conflicts against the crypto, net-next and
>> >> > wireless trees.
>> >> >
>> >> > The aio tree gained conflicts against the aio-direct tree.
>> >> >
>> >> > The akpm-current tree gained conflicts against the modules and aio-direct
>> >> > trees.
>> >> >
>> >> > ----------------------------------------------------------------------------
>> >> >
>> >>
>> >> Hi,
>> >>
>> >> I still have this issue with next-20130821 and "Linux v3.11-rc6 plus
>> >> drm-intel-nightly on top"
>> >> Any new development on this?
>> >> Patches?
>> >
>> > Tbh I'm at a loss what we could try above&beyond what Chris has already
>> > tried out.
>> >
>> >> Currently, I have two workarounds:
>> >>
>> >> [1] Revert this commit:
>> >>
>> >> commit 5456fe3882812aba251886e36fe55bfefb8e8829
>> >> "drm/i915: Allocate LLC ringbuffers from stolen"
>> >
>> > Since with a rather decent chance the next testing cycle I'll do this
>> > friday will be the last chunk of features for 3.12 I'll probably drop the
>> > above patch from my queue and we can try again in 3.13.
>> >
>>
>> Inspired by [1] I have switched from UXA to SNA...
>> ...and applied "[PATCH] drm/i915: Cleaning up the relocate entry
>> function" on top of next-20130821...
>> ...and can NOT see the screen corruptions anymore.
>>
>> Can you explain that?
>
> If the relocate cleanup patch [1] is indeed required, then I can't explain
> this at all. Can you please double-check that this is really it, and that
> it's not the uxa->sna switch?
>

It's independent of the applied patch.
My problem goes away with SNA but still exists with UXA.

As said in my previous analysis... after switching back to Ubuntu's
graphics stack the symptoms did not show up either.

It's interesting to see that it is a problem of the intel-ddx.

Anyway, SNA seems to be approx. 20% faster here in gtkperf-0.40, so I
will use it.

I am open and willing to test any patches you provide.
Please, let me know.

Thanks.

- Sedat -

2013-08-22 07:24:12

by Daniel Vetter

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Thu, Aug 22, 2013 at 08:32:47AM +0200, Sedat Dilek wrote:
> It's independent of the applied patch.
> My problem goes away with SNA but still exists with UXA.
>
> As said in my previous analysis... switching back to Ubuntu's graphics
> did not show the symptoms, too.
>
> It's interesting to see, it is a problem of the intel-ddx.

Nope, it's just that uxa and sna have completely different buffer object
usage patterns. Not the first time only one of them hits an issue ...

> Anyway, SNA seems here to be approx. 20% faster in gtkperf-0.40, so I
> will use it.
>
> I am open and willing to test any patches you will provide.
> Please, let me know.

Found a new bugger, please test the below patch.

Thanks, Daniel
---
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ef92c69..e0bff07 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2616,6 +2616,9 @@ int i915_vma_unbind(struct i915_vma *vma)
 	drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
 	int ret;

+	/* For now we only ever use 1 vma per object */
+	WARN_ON(!list_is_singular(&obj->vma_list));
+
 	if (list_empty(&vma->vma_link))
 		return 0;

@@ -2661,7 +2664,9 @@ int i915_vma_unbind(struct i915_vma *vma)
 	drm_mm_remove_node(&vma->node);

 destroy:
-	i915_gem_vma_destroy(vma);
+	/* Keep the vma as a placeholder in the execbuffer reservation lists */
+	if (!list_empty(&vma->exec_list))
+		i915_gem_vma_destroy(vma);

 	/* Since the unbound list is global, only move to that list if
 	 * no more VMAs exist.
@@ -4171,10 +4176,6 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
 	WARN_ON(vma->node.allocated);
 	list_del(&vma->vma_link);

-	/* Keep the vma as a placeholder in the execbuffer reservation lists */
-	if (!list_empty(&vma->exec_list))
-		return;
-
 	kfree(vma);
 }

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-08-22 11:13:36

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Thu, Aug 22, 2013 at 9:24 AM, Daniel Vetter <[email protected]> wrote:
> On Thu, Aug 22, 2013 at 08:32:47AM +0200, Sedat Dilek wrote:
>> It's independent of the applied patch.
>> My problem goes away with SNA but still exists with UXA.
>>
>> As said in my previous analysis... switching back to Ubuntu's graphics
>> did not show the symptoms, too.
>>
>> It's interesting to see, it is a problem of the intel-ddx.
>
> Nope, it's just that uxa and sna have completely different buffer object
> usage patterns. Not the first time only one of them hits an issue ...
>
>> Anyway, SNA seems here to be approx. 20% faster in gtkperf-0.40, so I
>> will use it.
>>
>> I am open and willing to test any patches you will provide.
>> Please, let me know.
>
> Found a new bugger, please test the below patch.
>
> Thanks, Daniel
> ---
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ef92c69..e0bff07 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2616,6 +2616,9 @@ int i915_vma_unbind(struct i915_vma *vma)
> drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
> int ret;
>
> + /* For now we only ever use 1 vma per object */
> + WARN_ON(!list_is_singular(&obj->vma_list));
> +
> if (list_empty(&vma->vma_link))
> return 0;
>
> @@ -2661,7 +2664,9 @@ int i915_vma_unbind(struct i915_vma *vma)
> drm_mm_remove_node(&vma->node);
>
> destroy:
> - i915_gem_vma_destroy(vma);
> + /* Keep the vma as a placeholder in the execbuffer reservation lists */
> + if (!list_empty(&vma->exec_list))
> + i915_gem_vma_destroy(vma);
>
> /* Since the unbound list is global, only move to that list if
> * no more VMAs exist.
> @@ -4171,10 +4176,6 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
> WARN_ON(vma->node.allocated);
> list_del(&vma->vma_link);
>
> - /* Keep the vma as a placeholder in the execbuffer reservation lists */
> - if (!list_empty(&vma->exec_list))
> - return;
> -
> kfree(vma);
> }
>

dmesg (a lot of traces) and kernel-config attached.

UXA still causes screen corruption.

$ egrep -i 'uxa|sna|accelmethod' /var/log/Xorg.0.log
[ 118.951] (**) intel(0): Option "AccelMethod" "uxa"
[ 118.960] (II) UXA(0): Driver registered support for the following
operations:
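
The same check can be scripted. A minimal sketch, assuming the log line format shown above; the function name `active_accel_method` is hypothetical, and the log path defaults to `/var/log/Xorg.0.log`:

```shell
# Print the AccelMethod value that was explicitly configured for the intel
# driver according to an Xorg log, e.g. "uxa" or "sna".
# Prints nothing if no such Option line is present in the log.
active_accel_method() {
    log="${1:-/var/log/Xorg.0.log}"
    sed -n 's/.*Option "AccelMethod" "\([^"]*\)".*/\1/p' "$log" | tail -n 1
}
```

Note that this only sees an explicitly set option; if the driver falls back to its built-in default, the log has no such line and the function prints nothing.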

- Sedat -

Is "drm/i915: More vma fixups around unbind/destroy" nearly the same fix?

[1] https://patchwork.kernel.org/patch/2848146/


Attachments:
dmesg_3.11.0-rc6-next20130821-1-iniza-small_with-danvet-patch.txt (195.90 kB)
config-3.11.0-rc6-next20130821-1-iniza-small (113.02 kB)

2013-08-22 11:22:44

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Thu, Aug 22, 2013 at 1:13 PM, Sedat Dilek <[email protected]> wrote:
> On Thu, Aug 22, 2013 at 9:24 AM, Daniel Vetter <[email protected]> wrote:
>> On Thu, Aug 22, 2013 at 08:32:47AM +0200, Sedat Dilek wrote:
>>> It's independent of the applied patch.
>>> My problem goes away with SNA but still exists with UXA.
>>>
>>> As said in my previous analysis... switching back to Ubuntu's graphics
>>> did not show the symptoms, too.
>>>
>>> It's interesting to see, it is a problem of the intel-ddx.
>>
>> Nope, it's just that uxa and sna have completely different buffer object
>> usage patterns. Not the first time only one of them hits an issue ...
>>
>>> Anyway, SNA seems here to be approx. 20% faster in gtkperf-0.40, so I
>>> will use it.
>>>
>>> I am open and willing to test any patches you will provide.
>>> Please, let me know.
>>
>> Found a new bugger, please test the below patch.
>>
>> Thanks, Daniel
>> ---
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index ef92c69..e0bff07 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2616,6 +2616,9 @@ int i915_vma_unbind(struct i915_vma *vma)
>> drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
>> int ret;
>>
>> + /* For now we only ever use 1 vma per object */
>> + WARN_ON(!list_is_singular(&obj->vma_list));
>> +
>> if (list_empty(&vma->vma_link))
>> return 0;
>>
>> @@ -2661,7 +2664,9 @@ int i915_vma_unbind(struct i915_vma *vma)
>> drm_mm_remove_node(&vma->node);
>>
>> destroy:
>> - i915_gem_vma_destroy(vma);
>> + /* Keep the vma as a placeholder in the execbuffer reservation lists */
>> + if (!list_empty(&vma->exec_list))
>> + i915_gem_vma_destroy(vma);
>>
>> /* Since the unbound list is global, only move to that list if
>> * no more VMAs exist.
>> @@ -4171,10 +4176,6 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
>> WARN_ON(vma->node.allocated);
>> list_del(&vma->vma_link);
>>
>> - /* Keep the vma as a placeholder in the execbuffer reservation lists */
>> - if (!list_empty(&vma->exec_list))
>> - return;
>> -
>> kfree(vma);
>> }
>>
>
> dmesg (a lot of traces) and kernel-config attached.
>
> UXA causes still screen corruption.
>
> $ egrep -i 'uxa|sna|accelmethod' /var/log/Xorg.0.log
> [ 118.951] (**) intel(0): Option "AccelMethod" "uxa"
> [ 118.960] (II) UXA(0): Driver registered support for the following
> operations:
>
> - Sedat -
>
> Is "drm/i915: More vma fixups around unbind/destroy" the nearly same fix?
>
> [1] https://patchwork.kernel.org/patch/2848146/

With the above [1] I see no traces.

- Sedat -


Attachments:
dmesg_3.11.0-rc6-next20130821-1-iniza-small_uxa_with-danvet-patch-2.txt (172.51 kB)

2013-08-22 11:30:52

by Daniel Vetter

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Thu, Aug 22, 2013 at 1:13 PM, Sedat Dilek <[email protected]> wrote:
> dmesg (a lot of traces) and kernel-config attached.
>
> UXA causes still screen corruption.

Hm, was only a slim chance that this patch would fix anything - I
think you'd always see an oops when you'd hit this bug instead of just
a bit of corruption.

> $ egrep -i 'uxa|sna|accelmethod' /var/log/Xorg.0.log
> [ 118.951] (**) intel(0): Option "AccelMethod" "uxa"
> [ 118.960] (II) UXA(0): Driver registered support for the following
> operations:
>
> - Sedat -
>
> Is "drm/i915: More vma fixups around unbind/destroy" the nearly same fix?

Yeah, that version should get rid of the WARN noise in dmesg.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-08-22 11:32:18

by Daniel Vetter

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Thu, Aug 22, 2013 at 1:30 PM, Daniel Vetter <[email protected]> wrote:
> On Thu, Aug 22, 2013 at 1:13 PM, Sedat Dilek <[email protected]> wrote:
>> dmesg (a lot of traces) and kernel-config attached.
>>
>> UXA causes still screen corruption.
>
> Hm, was only a slim chance that this patch would fix anything - I
> think you'd always see an oops when you'd hit this bug instead of just
> a bit of corruption.

Ok, I think it's time to throw in the towel a bit. I've dropped


commit d46f1c3f1372e3a72fab97c60480aa4a1084387f
Author: Chris Wilson <[email protected]>
Date: Thu Aug 8 14:41:06 2013 +0100

drm/i915: Allow the GPU to cache stolen memory

from my queue. I guess we can retry for 3.13 again.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-08-23 07:55:19

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Thu, Aug 22, 2013 at 1:32 PM, Daniel Vetter <[email protected]> wrote:
> On Thu, Aug 22, 2013 at 1:30 PM, Daniel Vetter <[email protected]> wrote:
>> On Thu, Aug 22, 2013 at 1:13 PM, Sedat Dilek <[email protected]> wrote:
>>> dmesg (a lot of traces) and kernel-config attached.
>>>
>>> UXA causes still screen corruption.
>>
>> Hm, was only a slim chance that this patch would fix anything - I
>> think you'd always see an oops when you'd hit this bug instead of just
>> a bit of corruption.
>
> Ok, I think it's time to throw in the towel a bit. I've dropped
>
>
> commit d46f1c3f1372e3a72fab97c60480aa4a1084387f
> Author: Chris Wilson <[email protected]>
> Date: Thu Aug 8 14:41:06 2013 +0100
>
> drm/i915: Allow the GPU to cache stolen memory
>
> from my queue. I guess we can retry for 3.13 again.

I am really sorry to cause someone's work to be delayed.
I would have liked to see this fixed (and I have spent some time on it).

Which patches exactly did you drop?

- Sedat -

2013-08-23 08:04:48

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Fri, Aug 23, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
> On Thu, Aug 22, 2013 at 1:32 PM, Daniel Vetter <[email protected]> wrote:
>> On Thu, Aug 22, 2013 at 1:30 PM, Daniel Vetter <[email protected]> wrote:
>>> On Thu, Aug 22, 2013 at 1:13 PM, Sedat Dilek <[email protected]> wrote:
>>>> dmesg (a lot of traces) and kernel-config attached.
>>>>
>>>> UXA causes still screen corruption.
>>>
>>> Hm, was only a slim chance that this patch would fix anything - I
>>> think you'd always see an oops when you'd hit this bug instead of just
>>> a bit of corruption.
>>
>> Ok, I think it's time to throw in the towel a bit. I've dropped
>>
>>
>> commit d46f1c3f1372e3a72fab97c60480aa4a1084387f
>> Author: Chris Wilson <[email protected]>
>> Date: Thu Aug 8 14:41:06 2013 +0100
>>
>> drm/i915: Allow the GPU to cache stolen memory
>>
>> from my queue. I guess we can retry for 3.13 again.
>
> I am sorry to keep someone's work to be delayed, really.
> I would have liked to see this fixed (and I have spent some time on it).
>
> Which patches did you exactly drop?
>

Sorry for bombarding you with questions...

I am trying the latest Linus-tree HEAD with the drm-intel-nightly I used
for my last tests.

Are any of these TLB / x86-get_unmapped_area fixes of interest...
do they have any effect on the reported issue?

I still wonder what the root cause is...
I mean, SNA is OK but UXA is not, and the Linux graphics stack is that complex.
( I can't say whether user-space like unity is involved... ).

- Sedat -

[1] Fix TLB gather virtual address range invalidation corner cases
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=2b047252d087be7f2ba088b4933cd904f92e6fce

[2] Revert "x86 get_unmapped_area(): use proper mmap base for
bottom-up direction"
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=5ea80f76a56605a190a7ea16846c82aa63dbd0aa

[3] x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/?id=41aacc1eea645c99edbe8fbcf78a97dc9b862adc

2013-08-23 08:34:57

by Chris Wilson

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Fri, Aug 23, 2013 at 10:04:37AM +0200, Sedat Dilek wrote:
> On Fri, Aug 23, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
> > On Thu, Aug 22, 2013 at 1:32 PM, Daniel Vetter <[email protected]> wrote:
> >> On Thu, Aug 22, 2013 at 1:30 PM, Daniel Vetter <[email protected]> wrote:
> >>> On Thu, Aug 22, 2013 at 1:13 PM, Sedat Dilek <[email protected]> wrote:
> >>>> dmesg (a lot of traces) and kernel-config attached.
> >>>>
> >>>> UXA causes still screen corruption.
> >>>
> >>> Hm, was only a slim chance that this patch would fix anything - I
> >>> think you'd always see an oops when you'd hit this bug instead of just
> >>> a bit of corruption.
> >>
> >> Ok, I think it's time to throw in the towel a bit. I've dropped
> >>
> >>
> >> commit d46f1c3f1372e3a72fab97c60480aa4a1084387f
> >> Author: Chris Wilson <[email protected]>
> >> Date: Thu Aug 8 14:41:06 2013 +0100
> >>
> >> drm/i915: Allow the GPU to cache stolen memory

Hmm, wrong patch. Unless you have a good reason, you just want to drop
the ringbuffers in stolen.

> >> from my queue. I guess we can retry for 3.13 again.
> >
> > I am sorry to keep someone's work to be delayed, really.
> > I would have liked to see this fixed (and I have spent some time on it).

It's just a minor memory optimization (reclaiming less than a megabyte
of system memory).

> > Which patches did you exactly drop?
> >
>
> Sorry for bombing you with question...
>
> I am trying latest Linus-tree HEAD with the drm-intel-nightly I made
> my last testings.
>
> Are any of these TLB / x86-get_unmapped_area fixes of interested...
> has any effects on the reported issue?

It should not. Of concern is how the GPU views the world which has its
own independent set of TLBs and mapping tables - and access to those
should always be uncached from the CPU's perspective.

> I still wonder what is the root-cause...
> I mean if SNA is OK but UXA not and Linux graphics stack is that complex.
> ( Can't say if user-space like unity isn't involved... ).

All that userspace can affect here is the timing and initial contents of
the framebuffer, everything else is controlled by the kernel. All the
testing we have done so far implies that the kernel's view of the machine
state is consistent with our expectations, but the display is doing
something inexplicable.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2013-08-23 08:48:01

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Fri, Aug 23, 2013 at 10:34 AM, Chris Wilson <[email protected]> wrote:
> On Fri, Aug 23, 2013 at 10:04:37AM +0200, Sedat Dilek wrote:
>> On Fri, Aug 23, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
>> > On Thu, Aug 22, 2013 at 1:32 PM, Daniel Vetter <[email protected]> wrote:
>> >> On Thu, Aug 22, 2013 at 1:30 PM, Daniel Vetter <[email protected]> wrote:
>> >>> On Thu, Aug 22, 2013 at 1:13 PM, Sedat Dilek <[email protected]> wrote:
>> >>>> dmesg (a lot of traces) and kernel-config attached.
>> >>>>
>> >>>> UXA causes still screen corruption.
>> >>>
>> >>> Hm, was only a slim chance that this patch would fix anything - I
>> >>> think you'd always see an oops when you'd hit this bug instead of just
>> >>> a bit of corruption.
>> >>
>> >> Ok, I think it's time to throw in the towel a bit. I've dropped
>> >>
>> >>
>> >> commit d46f1c3f1372e3a72fab97c60480aa4a1084387f
>> >> Author: Chris Wilson <[email protected]>
>> >> Date: Thu Aug 8 14:41:06 2013 +0100
>> >>
>> >> drm/i915: Allow the GPU to cache stolen memory
>
> Hmm, wrong patch. Unless you have a good reason, you just want to drop
> the ringbuffers in stolen.
>
>> >> from my queue. I guess we can retry for 3.13 again.
>> >
>> > I am sorry to keep someone's work to be delayed, really.
>> > I would have liked to see this fixed (and I have spent some time on it).
>
> It's just a minor memory optimization (reclaiming less than a megabyte
> of system memory).
>
>> > Which patches did you exactly drop?
>> >
>>
>> Sorry for bombing you with question...
>>
>> I am trying latest Linus-tree HEAD with the drm-intel-nightly I made
>> my last testings.
>>
>> Are any of these TLB / x86-get_unmapped_area fixes of interested...
>> has any effects on the reported issue?
>
> It should not. Of concern is how the GPU views the world which has its
> own independent set of TLBs and mapping tables - and access to those
> should always be uncached from the CPU's perspective.
>

Linux-v3.11-rc6-76-g6a7492a with my last d-i-n on top still shows the
same issue with UXA.
So, this is unrelated.

>> I still wonder what is the root-cause...
>> I mean if SNA is OK but UXA not and Linux graphics stack is that complex.
>> ( Can't say if user-space like unity isn't involved... ).
>
> All that userspace can affect here is the timing and inital contents of
> the framebuffer, everything else is controlled by the kernel. All the
> testing we have done so far imply that the kernel's view of the machine
> state is consistent with our expectations, but the display is doing
> something inexplicable.

I checked for a new BIOS, but version 13XK is the latest available.

If I boot into text mode and then start the lightdm service manually
from VT-1, I see this screen/display corruption.
After a restart of lightdm everything is fine.

How can I check if my greeter and/or unity(-2d) is the culprit?
AFAICS I have E17 here for testing.

I have not run a benchmark with the (new) d-i-n w/ dropped patches.
Lemme see if I can find some time.

- Sedat -

2013-08-24 09:35:05

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Fri, Aug 23, 2013 at 10:34 AM, Chris Wilson <[email protected]> wrote:
> On Fri, Aug 23, 2013 at 10:04:37AM +0200, Sedat Dilek wrote:
>> On Fri, Aug 23, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
>> > On Thu, Aug 22, 2013 at 1:32 PM, Daniel Vetter <[email protected]> wrote:
>> >> On Thu, Aug 22, 2013 at 1:30 PM, Daniel Vetter <[email protected]> wrote:
>> >>> On Thu, Aug 22, 2013 at 1:13 PM, Sedat Dilek <[email protected]> wrote:
>> >>>> dmesg (a lot of traces) and kernel-config attached.
>> >>>>
>> >>>> UXA causes still screen corruption.
>> >>>
>> >>> Hm, was only a slim chance that this patch would fix anything - I
>> >>> think you'd always see an oops when you'd hit this bug instead of just
>> >>> a bit of corruption.
>> >>
>> >> Ok, I think it's time to throw in the towel a bit. I've dropped
>> >>
>> >>
>> >> commit d46f1c3f1372e3a72fab97c60480aa4a1084387f
>> >> Author: Chris Wilson <[email protected]>
>> >> Date: Thu Aug 8 14:41:06 2013 +0100
>> >>
>> >> drm/i915: Allow the GPU to cache stolen memory
>
> Hmm, wrong patch. Unless you have a good reason, you just want to drop
> the ringbuffers in stolen.
>
>> >> from my queue. I guess we can retry for 3.13 again.

Just to clarify...

I pulled in today's d-i-n (e1a7374a9920cbbb085ca310e50c903d182d1ca9)
on top of Linus-git HEAD (v3.11-rc6-139-g89b53e5).

This is the list of drm/i915 patches containing "stolen":

drm/i915: Free stolen node on failed preallocation
drm/i915: Verify that our stolen memory doesn't conflict
drm/i915: Use Graphics Base of Stolen Memory on all gen3+
drm/i915: List objects allocated from stolen memory in debugfs
drm/i915: Allow the GPU to cache stolen memory <--- *** NOT DROPPED ***
drm/i915: less magic for stolen preallocated objects w/o gtt offset
drm/i915: WARN if the bios reserved range is bigger than stolen size
drm/i915: disable stolen mem for OVERLAY_NEEDS_PHYSICAL
drm/i915: clarify error paths in create_stolen_for_preallocated

So the culprit patch "drm/i915: Allocate LLC ringbuffers from stolen"
seems to be gone.
I have not compared against my other accumulated patches.
As far as I understood Daniel, he dropped a patchset, but I might have
misunderstood him.
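
How the list above was produced is not stated. One way to get a comparable list, as a sketch, is to filter the commit subjects under drivers/gpu/drm/i915 for "stolen". The function names and the revision range in the usage note are assumptions.

```shell
# Keep only commit subjects that mention "stolen" (case-insensitive).
# Reads subjects from stdin, one per line.
filter_stolen_subjects() {
    grep -i 'stolen'
}

# List i915 commit subjects mentioning "stolen" in the given revision range.
# Must be run inside a kernel git tree.
list_stolen_patches() {
    git log --format='%s' "$1" -- drivers/gpu/drm/i915 | filter_stolen_subjects
}
```

For example, `list_stolen_patches v3.11-rc6..HEAD` prints the matching subjects; piping that through `grep -F 'Allocate LLC ringbuffers from stolen'` shows whether the suspect patch is still part of the tree.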

- Sedat -

2013-08-24 10:55:10

by Sedat Dilek

Subject: Re: linux-next: Tree for Aug 21 [ screen corruption in graphical mode ]

On Fri, Aug 23, 2013 at 10:34 AM, Chris Wilson <[email protected]> wrote:
> On Fri, Aug 23, 2013 at 10:04:37AM +0200, Sedat Dilek wrote:
>> On Fri, Aug 23, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
>> > On Thu, Aug 22, 2013 at 1:32 PM, Daniel Vetter <[email protected]> wrote:
>> >> On Thu, Aug 22, 2013 at 1:30 PM, Daniel Vetter <[email protected]> wrote:
>> >>> On Thu, Aug 22, 2013 at 1:13 PM, Sedat Dilek <[email protected]> wrote:
>> >>>> dmesg (a lot of traces) and kernel-config attached.
>> >>>>
>> >>>> UXA causes still screen corruption.
>> >>>
>> >>> Hm, was only a slim chance that this patch would fix anything - I
>> >>> think you'd always see an oops when you'd hit this bug instead of just
>> >>> a bit of corruption.
>> >>
>> >> Ok, I think it's time to throw in the towel a bit. I've dropped
>> >>
>> >>
>> >> commit d46f1c3f1372e3a72fab97c60480aa4a1084387f
>> >> Author: Chris Wilson <[email protected]>
>> >> Date: Thu Aug 8 14:41:06 2013 +0100
>> >>
>> >> drm/i915: Allow the GPU to cache stolen memory
>
> Hmm, wrong patch. Unless you have a good reason, you just want to drop
> the ringbuffers in stolen.
>
>> >> from my queue. I guess we can retry for 3.13 again.
>> >
>> > I am sorry to keep someone's work to be delayed, really.
>> > I would have liked to see this fixed (and I have spent some time on it).
>
> It's just a minor memory optimization (reclaiming less than a megabyte
> of system memory).
>
>> > Which patches did you exactly drop?
>> >
>>
>> Sorry for bombing you with question...
>>
>> I am trying latest Linus-tree HEAD with the drm-intel-nightly I made
>> my last testings.
>>
>> Are any of these TLB / x86-get_unmapped_area fixes of interested...
>> has any effects on the reported issue?
>
> It should not. Of concern is how the GPU views the world which has its
> own independent set of TLBs and mapping tables - and access to those
> should always be uncached from the CPU's perspective.
>
>> I still wonder what is the root-cause...
>> I mean if SNA is OK but UXA not and Linux graphics stack is that complex.
>> ( Can't say if user-space like unity isn't involved... ).
>
> All that userspace can affect here is the timing and inital contents of
> the framebuffer, everything else is controlled by the kernel. All the
> testing we have done so far imply that the kernel's view of the machine
> state is consistent with our expectations, but the display is doing
> something inexplicable.

Can this be a problem?

[ dmesg ]
[ 27.949704] [drm] Wrong MCH_SSKPD value: 0x16040307
[ 27.949708] [drm] This can cause pipe underruns and display issues.
[ 27.949709] [drm] Please upgrade your BIOS to fix this.
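
A quick way to check a live or saved kernel log for this warning, as a sketch; the function name is hypothetical and the match string is taken from the dmesg lines above:

```shell
# Succeed (exit status 0) if the kernel log on stdin contains the i915
# MCH_SSKPD warning, fail otherwise.
has_sskpd_warning() {
    grep -q 'Wrong MCH_SSKPD value'
}
```

Usage: `dmesg | has_sskpd_warning && echo "i915 reports an unexpected MCH_SSKPD value"`.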

- Sedat -