2017-03-05 23:01:56

by Pavel Machek

Subject: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

Hi!

> > mplayer stopped working after a while. Dmesg says:
> >
> > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at

Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
try? Bisect will be slow and nasty :-(.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-03-06 11:16:20

by Chris Wilson

Subject: Re: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote:
> Hi!
>
> > > mplayer stopped working after a while. Dmesg says:
> > >
> > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
>
> Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> try? Bisect will be slow and nasty :-(.

I came to the conclusion that #99671 is the ring HEAD overtaking the TAIL,
and under the presumption that your bug matches (as the symptoms do):

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4ffa35faff49..62e31a7438ac 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -782,10 +782,10 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_private *dev_priv = request->i915;

-	i915_gem_request_submit(request);
-
 	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
 	I915_WRITE_TAIL(request->engine, request->tail);
+
+	i915_gem_request_submit(request);
 }

 static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 *cs)
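
For readers unfamiliar with the terminology, here is a minimal sketch of what "HEAD overtaking the TAIL" means for a command ring. It is a toy model, not i915 code: the names toy_ring, toy_ring_pending, toy_ring_consume and the 64-byte size are hypothetical. TAIL is the offset up to which the CPU has written commands, and HEAD is the offset the GPU is reading from. The GPU may only advance HEAD as far as TAIL, because anything beyond TAIL is unwritten or stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING_SIZE 64u /* bytes; hypothetical toy value */

/* The masking arithmetic below relies on a power-of-two ring size. */
static_assert((TOY_RING_SIZE & (TOY_RING_SIZE - 1)) == 0,
	      "ring size must be a power of two");

struct toy_ring {
	uint32_t head; /* consumer (GPU) read offset */
	uint32_t tail; /* producer (CPU) write offset */
};

/* Bytes of written commands between head and tail, modulo ring size. */
static uint32_t toy_ring_pending(const struct toy_ring *r)
{
	return (r->tail - r->head) & (TOY_RING_SIZE - 1);
}

/*
 * Consume n bytes of commands. Returns false, leaving head untouched,
 * if that would move head past tail; in this model that is the
 * "HEAD overtaking the TAIL" condition.
 */
static bool toy_ring_consume(struct toy_ring *r, uint32_t n)
{
	if (n > toy_ring_pending(r))
		return false;
	r->head = (r->head + n) & (TOY_RING_SIZE - 1);
	return true;
}
```

In this model the consumer refuses to run past TAIL. Real hardware has no such guard against a wrong TAIL value, which is presumably why an incorrect or stale TAIL shows up as a GPU hang rather than as a clean error.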


--
Chris Wilson, Intel Open Source Technology Centre

2017-03-06 12:06:43

by Chris Wilson

Subject: Re: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

On Mon, Mar 06, 2017 at 11:15:28AM +0000, Chris Wilson wrote:
> On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote:
> > Hi!
> >
> > > > mplayer stopped working after a while. Dmesg says:
> > > >
> > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> >
> > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> > try? Bisect will be slow and nasty :-(.
>
> I came to the conclusion that #99671 is the ring HEAD overtaking the TAIL,
> and under the presumption that your bug matches (as the symptoms do):
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4ffa35faff49..62e31a7438ac 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -782,10 +782,10 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> {
> struct drm_i915_private *dev_priv = request->i915;
>
> - i915_gem_request_submit(request);
> -
> GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
> I915_WRITE_TAIL(request->engine, request->tail);
> +
> + i915_gem_request_submit(request);

Hmm. request->tail is not set until i915_gem_request_submit(). Uh oh.
-Chris
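
The ordering problem noted above can be sketched with a toy model. This is not the i915 implementation: toy_engine, toy_request, final_tail and the helper functions are hypothetical names, and the only behaviour taken from the thread is that the request's tail is not valid until the submit step has run. Writing the tail register before submitting therefore programs a stale value.

```c
#include <assert.h>
#include <stdint.h>

struct toy_engine {
	uint32_t tail_reg; /* stands in for the hardware TAIL register */
};

struct toy_request {
	struct toy_engine *engine;
	uint32_t tail;       /* only valid after toy_request_submit() */
	uint32_t final_tail; /* hypothetical: where this request's commands end */
};

/* Assumed behaviour: submission is what fills in request->tail. */
static void toy_request_submit(struct toy_request *rq)
{
	rq->tail = rq->final_tail;
}

static void toy_write_tail(struct toy_request *rq)
{
	assert(rq->tail % 8 == 0); /* mirrors the alignment check in the diff */
	rq->engine->tail_reg = rq->tail;
}

/* Original order: submit first, then write the now-valid tail. */
static uint32_t toy_submit_then_write(struct toy_request *rq)
{
	toy_request_submit(rq);
	toy_write_tail(rq);
	return rq->engine->tail_reg;
}

/* Order from the proposed patch: the register gets the stale tail. */
static uint32_t toy_write_then_submit(struct toy_request *rq)
{
	toy_write_tail(rq);
	toy_request_submit(rq);
	return rq->engine->tail_reg;
}
```

With tail initialised to 0 and final_tail set to 64, the first ordering leaves 64 in the register while the second leaves 0, so in this model the hardware would never be told about the new commands.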

--
Chris Wilson, Intel Open Source Technology Centre

2017-03-06 12:17:54

by Pavel Machek

Subject: Re: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

On Mon 2017-03-06 11:15:28, Chris Wilson wrote:
> On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote:
> > Hi!
> >
> > > > mplayer stopped working after a while. Dmesg says:
> > > >
> > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> >
> > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> > try? Bisect will be slow and nasty :-(.
>
> I came to the conclusion that #99671 is the ring HEAD overtaking the TAIL,
> and under the presumption that your bug matches (as the symptoms do):
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4ffa35faff49..62e31a7438ac 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -782,10 +782,10 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> {
> struct drm_i915_private *dev_priv = request->i915;
>
> - i915_gem_request_submit(request);
> -
> GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
> I915_WRITE_TAIL(request->engine, request->tail);
> +
> + i915_gem_request_submit(request);
> }
>
> static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 *cs)

I applied it as:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 91bc4ab..9c49c7a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1338,9 +1338,9 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_private *dev_priv = request->i915;

-	i915_gem_request_submit(request);
-
 	I915_WRITE_TAIL(request->engine, request->tail);
+
+	i915_gem_request_submit(request);
 }

 static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req,

Hmm. But your next mail suggests that it may not be smart to try to
boot it? :-).

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-03-06 12:24:10

by Chris Wilson

Subject: Re: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

On Mon, Mar 06, 2017 at 01:10:48PM +0100, Pavel Machek wrote:
> On Mon 2017-03-06 11:15:28, Chris Wilson wrote:
> > On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote:
> > > Hi!
> > >
> > > > > mplayer stopped working after a while. Dmesg says:
> > > > >
> > > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> > >
> > > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> > > try? Bisect will be slow and nasty :-(.
> >
> > I came to the conclusion that #99671 is the ring HEAD overtaking the TAIL,
> > and under the presumption that your bug matches (as the symptoms do):
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 4ffa35faff49..62e31a7438ac 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -782,10 +782,10 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> > {
> > struct drm_i915_private *dev_priv = request->i915;
> >
> > - i915_gem_request_submit(request);
> > -
> > GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
> > I915_WRITE_TAIL(request->engine, request->tail);
> > +
> > + i915_gem_request_submit(request);
> > }
> >
> > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 *cs)
>
> I applied it as:
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 91bc4ab..9c49c7a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1338,9 +1338,9 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> {
> struct drm_i915_private *dev_priv = request->i915;
>
> - i915_gem_request_submit(request);
> -
> I915_WRITE_TAIL(request->engine, request->tail);
> +
> + i915_gem_request_submit(request);
> }
>
> static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req,
>
> Hmm. But your next mail suggests that it may not be smart to try to
> boot it? :-).

Don't bother, it'll promptly hang.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2017-03-14 09:08:30

by Thorsten Leemhuis

Subject: Re: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

On 06.03.2017 00:01, Pavel Machek wrote:
>>> mplayer stopped working after a while. Dmesg says:
>>>
>>> [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> try? Bisect will be slow and nasty :-(.

@Pavel, @Chris: What's the status of this?

I added this report to the list of regressions for Linux 4.11. I'll try
to watch this thread for further updates on this issue to document
progress in my weekly reports. Please let me know in case the discussion
moves to a different place (bugzilla or another mail thread for
example). tia!

Ciao, Thorsten

2017-03-14 11:35:53

by Pavel Machek

Subject: Re: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

On Tue 2017-03-14 10:08:23, Thorsten Leemhuis wrote:
> On 06.03.2017 00:01, Pavel Machek wrote:
> >>> mplayer stopped working after a while. Dmesg says:
> >>>
> >>> [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> > try? Bisect will be slow and nasty :-(.
>
> @Pavel, @Chris: What's the status of this?
>
> I added this report to the list of regressions for Linux 4.11. I'll try
> to watch this thread for further updates on this issue to document
> progress in my weekly reports. Please let me know in case the discussion
> moves to a different place (bugzilla or another mail thread for
> example). tia!

We know where the bug is, but there's no fix for it. There was one patch, but
it was quickly withdrawn.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2017-03-21 14:13:06

by Pavel Machek

Subject: Re: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

Hi!

> > > > > > mplayer stopped working after a while. Dmesg says:
> > > > > >
> > > > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> > > >
> > > > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> > > > try? Bisect will be slow and nasty :-(.
> > >
> > > I came to the conclusion that #99671 is the ring HEAD overtaking the TAIL,
> > > and under the presumption that your bug matches (as the symptoms do):
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 4ffa35faff49..62e31a7438ac 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -782,10 +782,10 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> > > {
> > > struct drm_i915_private *dev_priv = request->i915;
> > >
> > > - i915_gem_request_submit(request);
> > > -
> > > GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
> > > I915_WRITE_TAIL(request->engine, request->tail);
> > > +
> > > + i915_gem_request_submit(request);
> > > }
> > >
> > > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 *cs)
> >
> > I applied it as:
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 91bc4ab..9c49c7a 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1338,9 +1338,9 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> > {
> > struct drm_i915_private *dev_priv = request->i915;
> >
> > - i915_gem_request_submit(request);
> > -
> > I915_WRITE_TAIL(request->engine, request->tail);
> > +
> > + i915_gem_request_submit(request);
> > }
> >
> > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req,
> >
> > Hmm. But your next mail suggests that it may not be smart to try to
> > boot it? :-).
>
> Don't bother, it'll promptly hang.

Any news here?

Is there something I can revert to get back to a working system?

Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-03-25 21:33:53

by Pavel Machek

Subject: Re: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

On Mon 2017-03-06 12:23:41, Chris Wilson wrote:
> On Mon, Mar 06, 2017 at 01:10:48PM +0100, Pavel Machek wrote:
> > On Mon 2017-03-06 11:15:28, Chris Wilson wrote:
> > > On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote:
> > > > Hi!
> > > >
> > > > > > mplayer stopped working after a while. Dmesg says:
> > > > > >
> > > > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> > > >
> > > > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> > > > try? Bisect will be slow and nasty :-(.
> > >
> > > I came to the conclusion that #99671 is the ring HEAD overtaking the TAIL,
> > > and under the presumption that your bug matches (as the symptoms do):
> > >
...
> > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req,
> >
> > Hmm. But your next mail suggests that it may not be smart to try to
> > boot it? :-).
>
> Don't bother, it'll promptly hang.

Any news here? Is there a chance this is fixed in -rc4?
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-04-09 10:33:35

by Pavel Machek

Subject: Re: [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

On Mon 2017-03-06 12:23:41, Chris Wilson wrote:
> On Mon, Mar 06, 2017 at 01:10:48PM +0100, Pavel Machek wrote:
> > On Mon 2017-03-06 11:15:28, Chris Wilson wrote:
> > > On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote:
> > > > Hi!
> > > >
> > > > > > mplayer stopped working after a while. Dmesg says:
> > > > > >
> > > > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> > > >
> > > > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> > > > try? Bisect will be slow and nasty :-(.
> > >
> > > I came to the conclusion that #99671 is the ring HEAD overtaking the TAIL,
> > > and under the presumption that your bug matches (as the symptoms do):
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 4ffa35faff49..62e31a7438ac 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -782,10 +782,10 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> > > {
> > > struct drm_i915_private *dev_priv = request->i915;
> > >
> > > - i915_gem_request_submit(request);
> > > -
> > > GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
> > > I915_WRITE_TAIL(request->engine, request->tail);
> > > +
> > > + i915_gem_request_submit(request);
> > > }
> > >
> > > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 *cs)
> >
> > I applied it as:
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 91bc4ab..9c49c7a 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1338,9 +1338,9 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> > {
> > struct drm_i915_private *dev_priv = request->i915;
> >
> > - i915_gem_request_submit(request);
> > -
> > I915_WRITE_TAIL(request->engine, request->tail);
> > +
> > + i915_gem_request_submit(request);
> > }
> >
> > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req,
> >
> > Hmm. But your next mail suggests that it may not be smart to try to
> > boot it? :-).
>
> Don't bother, it'll promptly hang.

Any news here? 4.11-rc5 is actually usable on the hardware (unlike
-rc1), not sure what changed.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

