2017-06-09 20:25:16

by Takashi Iwai

Subject: [4.4.70 REGRESSION] Nouveau hangs up at boot

Hi,

we've received a bug report about the 4.4.70 kernel hanging at
boot, and it turned out to be a regression in the nouveau driver:
https://bugzilla.suse.com/show_bug.cgi?id=1043467

I provided a test kernel that reverts the last five nouveau commits
listed below, and it was confirmed to work. But I haven't figured
out yet which one actually causes the breakage.

e4add1cf6b4154804350c3385c6d447cff3570de
drm/nouveau/tmr: handle races with hw when updating the next alarm time
commit 1b0f84380b10ee97f7d2dd191294de9017e94d1d upstream.

9d78e40f5f41ad1db1849f8d15acbda99d0871b4
drm/nouveau/tmr: avoid processing completed alarms when adding a new one
commit 330bdf62fe6a6c5b99a647f7bf7157107c9348b3 upstream.

5e07724c28f4e06fe42dd5b58bb6f9dd56510567
drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
commit 9fc64667ee48c9a25e7dca1a6bcb6906fec5bcc5 upstream.

27f82df2f02688c51d2c1d9f624cc0c5b8a62661
drm/nouveau/tmr: ack interrupt before processing alarms
commit 3733bd8b407211739e72d051e5f30ad82a52c4bc upstream.

3819271d8a5f4c6e0c8f71c339e44e2efbe40710
drm/nouveau/therm: remove ineffective workarounds for alarm bugs
commit e4311ee51d1e2676001b2d8fcefd92bdd79aad85 upstream.


Ben, is this a known problem? Or is there already a fix?
At least, the kernel backtrace in the Bugzilla report points to
nvkm_timer_alarm_trigger().


thanks,

Takashi
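
One way to narrow down which of the five commits introduces the hang is to build test kernels that revert them cumulatively. The sketch below only generates the `git revert` command lists for each test build. It assumes the commits above are listed newest first (as in `git log` output), that a single commit is responsible, and that the hang reproduces reliably. `SUSPECTS` and `revert_plan` are hypothetical names used for illustration, not part of any kernel tooling.

```python
# The five 4.4.70 stable commits from the report, in the order listed there
# (ASSUMPTION: newest first, as `git log` would print them).
SUSPECTS = [
    ("e4add1cf6b4154804350c3385c6d447cff3570de",
     "drm/nouveau/tmr: handle races with hw when updating the next alarm time"),
    ("9d78e40f5f41ad1db1849f8d15acbda99d0871b4",
     "drm/nouveau/tmr: avoid processing completed alarms when adding a new one"),
    ("5e07724c28f4e06fe42dd5b58bb6f9dd56510567",
     "drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm"),
    ("27f82df2f02688c51d2c1d9f624cc0c5b8a62661",
     "drm/nouveau/tmr: ack interrupt before processing alarms"),
    ("3819271d8a5f4c6e0c8f71c339e44e2efbe40710",
     "drm/nouveau/therm: remove ineffective workarounds for alarm bugs"),
]


def revert_plan(suspects):
    """Return one list of `git revert` commands per test build.

    Build k reverts the k newest suspects. Under the assumptions stated
    above, the first build that boots points at the commit reverted last
    in that build.
    """
    plan = []
    for k in range(1, len(suspects) + 1):
        plan.append([f"git revert --no-edit {sha}" for sha, _ in suspects[:k]])
    return plan


if __name__ == "__main__":
    for number, commands in enumerate(revert_plan(SUSPECTS), start=1):
        print(f"# test build {number}")
        print("\n".join(commands))
```

Reverting newest first keeps each revert applicable on top of the previous ones. If the commits were independent, reverting them one at a time in separate builds would work as well.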


2017-06-12 22:40:31

by Ben Skeggs

Subject: Re: [4.4.70 REGRESSION] Nouveau hangs up at boot

On 06/10/2017 06:25 AM, Takashi Iwai wrote:
> Ben, is this a known problem? Or is there any fixup?
> The kernel back trace found in the bugzilla report shows the issue in
> nvkm_timer_alarm_trigger(), at least.
>
A fix (b4e382ca7586a63b6c1e5221ce0863ff867c2df6) has been submitted already.

Sorry for the trouble!
Ben.


2017-06-13 06:08:22

by Takashi Iwai

Subject: Re: [4.4.70 REGRESSION] Nouveau hangs up at boot

On Tue, 13 Jun 2017 00:40:26 +0200,
Ben Skeggs wrote:
>
> A fix (b4e382ca7586a63b6c1e5221ce0863ff867c2df6) has been submitted already.
>
> Sorry for the trouble!
> Ben.

Hrm, the commit doesn't apply cleanly to the 4.4.x kernel.

Could you cook up a 4.4.x fix? Then I'll prepare a test kernel
package for Luigi, so that he can test quickly.


thanks,

Takashi

2017-06-13 13:32:26

by Takashi Iwai

Subject: Re: [4.4.70 REGRESSION] Nouveau hangs up at boot

On Tue, 13 Jun 2017 08:08:17 +0200,
Takashi Iwai wrote:
>
> On Tue, 13 Jun 2017 00:40:26 +0200,
> Ben Skeggs wrote:
> >
> > A fix (b4e382ca7586a63b6c1e5221ce0863ff867c2df6) has been submitted already.
> >
> > Sorry for the trouble!
> > Ben.
>
> Hrm, the commit doesn't apply to 4.4.x kernel properly.

My bad, it *does* apply. I must have looked at the wrong commit, sorry
for the noise!

> Could you cook up a 4.4.x fix? Then I'll prepare a test kernel
> package for Luigi, so that he can test quickly.

Luigi, a new test kernel is being built in the OBS
home:tiwai:bnc1043467-2 repo. Please give it a try.


thanks,

Takashi

2017-06-13 14:42:35

by Luigi Baldoni

Subject: Re: [4.4.70 REGRESSION] Nouveau hangs up at boot

Sent: Tuesday, June 13, 2017 at 3:32 PM
From: "Takashi Iwai" <[email protected]>
> Subject: Re: [4.4.70 REGRESSION] Nouveau hangs up at boot
>
> On Tue, 13 Jun 2017 08:08:17 +0200,
> Takashi Iwai wrote:
> >
> > On Tue, 13 Jun 2017 00:40:26 +0200,
> > Ben Skeggs wrote:
> > >
> > > A fix (b4e382ca7586a63b6c1e5221ce0863ff867c2df6) has been submitted already.
> > >
> > > Sorry for the trouble!
> > > Ben.
> >
> > Hrm, the commit doesn't apply to 4.4.x kernel properly.
>
> My bad, it *does* apply. I must have looked at a wrong commit, sorry
> for the noise!
>
> > Could you cook up a 4.4.x fix? Then I'll prepare a test kernel
> > package for Luigi, so that he can test quickly.
>
> Luigi, a new test kernel is being built in OBS home:tiwai:bnc1043467-2
> repo. Please give it a try.

4.4.71-2.ge1e822f-default works for me.

Regards

2017-06-15 06:39:38

by Greg Kroah-Hartman

Subject: Re: [4.4.70 REGRESSION] Nouveau hangs up at boot

On Tue, Jun 13, 2017 at 03:32:22PM +0200, Takashi Iwai wrote:
> On Tue, 13 Jun 2017 08:08:17 +0200,
> Takashi Iwai wrote:
> >
> > On Tue, 13 Jun 2017 00:40:26 +0200,
> > Ben Skeggs wrote:
> > >
> > > A fix (b4e382ca7586a63b6c1e5221ce0863ff867c2df6) has been submitted already.
> > >
> > > Sorry for the trouble!
> > > Ben.
> >
> > Hrm, the commit doesn't apply to 4.4.x kernel properly.
>
> My bad, it *does* apply. I must have looked at a wrong commit, sorry
> for the noise!

Great, that means this is fixed in 4.4.72.

thanks,

greg k-h