2014-01-01 14:04:47

by Sid Boyce

Subject: Re: Possible 3.13-rc nouveau regression with GT 560 Ti

On 01/01/14 00:55, Ilia Mirkin wrote:
> On Tue, Dec 31, 2013 at 7:41 PM, Sid Boyce <[email protected]> wrote:
>> On 31/12/13 10:36, Ilia Mirkin wrote:
>>> On Tue, Dec 31, 2013 at 5:14 AM, Sid Boyce <[email protected]> wrote:
>>>> System x86_64 with openSUSE 13.1.
>>>> X.Org version: 1.14.99.905
>>>>
>>>> openSUSE 12.2 kernels boot successfully into a graphical screen, login to
>>>> KDE4, etc. all normal.
>>>>
>>>> 3.13-rc kernels boot fully with X running but no graphical screen and it
>>>> freezes in VC with not all the startup messages displayed but I could ssh
>>>> in from another box to check dmesg and logs.
>>>>
>>>> Xorg.0.log which I thought I had saved did not log an error.
>>>>
>>>> dmesg said "nouveau Playlist update failed".
>>>>
>>>> Changed the GeForce GT 560 Ti for a GeForce 8600 GT and 3.13.0-rc6 is up
>>>> and running.
>>>>
>>>> If necessary I can go back to the GT 560 Ti to gather dmesg and Xorg log
>>>> information.
>>> Having a dmesg would be nice. One thing I can think of off-hand is
>>> that 3.13-rc has MSI turned on by default. You can turn it off by
>>> adding "nouveau.config=NvMSI=0" to your kernel cmdline. If that
>>> doesn't help, a bisect restricted to drivers/gpu/drm/nouveau should
>>> show the offending commit fairly quickly.
>>>
>>> -ilia
>>>
>> Adding "nouveau.config=NvMSI=0" to the command line fixed the problem.
>> So it looks like commit 049ffa8ab33a63b3bff672d1a0ee6a35ad253fe8
>> introduced it.
> Any chance you might mmiotrace the blob (version 325 or later) to see
> which registers it fiddles with? Or alternatively, if you have a NVCE
> card (you never did end up providing the logs which would have made
> that apparent), could you try replacing nvc3_mc_oclass with
> nvc0_mc_oclass for the 0xce case in
> drivers/gpu/drm/nouveau/core/engine/device/nvc0.c? (and boot without
> the MSI disabling.) The switch has already been made for NVC8 in
> 0bae1d61c75 -- perhaps there are more "odd" ones.
>
> -ilia
>
Fails exactly the same.
case 0xc3:
device->cname = "GF106";
device->oclass[NVDEV_SUBDEV_VBIOS ] = &nouveau_bios_oclass;
device->oclass[NVDEV_SUBDEV_GPIO ] = &nv50_gpio_oclass;
device->oclass[NVDEV_SUBDEV_I2C ] = &nv94_i2c_oclass;
device->oclass[NVDEV_SUBDEV_CLOCK ] = &nvc0_clock_oclass;
device->oclass[NVDEV_SUBDEV_THERM ] = &nva3_therm_oclass;
device->oclass[NVDEV_SUBDEV_MXM ] = &nv50_mxm_oclass;
device->oclass[NVDEV_SUBDEV_DEVINIT] = &nvc0_devinit_oclass;
device->oclass[NVDEV_SUBDEV_MC ] = nvc0_mc_oclass; <<<<<====
device->oclass[NVDEV_SUBDEV_BUS ] = nvc0_bus_oclass;
device->oclass[NVDEV_SUBDEV_TIMER ] = &nv04_timer_oclass;

The dmesg and Xorg.0.log, captured over an ssh link while the problem was occurring, are attached.

# ps fax|grep X
5633 pts/0 S+ 0:00 \_ grep --color=auto X
5160 tty7 Ss+ 0:08 \_ /usr/bin/Xorg -br :0 vt7 -nolisten tcp -auth /var/lib/kdm/AuthFiles/A:0-yqspza

Also
# echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
-bash: echo: write error: Invalid argument.

# ls -l /sys/kernel/debug/tracing/current_tracer
-rw-r--r-- 1 root root 0 Jan 1 14:00 /sys/kernel/debug/tracing/current_tracer
Regards
Sid.

--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support
Senior Staff Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks


Attachments:
dmesg.gz (19.55 kB)
Xorg.0.log (39.27 kB)

2014-01-01 18:46:20

by Ilia Mirkin

Subject: Re: Possible 3.13-rc nouveau regression with GT 560 Ti

On Wed, Jan 1, 2014 at 9:04 AM, Sid Boyce <[email protected]> wrote:
> On 01/01/14 00:55, Ilia Mirkin wrote:
>>
>> On Tue, Dec 31, 2013 at 7:41 PM, Sid Boyce <[email protected]> wrote:
>>>
>>> On 31/12/13 10:36, Ilia Mirkin wrote:
>>>> Having a dmesg would be nice. One thing I can think of off-hand is
>>>> that 3.13-rc has MSI turned on by default. You can turn it off by
>>>> adding "nouveau.config=NvMSI=0" to your kernel cmdline. If that
>>>> doesn't help, a bisect restricted to drivers/gpu/drm/nouveau should
>>>> show the offending commit fairly quickly.
>>>>
>>>> -ilia
>>>>
>>> Adding "nouveau.config=NvMSI=0" to the command line fixed the problem.
>>> So it looks like commit 049ffa8ab33a63b3bff672d1a0ee6a35ad253fe8
>>> introduced it.
>>
>> Any chance you might mmiotrace the blob (version 325 or later) to see
>> which registers it fiddles with? Or alternatively, if you have a NVCE
>> card (you never did end up providing the logs which would have made
>> that apparent), could you try replacing nvc3_mc_oclass with
>> nvc0_mc_oclass for the 0xce case in
>> drivers/gpu/drm/nouveau/core/engine/device/nvc0.c? (and boot without
>> the MSI disabling.) The switch has already been made for NVC8 in
>> 0bae1d61c75 -- perhaps there are more "odd" ones.
>>
>> -ilia
>>
> Fails exactly the same.
> case 0xc3:
> device->cname = "GF106";
>
> device->oclass[NVDEV_SUBDEV_VBIOS ] = &nouveau_bios_oclass;
> device->oclass[NVDEV_SUBDEV_GPIO ] = &nv50_gpio_oclass;
> device->oclass[NVDEV_SUBDEV_I2C ] = &nv94_i2c_oclass;
> device->oclass[NVDEV_SUBDEV_CLOCK ] = &nvc0_clock_oclass;
> device->oclass[NVDEV_SUBDEV_THERM ] = &nva3_therm_oclass;
> device->oclass[NVDEV_SUBDEV_MXM ] = &nv50_mxm_oclass;
> device->oclass[NVDEV_SUBDEV_DEVINIT] = &nvc0_devinit_oclass;
> device->oclass[NVDEV_SUBDEV_MC ] = nvc0_mc_oclass; <<<<<====
> device->oclass[NVDEV_SUBDEV_BUS ] = nvc0_bus_oclass;
> device->oclass[NVDEV_SUBDEV_TIMER ] = &nv04_timer_oclass;

That's the 0xc3 case... you have a nvce card, not nvc3 -- you would
need to change the NVDEV_SUBDEV_MC line to nvc0_mc_oclass for the 0xce
case.
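
As a rough sketch of that edit, the substitution can be confined to the 0xce case with a sed address range. This assumes each case block in nvc0.c ends with a "break;" line and that the 0xce block currently names nvc3_mc_oclass; the helper name switch_mc_class is made up for illustration.

```shell
# Hypothetical helper: print FILE with nvc3_mc_oclass replaced by
# nvc0_mc_oclass, but only between "case 0xce:" and the next "break;".
# Assumes every case block in the file is terminated by a "break;" line.
switch_mc_class() {
    sed '/case 0xce:/,/break;/ s/nvc3_mc_oclass/nvc0_mc_oclass/' "$1"
}
```

The result goes to stdout, so it can be written to a scratch file and compared against the original with diff before anything in the tree is overwritten.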

>
> The dmesg and Xorg.0.log with the problem captured across a ssh link.
>
> # ps fax|grep X
> 5633 pts/0 S+ 0:00 \_ grep --color=auto X
> 5160 tty7 Ss+ 0:08 \_ /usr/bin/Xorg -br :0 vt7 -nolisten tcp -auth /var/lib/kdm/AuthFiles/A:0-yqspza
>
> Also
> # echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
> -bash: echo: write error: Invalid argument.

Take a look at https://wiki.ubuntu.com/X/MMIOTracing
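
For context, writing a tracer name to current_tracer fails with "Invalid argument" when that tracer is not listed in available_tracers, which is typically the case when the running kernel was built without CONFIG_MMIOTRACE. A minimal check might look like the sketch below; the helper name tracer_available is invented here, and the debugfs path is assumed to be mounted at /sys/kernel/debug as in the command quoted above.

```shell
# Hypothetical helper: succeed if tracer NAME is listed in the
# available_tracers file. The file path is an optional second argument
# so the check can also be run against a saved copy.
tracer_available() {
    name=$1
    list=${2:-/sys/kernel/debug/tracing/available_tracers}
    # available_tracers holds tracer names separated by spaces.
    tr ' ' '\n' < "$list" | grep -qx "$name"
}
```

Running "tracer_available mmiotrace || echo 'mmiotrace not built in'" as root before the echo into current_tracer would show whether a kernel rebuild is needed first.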

-ilia

2014-01-02 02:36:12

by Sid Boyce

Subject: Re: Possible 3.13-rc nouveau regression with GT 560 Ti

On 01/01/14 18:46, Ilia Mirkin wrote:
> On Wed, Jan 1, 2014 at 9:04 AM, Sid Boyce <[email protected]> wrote:
>> On 01/01/14 00:55, Ilia Mirkin wrote:
>>> On Tue, Dec 31, 2013 at 7:41 PM, Sid Boyce <[email protected]> wrote:
>>>> On 31/12/13 10:36, Ilia Mirkin wrote:
>>>>> Having a dmesg would be nice. One thing I can think of off-hand is
>>>>> that 3.13-rc has MSI turned on by default. You can turn it off by
>>>>> adding "nouveau.config=NvMSI=0" to your kernel cmdline. If that
>>>>> doesn't help, a bisect restricted to drivers/gpu/drm/nouveau should
>>>>> show the offending commit fairly quickly.
>>>>>
>>>>> -ilia
>>>>>
>>>> Adding "nouveau.config=NvMSI=0" to the command line fixed the problem.
>>>> So it looks like commit 049ffa8ab33a63b3bff672d1a0ee6a35ad253fe8
>>>> introduced it.
>>> Any chance you might mmiotrace the blob (version 325 or later) to see
>>> which registers it fiddles with? Or alternatively, if you have a NVCE
>>> card (you never did end up providing the logs which would have made
>>> that apparent), could you try replacing nvc3_mc_oclass with
>>> nvc0_mc_oclass for the 0xce case in
>>> drivers/gpu/drm/nouveau/core/engine/device/nvc0.c? (and boot without
>>> the MSI disabling.) The switch has already been made for NVC8 in
>>> 0bae1d61c75 -- perhaps there are more "odd" ones.
>>>
>>> -ilia
>>>
>> Fails exactly the same.
>> case 0xc3:
>> device->cname = "GF106";
>>
>> device->oclass[NVDEV_SUBDEV_VBIOS ] = &nouveau_bios_oclass;
>> device->oclass[NVDEV_SUBDEV_GPIO ] = &nv50_gpio_oclass;
>> device->oclass[NVDEV_SUBDEV_I2C ] = &nv94_i2c_oclass;
>> device->oclass[NVDEV_SUBDEV_CLOCK ] = &nvc0_clock_oclass;
>> device->oclass[NVDEV_SUBDEV_THERM ] = &nva3_therm_oclass;
>> device->oclass[NVDEV_SUBDEV_MXM ] = &nv50_mxm_oclass;
>> device->oclass[NVDEV_SUBDEV_DEVINIT] = &nvc0_devinit_oclass;
>> device->oclass[NVDEV_SUBDEV_MC ] = nvc0_mc_oclass; <<<<<====
>> device->oclass[NVDEV_SUBDEV_BUS ] = nvc0_bus_oclass;
>> device->oclass[NVDEV_SUBDEV_TIMER ] = &nv04_timer_oclass;
> That's the 0xc3 case... you have a nvce card, not nvc3 -- you would
> need to change the NVDEV_SUBDEV_MC line to nvc0_mc_oclass for the 0xce
> case.
>
>> The dmesg and Xorg.0.log with the problem captured across a ssh link.
>>
>> # ps fax|grep X
>> 5633 pts/0 S+ 0:00 \_ grep --color=auto X
>> 5160 tty7 Ss+ 0:08 \_ /usr/bin/Xorg -br :0 vt7 -nolisten tcp -auth /var/lib/kdm/AuthFiles/A:0-yqspza
>>
>> Also
>> # echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
>> -bash: echo: write error: Invalid argument.
> Take a look at https://wiki.ubuntu.com/X/MMIOTracing
>
> -ilia
>
Of course it's a GF114.
Made the change and it boots without the command line change.
Regards
Sid.

--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support
Senior Staff Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks

2014-01-02 02:40:21

by Ilia Mirkin

Subject: Re: Possible 3.13-rc nouveau regression with GT 560 Ti

On Wed, Jan 1, 2014 at 9:36 PM, Sid Boyce <[email protected]> wrote:
> On 01/01/14 18:46, Ilia Mirkin wrote:
>>
>> On Wed, Jan 1, 2014 at 9:04 AM, Sid Boyce <[email protected]> wrote:
>>>
>>> On 01/01/14 00:55, Ilia Mirkin wrote:
>>>>
>>>> On Tue, Dec 31, 2013 at 7:41 PM, Sid Boyce <[email protected]> wrote:
>>>>>
>>>>> On 31/12/13 10:36, Ilia Mirkin wrote:
>>>>>>
>>>>>> Having a dmesg would be nice. One thing I can think of off-hand is
>>>>>> that 3.13-rc has MSI turned on by default. You can turn it off by
>>>>>> adding "nouveau.config=NvMSI=0" to your kernel cmdline. If that
>>>>>> doesn't help, a bisect restricted to drivers/gpu/drm/nouveau should
>>>>>> show the offending commit fairly quickly.
>>>>>>
>>>>>> -ilia
>>>>>>
>>>>> Adding "nouveau.config=NvMSI=0" to the command line fixed the problem.
>>>>> So it looks like commit 049ffa8ab33a63b3bff672d1a0ee6a35ad253fe8
>>>>> introduced it.
>>>>
>>>> Any chance you might mmiotrace the blob (version 325 or later) to see
>>>> which registers it fiddles with? Or alternatively, if you have a NVCE
>>>> card (you never did end up providing the logs which would have made
>>>> that apparent), could you try replacing nvc3_mc_oclass with
>>>> nvc0_mc_oclass for the 0xce case in
>>>> drivers/gpu/drm/nouveau/core/engine/device/nvc0.c? (and boot without
>>>> the MSI disabling.) The switch has already been made for NVC8 in
>>>> 0bae1d61c75 -- perhaps there are more "odd" ones.
>>>>
>>>> -ilia
>>>>
>>> Fails exactly the same.
>>> case 0xc3:
>>> device->cname = "GF106";
>>>
>>> device->oclass[NVDEV_SUBDEV_VBIOS ] = &nouveau_bios_oclass;
>>> device->oclass[NVDEV_SUBDEV_GPIO ] = &nv50_gpio_oclass;
>>> device->oclass[NVDEV_SUBDEV_I2C ] = &nv94_i2c_oclass;
>>> device->oclass[NVDEV_SUBDEV_CLOCK ] = &nvc0_clock_oclass;
>>> device->oclass[NVDEV_SUBDEV_THERM ] = &nva3_therm_oclass;
>>> device->oclass[NVDEV_SUBDEV_MXM ] = &nv50_mxm_oclass;
>>> device->oclass[NVDEV_SUBDEV_DEVINIT] = &nvc0_devinit_oclass;
>>> device->oclass[NVDEV_SUBDEV_MC ] = nvc0_mc_oclass; <<<<<====
>>> device->oclass[NVDEV_SUBDEV_BUS ] = nvc0_bus_oclass;
>>> device->oclass[NVDEV_SUBDEV_TIMER ] = &nv04_timer_oclass;
>>
>> That's the 0xc3 case... you have a nvce card, not nvc3 -- you would
>> need to change the NVDEV_SUBDEV_MC line to nvc0_mc_oclass for the 0xce
>> case.
>>
>>> The dmesg and Xorg.0.log with the problem captured across a ssh link.
>>>
>>> # ps fax|grep X
>>> 5633 pts/0 S+ 0:00 \_ grep --color=auto X
>>> 5160 tty7 Ss+ 0:08 \_ /usr/bin/Xorg -br :0 vt7 -nolisten tcp -auth /var/lib/kdm/AuthFiles/A:0-yqspza
>>>
>>> Also
>>> # echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
>>> -bash: echo: write error: Invalid argument.
>>
>> Take a look at https://wiki.ubuntu.com/X/MMIOTracing
>>
>> -ilia
>>
> Of course it's a GF114.
> Made the change and it boots without the command line change.

Great! Care to send a patch?

-ilia

2014-01-02 02:59:51

by Sid Boyce

Subject: Re: Possible 3.13-rc nouveau regression with GT 560 Ti

On 02/01/14 02:40, Ilia Mirkin wrote:
> On Wed, Jan 1, 2014 at 9:36 PM, Sid Boyce <[email protected]> wrote:
>> On 01/01/14 18:46, Ilia Mirkin wrote:
>>> On Wed, Jan 1, 2014 at 9:04 AM, Sid Boyce <[email protected]> wrote:
>>>> On 01/01/14 00:55, Ilia Mirkin wrote:
>>>>> On Tue, Dec 31, 2013 at 7:41 PM, Sid Boyce <[email protected]> wrote:
>>>>>> On 31/12/13 10:36, Ilia Mirkin wrote:
>>>>>>> Having a dmesg would be nice. One thing I can think of off-hand is
>>>>>>> that 3.13-rc has MSI turned on by default. You can turn it off by
>>>>>>> adding "nouveau.config=NvMSI=0" to your kernel cmdline. If that
>>>>>>> doesn't help, a bisect restricted to drivers/gpu/drm/nouveau should
>>>>>>> show the offending commit fairly quickly.
>>>>>>>
>>>>>>> -ilia
>>>>>>>
>>>>>> Adding "nouveau.config=NvMSI=0" to the command line fixed the problem.
>>>>>> So it looks like commit 049ffa8ab33a63b3bff672d1a0ee6a35ad253fe8
>>>>>> introduced it.
>>>>> Any chance you might mmiotrace the blob (version 325 or later) to see
>>>>> which registers it fiddles with? Or alternatively, if you have a NVCE
>>>>> card (you never did end up providing the logs which would have made
>>>>> that apparent), could you try replacing nvc3_mc_oclass with
>>>>> nvc0_mc_oclass for the 0xce case in
>>>>> drivers/gpu/drm/nouveau/core/engine/device/nvc0.c? (and boot without
>>>>> the MSI disabling.) The switch has already been made for NVC8 in
>>>>> 0bae1d61c75 -- perhaps there are more "odd" ones.
>>>>>
>>>>> -ilia
>>>>>
>>>> Fails exactly the same.
>>>> case 0xc3:
>>>> device->cname = "GF106";
>>>>
>>>> device->oclass[NVDEV_SUBDEV_VBIOS ] = &nouveau_bios_oclass;
>>>> device->oclass[NVDEV_SUBDEV_GPIO ] = &nv50_gpio_oclass;
>>>> device->oclass[NVDEV_SUBDEV_I2C ] = &nv94_i2c_oclass;
>>>> device->oclass[NVDEV_SUBDEV_CLOCK ] = &nvc0_clock_oclass;
>>>> device->oclass[NVDEV_SUBDEV_THERM ] = &nva3_therm_oclass;
>>>> device->oclass[NVDEV_SUBDEV_MXM ] = &nv50_mxm_oclass;
>>>> device->oclass[NVDEV_SUBDEV_DEVINIT] = &nvc0_devinit_oclass;
>>>> device->oclass[NVDEV_SUBDEV_MC ] = nvc0_mc_oclass; <<<<<====
>>>> device->oclass[NVDEV_SUBDEV_BUS ] = nvc0_bus_oclass;
>>>> device->oclass[NVDEV_SUBDEV_TIMER ] = &nv04_timer_oclass;
>>> That's the 0xc3 case... you have a nvce card, not nvc3 -- you would
>>> need to change the NVDEV_SUBDEV_MC line to nvc0_mc_oclass for the 0xce
>>> case.
>>>
>>>> The dmesg and Xorg.0.log with the problem captured across a ssh link.
>>>>
>>>> # ps fax|grep X
>>>> 5633 pts/0 S+ 0:00 \_ grep --color=auto X
>>>> 5160 tty7 Ss+ 0:08 \_ /usr/bin/Xorg -br :0 vt7 -nolisten tcp -auth /var/lib/kdm/AuthFiles/A:0-yqspza
>>>>
>>>> Also
>>>> # echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
>>>> -bash: echo: write error: Invalid argument.
>>> Take a look at https://wiki.ubuntu.com/X/MMIOTracing
>>>
>>> -ilia
>>>
>> Of course it's a GF114.
>> Made the change and it boots without the command line change.
> Great! Care to send a patch?
>
> -ilia
>
Here it is.
Regards
Sid.

--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support
Senior Staff Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks


Attachments:
GF114.diff (738.00 B)

2014-01-02 14:58:36

by Sid Boyce

Subject: [PATCH] Fix hang problem with GeForce GT 560 Ti and nouveau in 3.13-rc


--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support
Senior Staff Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks


Attachments:
GF114_2.diff (1.36 kB)