2019-06-14 02:50:55

by Sergey Senozhatsky

Subject: nouveau: DRM: GPU lockup - switching to software fbcon

5.2.0-rc4-next-20190613

dmesg

nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:01:00.0: fifo: channel 5: killed
nouveau 0000:01:00.0: fifo: engine 6: scheduled for recovery
nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:01:00.0: firefox[476]: channel 5 killed!
nouveau 0000:01:00.0: firefox[476]: failed to idle channel 5 [firefox[476]]

It locks up several times a day. Twice in just one hour today.
Can we fix this?

-ss


2019-06-19 05:09:21

by Sergey Senozhatsky

Subject: Re: nouveau: DRM: GPU lockup - switching to software fbcon

On (06/14/19 11:50), Sergey Senozhatsky wrote:
> dmesg
>
> nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
> nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
> nouveau 0000:01:00.0: fifo: channel 5: killed
> nouveau 0000:01:00.0: fifo: engine 6: scheduled for recovery
> nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
> nouveau 0000:01:00.0: firefox[476]: channel 5 killed!
> nouveau 0000:01:00.0: firefox[476]: failed to idle channel 5 [firefox[476]]
>
> It locks up several times a day. Twice in just one hour today.
> Can we fix this?

Unusable

[10380.555859] ------------[ cut here ]------------
[10380.556923] nouveau 0000:01:00.0: timeout
[10380.557981] WARNING: CPU: 3 PID: 12845 at drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk104.c:171 gk104_fifo_runlist_commit+0x11d/0x140
[10380.559079] Modules linked in: rndis_host cdc_ether usbnet mii mousedev hid_generic usbhid hid snd_hda_codec_realtek snd_hda_codec_generic r8169 snd_hda_intel realtek libphy snd_hda_codec snd_hda_core snd_pcm coretemp hwmon snd_timer snd i2c_i801 soundcore button xhci_pci xhci_hcd usbcore usb_common
[10380.560390] CPU: 3 PID: 12845 Comm: JS Helper Not tainted 5.2.0-rc5-next-20190617-dbg-00012-g45d135944f17-dirty #3438
[10380.560392] RIP: 0010:gk104_fifo_runlist_commit+0x11d/0x140
[10380.560393] Code: 24 08 48 8b 40 10 48 8b 78 10 4c 8b 77 50 4d 85 f6 74 34 e8 75 ea 06 00 4c 89 f2 48 c7 c7 ba b1 de a8 48 89 c6 e8 4e 8b c2 ff <0f> 0b 41 8b 45 50 85 c0 0f 85 33 1d 00 00 48 83 c4 30 5b 5d 41 5c
[10380.560393] RSP: 0018:ffff962383e179b8 EFLAGS: 00010296
[10380.560394] RAX: 000000000000001d RBX: ffff8b98cc9a7400 RCX: 0000000000000000
[10380.560394] RDX: ffff8b98ceae5218 RSI: ffff8b98cead6348 RDI: ffff8b98cead6348
[10380.560395] RBP: 0000000000002284 R08: ffff8b98cead6348 R09: 00000000000002d7
[10380.560395] R10: 0000000000000001 R11: 00000000ffffffff R12: 0000000000000000
[10380.560396] R13: ffff8b98cb0f9000 R14: ffff8b95c7594b20 R15: 0000000000000000
[10380.560396] FS: 00007fcff2aff700(0000) GS:ffff8b98ceac0000(0000) knlGS:0000000000000000
[10380.560397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10380.560397] CR2: 000055d2e178b098 CR3: 00000001e3008006 CR4: 00000000001606e0
[10380.560398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10380.560398] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[10380.560398] Call Trace:
[10380.560401] gk104_fifo_runlist_update+0x19e/0x1c0
[10380.560403] gk104_fifo_gpfifo_fini+0x7d/0xa0
[10380.560404] nvkm_fifo_chan_fini+0x19/0x20
[10380.560406] nvkm_object_fini+0xbc/0x150
[10380.560408] nvkm_ioctl_del+0x2f/0x50
[10380.560409] nvkm_ioctl+0xdf/0x177
[10380.560410] nvif_object_fini+0x49/0x60
[10380.560412] nouveau_channel_del+0x89/0x110
[10380.560413] nouveau_abi16_chan_fini.isra.0+0xa0/0x110
[10380.560414] nouveau_abi16_fini+0x2d/0x60
[10380.560416] nouveau_drm_postclose+0x4c/0xe0
[10380.560418] drm_file_free.part.0+0x1e0/0x290
[10380.560420] drm_release+0xa7/0xe0
[10380.591300] __fput+0xc7/0x250
[10380.592291] task_work_run+0x90/0xc0
[10380.593271] do_exit+0x286/0xb10
[10380.594306] do_group_exit+0x33/0xa0
[10380.595333] get_signal+0x12d/0x7e0
[10380.596304] do_signal+0x23/0x590
[10380.597490] ? __bpf_prog_run64+0x40/0x40
[10380.598441] ? __seccomp_filter+0x7e/0x430
[10380.599503] ? __x64_sys_futex+0x12c/0x145
[10380.600477] exit_to_usermode_loop+0x5d/0x70
[10380.601447] do_syscall_64+0x21f/0x2e8
[10380.602420] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10380.603352] RIP: 0033:0x7fd0025bdbac
[10380.604341] Code: Bad RIP value.
[10380.605264] RSP: 002b:00007fcff2afe590 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[10380.606196] RAX: fffffffffffffe00 RBX: 00007fcff2b06608 RCX: 00007fd0025bdbac
[10380.607241] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007fcff2b06630
[10380.608241] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[10380.609172] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000001ecc
[10380.610097] R13: 00007fcff2b065b0 R14: 0000000000000000 R15: 00007fcff2b06630
[10380.611023] ---[ end trace 3a96e3448f4194de ]---
[10380.611946] nouveau 0000:01:00.0: fifo: runlist 0 update timeout
[10382.850861] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[10382.851777] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[10382.852673] nouveau 0000:01:00.0: fifo: channel 5: killed
[10382.853560] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[10382.854521] nouveau 0000:01:00.0: firefox[12157]: channel 5 killed!
[10395.612848] nouveau 0000:01:00.0: firefox[12157]: failed to idle channel 5 [firefox[12157]]

-ss
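
For anyone triaging a similar report, the nouveau lines in the dmesg output above can be tallied with a short script. This is only a rough sketch: the function name summarize_nouveau_errors and both regular expressions are illustrative, written against the message formats quoted in this thread, and other kernel versions may word these lines differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
SCHED_ERROR_RE = re.compile(r"nouveau \S+: fifo: SCHED_ERROR (\w+) \[(\w+)\]")

# Matches lines such as:
#   nouveau 0000:01:00.0: firefox[476]: channel 5 killed!
KILLED_RE = re.compile(r"nouveau \S+: (\S+)\[(\d+)\]: channel (\d+) killed!")


def summarize_nouveau_errors(log_text):
    """Count SCHED_ERROR reasons and list processes whose channel was killed.

    Returns (reasons, killed): reasons is a Counter keyed by the bracketed
    reason string, killed is a list of (process name, pid, channel) tuples.
    """
    reasons = Counter()
    killed = []
    for line in log_text.splitlines():
        match = SCHED_ERROR_RE.search(line)
        if match:
            reasons[match.group(2)] += 1
            continue
        match = KILLED_RE.search(line)
        if match:
            killed.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return reasons, killed
```

Feeding it the saved output of dmesg gives a quick count of how often each scheduler error occurred and which processes lost their channel, which is easier to attach to a report than "several times a day".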

2019-06-19 05:20:50

by Ilia Mirkin

Subject: Re: nouveau: DRM: GPU lockup - switching to software fbcon

On Wed, Jun 19, 2019 at 1:08 AM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (06/14/19 11:50), Sergey Senozhatsky wrote:
> > dmesg
> >
> > nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
> > nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
> > nouveau 0000:01:00.0: fifo: channel 5: killed
> > nouveau 0000:01:00.0: fifo: engine 6: scheduled for recovery
> > nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
> > nouveau 0000:01:00.0: firefox[476]: channel 5 killed!
> > nouveau 0000:01:00.0: firefox[476]: failed to idle channel 5 [firefox[476]]
> >
> > It locks up several times a day. Twice in just one hour today.
> > Can we fix this?
>
> Unusable

Are you using a GTX 660 by any chance? You've provided rather minimal
system info.

-ilia

2019-06-19 05:48:54

by Sergey Senozhatsky

Subject: Re: nouveau: DRM: GPU lockup - switching to software fbcon

On (06/19/19 01:20), Ilia Mirkin wrote:
> On Wed, Jun 19, 2019 at 1:08 AM Sergey Senozhatsky
> <[email protected]> wrote:
> >
> > On (06/14/19 11:50), Sergey Senozhatsky wrote:
> > > dmesg
> > >
> > > nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
> > > nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > > nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
> > > nouveau 0000:01:00.0: fifo: channel 5: killed
> > > nouveau 0000:01:00.0: fifo: engine 6: scheduled for recovery
> > > nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
> > > nouveau 0000:01:00.0: firefox[476]: channel 5 killed!
> > > nouveau 0000:01:00.0: firefox[476]: failed to idle channel 5 [firefox[476]]
> > >
> > > It locks up several times a day. Twice in just one hour today.
> > > Can we fix this?
> >
> > Unusable
>
> Are you using a GTX 660 by any chance? You've provided rather minimal
> system info.

01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)

-ss

2019-06-19 06:08:04

by Ilia Mirkin

Subject: Re: nouveau: DRM: GPU lockup - switching to software fbcon

On Wed, Jun 19, 2019 at 1:48 AM Sergey Senozhatsky
<[email protected]> wrote:
>
> On (06/19/19 01:20), Ilia Mirkin wrote:
> > On Wed, Jun 19, 2019 at 1:08 AM Sergey Senozhatsky
> > <[email protected]> wrote:
> > >
> > > On (06/14/19 11:50), Sergey Senozhatsky wrote:
> > > > dmesg
> > > >
> > > > nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
> > > > nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > > > nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
> > > > nouveau 0000:01:00.0: fifo: channel 5: killed
> > > > nouveau 0000:01:00.0: fifo: engine 6: scheduled for recovery
> > > > nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
> > > > nouveau 0000:01:00.0: firefox[476]: channel 5 killed!
> > > > nouveau 0000:01:00.0: firefox[476]: failed to idle channel 5 [firefox[476]]
> > > >
> > > > It locks up several times a day. Twice in just one hour today.
> > > > Can we fix this?
> > >
> > > Unusable
> >
> > Are you using a GTX 660 by any chance? You've provided rather minimal
> > system info.
>
> 01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)

Quite literally the same GPU I have plugged in...

02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B
[GeForce GT 730] [10de:1287] (rev a1)

Works great here! Only other thing I can think of is that I avoid
applications with the letters "G" and "K" in their names, and I'm
using xf86-video-nouveau ddx, whereas you might be using the "modeset"
ddx with glamor.

If all else fails, just remove nouveau_dri.so and/or boot with
nouveau.noaccel=1 -- should be perfect.

Cheers,

-ilia
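
After rebooting with the suggested nouveau.noaccel=1 option, it can be worth confirming that the parameter actually reached the kernel. The sketch below checks a kernel command-line string for it. The function name nouveau_noaccel_enabled is hypothetical, and the code assumes that any value other than 0 counts as "acceleration disabled" and that the first occurrence of the option is the one that matters.

```python
def nouveau_noaccel_enabled(cmdline):
    """Return True if the command line sets nouveau.noaccel to a non-zero value.

    Assumption: the first nouveau.noaccel=<value> token decides the result,
    and an empty value or "0" means acceleration was left enabled.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "nouveau.noaccel" and sep:
            return value not in ("", "0")
    return False
```

On a running Linux system the string to pass in is the content of /proc/cmdline, for example nouveau_noaccel_enabled(open("/proc/cmdline").read()).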

2019-06-19 06:27:39

by Sergey Senozhatsky

Subject: Re: nouveau: DRM: GPU lockup - switching to software fbcon

On (06/19/19 02:07), Ilia Mirkin wrote:
> On Wed, Jun 19, 2019 at 1:48 AM Sergey Senozhatsky
> <[email protected]> wrote:
> >
> > On (06/19/19 01:20), Ilia Mirkin wrote:
> > > On Wed, Jun 19, 2019 at 1:08 AM Sergey Senozhatsky
> > > <[email protected]> wrote:
> > > >
> > > > On (06/14/19 11:50), Sergey Senozhatsky wrote:
> > > > > dmesg
> > > > >
> > > > > nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
> > > > > nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > > > > nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
> > > > > nouveau 0000:01:00.0: fifo: channel 5: killed
> > > > > nouveau 0000:01:00.0: fifo: engine 6: scheduled for recovery
> > > > > nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
> > > > > nouveau 0000:01:00.0: firefox[476]: channel 5 killed!
> > > > > nouveau 0000:01:00.0: firefox[476]: failed to idle channel 5 [firefox[476]]
> > > > >
> > > > > It locks up several times a day. Twice in just one hour today.
> > > > > Can we fix this?
> > > >
> > > > Unusable
> > >
> > > Are you using a GTX 660 by any chance? You've provided rather minimal
> > > system info.
> >
> > 01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)
>
> Quite literally the same GPU I have plugged in...
>
> 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B
> [GeForce GT 730] [10de:1287] (rev a1)
>
> Works great here! Only other thing I can think of is that I avoid
> applications with the letters "G" and "K" in their names, and I'm
> using xf86-video-nouveau ddx, whereas you might be using the "modeset"
> ddx with glamor.

xf86-video-nouveau 1.0.16-1

cat .local/share/xorg/Xorg.0.log

[..]
[ 304.159] (II) NOUVEAU driver
[ 304.159] (II) NOUVEAU driver for NVIDIA chipset families :
[ 304.159] RIVA TNT (NV04)
[ 304.159] RIVA TNT2 (NV05)
[ 304.159] GeForce 256 (NV10)
[ 304.159] GeForce 2 (NV11, NV15)
[ 304.159] GeForce 4MX (NV17, NV18)
[ 304.159] GeForce 3 (NV20)
[ 304.159] GeForce 4Ti (NV25, NV28)
[ 304.159] GeForce FX (NV3x)
[ 304.159] GeForce 6 (NV4x)
[ 304.159] GeForce 7 (G7x)
[ 304.159] GeForce 8 (G8x)
[ 304.159] GeForce 9 (G9x)
[ 304.159] GeForce GTX 2xx/3xx (GT2xx)
[ 304.159] GeForce GTX 4xx/5xx (GFxxx)
[ 304.159] GeForce GTX 6xx/7xx (GKxxx)
[ 304.159] GeForce GTX 9xx (GMxxx)
[ 304.159] GeForce GTX 10xx (GPxxx)
[ 304.159] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[ 304.159] (II) [drm] nouveau interface version: 1.3.1
[ 304.159] (WW) Falling back to old probe method for modesetting
[ 304.159] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[ 304.159] (II) Loading sub module "dri2"
[ 304.159] (II) LoadModule: "dri2"
[ 304.159] (II) Module "dri2" already built-in
[ 304.159] (--) NOUVEAU(0): Chipset: "NVIDIA NV106"
[ 304.159] (II) NOUVEAU(0): Creating default Display subsection in Screen section
"Default Screen Section" for depth/fbbpp 24/32
[...]
[ 304.309] (II) UnloadModule: "modesetting"
[ 304.309] (II) Unloading modesetting
[ 304.310] (II) NOUVEAU(0): Channel setup complete.
[ 304.310] (II) NOUVEAU(0): [COPY] async initialised.
[ 304.310] (II) NOUVEAU(0): Hardware support for Present enabled
[ 304.310] (II) NOUVEAU(0): [DRI2] Setup complete
[ 304.310] (II) NOUVEAU(0): [DRI2] DRI driver: nouveau
[ 304.310] (II) NOUVEAU(0): [DRI2] VDPAU driver: nouveau
[ 304.310] (II) Loading sub module "exa"
[ 304.310] (II) LoadModule: "exa"
[ 304.310] (II) Loading /usr/lib/xorg/modules/libexa.so
[ 304.310] (II) Module exa: vendor="X.Org Foundation"
[ 304.310] compiled for 1.20.5, module version = 2.6.0
[ 304.310] ABI class: X.Org Video Driver, version 24.0
[ 304.310] (II) EXA(0): Driver allocated offscreen pixmaps
[ 304.310] (II) EXA(0): Driver registered support for the following operations:
[ 304.310] (II) Solid
[ 304.310] (II) Copy
[ 304.310] (II) Composite (RENDER acceleration)
[ 304.310] (II) UploadToScreen
[ 304.310] (II) DownloadFromScreen
[ 304.310] (==) NOUVEAU(0): Backing store enabled
[ 304.310] (==) NOUVEAU(0): Silken mouse disabled
[ 304.310] (II) NOUVEAU(0): [XvMC] Associated with Nouveau GeForce 8/9 Textured Video.
[ 304.310] (II) NOUVEAU(0): [XvMC] Extension initialized.
[ 304.310] (==) NOUVEAU(0): DPMS enabled
[..]

> If all else fails, just remove nouveau_dri.so and/or boot with
> nouveau.noaccel=1 -- should be perfect.

Can give it a try.

-ss
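
The Xorg log in the message above suggests the xf86-video-nouveau DDX is in use: the modesetting module is unloaded and the per-screen messages carry the NOUVEAU(0) tag. A small sketch of the same check done programmatically follows. The function name detect_ddx and the tag table are illustrative, and the "modeset" tag for the modesetting DDX is an assumption that does not come from this thread.

```python
import re

# Map of per-screen log tags to the DDX that emits them.
# "NOUVEAU" is taken from the log quoted in this thread;
# "modeset" is an assumption about how the modesetting DDX tags its lines.
DDX_TAGS = {
    "NOUVEAU": "xf86-video-nouveau",
    "modeset": "modesetting",
}

# Per-screen messages look like "(II) NOUVEAU(0): Channel setup complete."
TAG_RE = re.compile(r"\b(\w+)\(\d+\):")


def detect_ddx(xorg_log):
    """Return the set of known DDX drivers that logged per-screen messages."""
    found = set()
    for line in xorg_log.splitlines():
        match = TAG_RE.search(line)
        if match and match.group(1) in DDX_TAGS:
            found.add(DDX_TAGS[match.group(1)])
    return found
```

Lines tagged with other modules, such as EXA(0), are ignored because they are not in the tag table, and the probe-time line "(II) modesetting: Driver for Modesetting Kernel Drivers" does not count since it carries no screen number.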

2019-06-19 06:37:18

by Sergey Senozhatsky

Subject: Re: nouveau: DRM: GPU lockup - switching to software fbcon

On (06/19/19 15:27), Sergey Senozhatsky wrote:
> [..]
>
> > If all else fails, just remove nouveau_dri.so and/or boot with
> > nouveau.noaccel=1 -- should be perfect.
>
> Can give it a try.

That has some impact on system responsiveness:

CPU% COMM
339.7 firefox

Which is slightly less than perfect :)

-ss

2019-07-01 03:19:40

by Sergey Senozhatsky

Subject: Re: nouveau: DRM: GPU lockup - switching to software fbcon

On (06/19/19 02:07), Ilia Mirkin wrote:
> If all else fails, just remove nouveau_dri.so and/or boot with
> nouveau.noaccel=1 -- should be perfect.

nouveau.noaccel=1 did the trick. Is there any other, let's say
less CPU-intensive, way to fix nouveau?

-ss