Date: Wed, 6 Jan 2021 03:18:29 +0000
From: Jamie Heilman
To: Karol Herbst, Ben Skeggs, LKML, nouveau
Subject: Re: [Nouveau] nouveau regression post v5.8, still present in v5.10

Jamie Heilman wrote:
> Jamie Heilman wrote:
> > Karol Herbst wrote:
> > > do you think you'd be able to do a kernel bisect in order to pinpoint
> > > the actual commit causing it? Thanks
> >
> > No. I can't reproduce it reliably. If I could, bisection wouldn't
> > be a problem, but as I can't, and as it can take weeks for the problem
> > to occur, there's essentially no chance. I know it regressed roughly
> > in 5.8-rc1 only because that's what I was running when the first event
> > occurred.
>
> er, 5.9.0-rc1 rather

Actually ... I've found a way to reproduce this in hours instead of
weeks, so I think I may be able to bisect it after all. It's something
of a brute-force approach, and it's probably doing horrible things to the
backlight in my poor old monitor, but just running this:

#!/bin/sh
sleep 5
while ! dmesg | tail | grep -q nouveau
do
	xset dpms force off
	sleep 65
	xdotool mousemove 1024 1024 mousemove restore
	sleep 10
done

does manage to trip the issue sooner than it would otherwise happen with
natural usage. Given that this is my primary workstation and I sort of
need it functional during waking hours, it'll take me a bit, but I'll
update folks when I have the error more dialed in.

I'm using

git bisect start -- drivers/gpu/drm include/drm include/video

in an effort to make this go a bit quicker; let me know if you think
that's a bad idea or if I should add other paths.

> > > On Sun, Dec 27, 2020 at 8:16 PM Jamie Heilman wrote:
> > > >
> > > > Something between v5.8 and v5.9 has resulted in periodically losing video.
> > > > Unfortunately, I can't reliably reproduce it; it seems to happen every
> > > > once in a long while---I can go weeks without an occurrence, but it
> > > > always seems to happen after my workstation has been idle long enough
> > > > to screen blank and put the monitor to sleep. I'm using a single
> > > > display (Dell 2405FPW) connected via DVI, running X (Xorg 1.20.x from
> > > > Debian sid). I don't really do anything fancy (xterms, a browser or
> > > > two, playing the occasional video), but like I said, I can't reliably
> > > > reproduce this. I've had it happen about 11 times since August.
> > > >
> > > > lspci -vv output is:
> > > >
> > > > 01:00.0 VGA compatible controller: NVIDIA Corporation G86 [Quadro NVS 290] (rev a1) (prog-if 00 [VGA controller])
> > > > 	Subsystem: NVIDIA Corporation G86 [Quadro NVS 290]
> > > > 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> > > > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
> > > > 	Latency: 0, Cache Line Size: 64 bytes
> > > > 	Interrupt: pin A routed to IRQ 28
> > > > 	Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
> > > > 	Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
> > > > 	Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
> > > > 	Region 5: I/O ports at dc80 [size=128]
> > > > 	Expansion ROM at 000c0000 [disabled] [size=128K]
> > > > 	Capabilities: [60] Power Management version 2
> > > > 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> > > > 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> > > > 	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > > > 		Address: 00000000fee01004  Data: 4023
> > > > 	Capabilities: [78] Express (v1) Endpoint, MSI 00
> > > > 		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
> > > > 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
> > > > 		DevCtl:	CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
> > > > 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> > > > 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> > > > 		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> > > > 		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
> > > > 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> > > > 		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> > > > 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > 		LnkSta:	Speed 2.5GT/s (ok), Width x16 (ok)
> > > > 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > > > 	Capabilities: [100 v1] Virtual Channel
> > > > 		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
> > > > 		Arb:	Fixed- WRR32- WRR64- WRR128-
> > > > 		Ctrl:	ArbSelect=Fixed
> > > > 		Status:	InProgress-
> > > > 		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > > > 			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> > > > 			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> > > > 			Status:	NegoPending- InProgress-
> > > > 	Capabilities: [128 v1] Power Budgeting
> > > > 	Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024
> > > > 	Kernel driver in use: nouveau
> > > >
> > > > The last time this happened, this is what got logged:
> > > >
> > > > nouveau 0000:01:00.0: disp: ERROR 5 [INVALID_STATE] 06 [] chid 1 mthd 0080 data 00000001
> > > > nouveau 0000:01:00.0: disp: Base 1:
> > > > nouveau 0000:01:00.0: disp: 0084: 00000000
> > > > nouveau 0000:01:00.0: disp: 0088: 00000000
> > > > nouveau 0000:01:00.0: disp: 008c: 00000000
> > > > nouveau 0000:01:00.0: disp: 0090: 00000000
> > > > nouveau 0000:01:00.0: disp: 0094: 00000000
> > > > nouveau 0000:01:00.0: disp: 00a0: 00000060 -> 00000070
> > > > nouveau 0000:01:00.0: disp: 00a4: 00000000 -> f0000000
> > > > nouveau 0000:01:00.0: disp: 00c0: 00000000
> > > > nouveau 0000:01:00.0: disp: 00c4: 00000000
> > > > nouveau 0000:01:00.0: disp: 00c8: 00000000
> > > > nouveau 0000:01:00.0: disp: 00cc: 00000000
> > > > nouveau 0000:01:00.0: disp: 00e0: 40000000
> > > > nouveau 0000:01:00.0: disp: 00e4: 00000000
> > > > nouveau 0000:01:00.0: disp: 00e8: 00000000
> > > > nouveau 0000:01:00.0: disp: 00ec: 00000000
> > > > nouveau 0000:01:00.0: disp: 00fc: 00000000
> > > > nouveau 0000:01:00.0: disp: 0100: fffe0000
> > > > nouveau 0000:01:00.0: disp: 0104: 00000000
> > > > nouveau 0000:01:00.0: disp: 0110: 00000000
> > > > nouveau 0000:01:00.0: disp: 0114: 00000000
> > > > nouveau 0000:01:00.0: disp: Base 1 - Image 0:
> > > > nouveau 0000:01:00.0: disp: 0800: 00009500
> > > > nouveau 0000:01:00.0: disp: 0804: 00000000
> > > > nouveau 0000:01:00.0: disp: 0808: 04b00780
> > > > nouveau 0000:01:00.0: disp: 080c: 00007804
> > > > nouveau 0000:01:00.0: disp: 0810: 0000cf00
> > > > nouveau 0000:01:00.0: disp: Base 1 - Image 1:
> > > > nouveau 0000:01:00.0: disp: 0c00: 00009500
> > > > nouveau 0000:01:00.0: disp: 0c04: 00000000
> > > > nouveau 0000:01:00.0: disp: 0c08: 04b00780
> > > > nouveau 0000:01:00.0: disp: 0c0c: 00007804
> > > > nouveau 0000:01:00.0: disp: 0c10: 0000cf00
> > > > nouveau 0000:01:00.0: disp: ERROR 5 [INVALID_STATE] 06 [] chid 1 mthd 0080 data 00000001
> > > > nouveau 0000:01:00.0: disp: Base 1:
> > > > nouveau 0000:01:00.0: disp: 0084: 00000000
> > > > nouveau 0000:01:00.0: disp: 0088: 00000000
> > > > nouveau 0000:01:00.0: disp: 008c: 00000000
> > > > nouveau 0000:01:00.0: disp: 0090: 00000000
> > > > nouveau 0000:01:00.0: disp: 0094: 00000000
> > > > nouveau 0000:01:00.0: disp: 00a0: 00000060 -> 00000070
> > > > nouveau 0000:01:00.0: disp: 00a4: 00000000 -> f0000000
> > > > nouveau 0000:01:00.0: disp: 00c0: 00000000
> > > > nouveau 0000:01:00.0: disp: 00c4: 00000000
> > > > nouveau 0000:01:00.0: disp: 00c8: 00000000
> > > > nouveau 0000:01:00.0: disp: 00cc: 00000000
> > > > nouveau 0000:01:00.0: disp: 00e0: 40000000
> > > > nouveau 0000:01:00.0: disp: 00e4: 00000000
> > > > nouveau 0000:01:00.0: disp: 00e8: 00000000
> > > > nouveau 0000:01:00.0: disp: 00ec: 00000000
> > > > nouveau 0000:01:00.0: disp: 00fc: 00000000
> > > > nouveau 0000:01:00.0: disp: 0100: fffe0000
> > > > nouveau 0000:01:00.0: disp: 0104: 00000000
> > > > nouveau 0000:01:00.0: disp: 0110: 00000000
> > > > nouveau 0000:01:00.0: disp: 0114: 00000000
> > > > nouveau 0000:01:00.0: disp: Base 1 - Image 0:
> > > > nouveau 0000:01:00.0: disp: 0800: 00009500
> > > > nouveau 0000:01:00.0: disp: 0804: 00000000
> > > > nouveau 0000:01:00.0: disp: 0808: 04b00780
> > > > nouveau 0000:01:00.0: disp: 080c: 00007804
> > > > nouveau 0000:01:00.0: disp: 0810: 0000cf00
> > > > nouveau 0000:01:00.0: disp: Base 1 - Image 1:
> > > > nouveau 0000:01:00.0: disp: 0c00: 00009500
> > > > nouveau 0000:01:00.0: disp: 0c04: 00000000
> > > > nouveau 0000:01:00.0: disp: 0c08: 04b00780
> > > > nouveau 0000:01:00.0: disp: 0c0c: 00007804
> > > > nouveau 0000:01:00.0: disp: 0c10: 0000cf00
> > > > nouveau 0000:01:00.0: DRM: core notifier timeout
> > > > nouveau 0000:01:00.0: DRM: base-0: timeout
> > > >
> > > > I've got logs of all of this; if they help, I can collect them. The
> > > > timeout messages are consistent, the error messages a little less so.
> > > >
> > > > If there's more debugging I can do when this happens, I'd love to know
> > > > what it is.
> > > >
> > > > kernel config: http://audible.transient.net/~jamie/k/nouveau.config-5.10.0
> > > > dmesg at boot: http://audible.transient.net/~jamie/k/nouveau.dmesg
> > > >
> > > > --
> > > > Jamie Heilman http://audible.transient.net/~jamie/
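
The reproduction loop above never exits on a kernel that doesn't hit the bug, so on its own it can only ever answer "bad" for a bisect step. A minimal sketch of a bounded variant, assuming that surviving a fixed number of blank/wake cycles is an acceptable stand-in for "good" (with a problem this intermittent that can produce false negatives, so the budget should err on the generous side). The function name repro_cycles, its cycle-count argument, and the default of 60 cycles are hypothetical; the dmesg check and the xset/xdotool cycle are copied unchanged from the script above.

```sh
#!/bin/sh
# Bounded variant of the reproduction loop, meant to be sourced (". file")
# so that each bisect step gets a verdict instead of running forever.
# repro_cycles is a hypothetical helper name; its only argument is the
# maximum number of blank/wake cycles to try (default 60, an assumption).
#
# Exit status: 1 ("bad") once "nouveau" shows up in the last lines of dmesg,
#              0 ("good") if the cycle budget runs out without it.
repro_cycles() {
	max_cycles=${1:-60}
	cycle=0
	sleep 5
	# Same check as the original loop: only the tail of the kernel log.
	while ! dmesg | tail | grep -q nouveau
	do
		if [ "$cycle" -ge "$max_cycles" ]; then
			echo "no nouveau message after $cycle cycles"
			return 0
		fi
		# One blank/wake cycle, exactly as in the original script.
		xset dpms force off
		sleep 65
		xdotool mousemove 1024 1024 mousemove restore
		sleep 10
		cycle=$((cycle + 1))
	done
	echo "nouveau message after $cycle cycles"
	return 1
}
```

After booting each candidate kernel, source the file from a shell inside the X session, change to the kernel tree, and record the verdict with: if repro_cycles 60; then git bisect good; else git bisect bad; fi. Each cycle takes about 75 seconds (65 + 10), so 60 cycles is roughly 75 minutes per step.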