Date: Tue, 2 Feb 2021 20:26:48 +0000
From: Jamie Heilman
To: Karol Herbst, Ben Skeggs, LKML, nouveau
Subject: Re: [Nouveau] nouveau regression post v5.8, still present in v5.10
X-Mailing-List: linux-kernel@vger.kernel.org

Jamie Heilman wrote:
> Jamie Heilman wrote:
> > Karol Herbst wrote:
> > > fyi, there is a patch which solves a maybe related issue on your GPU,
> > > mind giving it a try before we dig further?
> > > https://gitlab.freedesktop.org/drm/nouveau/-/issues/14#note_767791
> >
> > So, I tried that.  Turns out, I can still trigger a problem.  Is it
> > the same problem?  Maybe?
> > I also tried applying the patch from
> >
> > ca386aa7155a ("drm/nouveau/kms/nv50-gp1xx: add WAR for EVO push buffer HW bug")
> > to 5.8.0-rc6-01516-g0a96099691c8 and very interestingly, it changed
> > the mode of failure to the same thing I saw with 5.10.9 patched with
> > the patch from that bug report.  In both cases I get this in the log:
> >
> > kern.err: nouveau 0000:01:00.0: Xorg[2243]: nv50cal_space: -16
> > kern.err: nouveau 0000:01:00.0: Xorg[2243]: nv50cal_space: -16
> > kern.err: nouveau 0000:01:00.0: Xorg[2243]: nv50cal_space: -16
> > kern.err: nouveau 0000:01:00.0: Xorg[2243]: nv50cal_space: -16
> > ...
> > and so on
> >
> > In one incident my monitor wouldn't even wake up anymore after this.
> >
> >
> > I'm trying to repro it on an unpatched 5.8.0-rc6-01515-gae09163ac27c
> > right now, as running glxgears does seem to help reproduce problems
> > faster which is nice, I'm just not entirely sure it's the same set of
> > problems; hopefully that version is free from issues, but we'll
> > see...
>
> Ugh, well I can crash 5.8.0-rc6-01515-gae09163ac27c and 5.8.18 in
> basically the same way running glxgears and a xset dpms force off
> loop.  So I'm starting to think it's not the same thing, and that
> problem has been latent from before I started having periodic issues.
>
> I should note that my exact testing technique for the above was to run
> 4 copies of glxgears and the xset dpms force off loop at the same
> time.  Really looks more like it triggers a resource starvation issue
> maybe.  The crash is also worse, particularly if I don't do anything
> about it right away as my workstation eventually falls off the network
> and I'm forced to power cycle it; the crashes I was chasing after
> wouldn't do quite that much violence, normally I could still log in,
> rebuild a kernel, and shut things down cleanly.
>
> More than one bug here I suspect.

OK, I went back and bisected again while patching known issues to get
a better idea of what was causing the problem I've been having, and
I'm confident it was the bug which Bastian Beranek's patch (now in
mainline) addressed.  My original bisection got confused by the EVO
push buffer HW bug, which was fixed in ca386aa7155a54.  Once I bisected
with the patch from ca386aa7155a54 applied, my bisection landed on
f844eb485eb05, and Bastian Beranek's patch fixed that right up.

'course I remain mildly concerned I can crash the kernel with little
more than glxgears and xset ... but the original stability problem I
reported I can safely say has been fixed.  If I can figure out the
nature of what I suspect is unrecoverable resource starvation, I'll
start a new thread for that.

-- 
Jamie Heilman                     http://audible.transient.net/~jamie/
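
For anyone retracing the approach, bisecting with a known fix applied at each step can be sketched roughly as below. This is only an illustration, not the procedure actually used in this thread: the helper names (`sh`, `bisect_step`) and the test command are hypothetical, and a real kernel test means building and booting each candidate, which usually has to be done by hand. The exit codes follow the documented `git bisect run` convention (0 for good, 125 for skip, other values up to 127 for bad).

```python
import subprocess
import sys

FIX_COMMIT = "ca386aa7155a"  # EVO push buffer WAR named in this thread
SKIP = 125                   # git bisect run: "this revision cannot be tested"


def sh(cmd):
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode


def bisect_step(test_cmd, fix=FIX_COMMIT, run=sh):
    """Apply the known fix on top of the candidate, test it, then clean up."""
    # Apply the fix to the working tree without creating a commit.
    if run(["git", "cherry-pick", "--no-commit", fix]) != 0:
        run(["git", "reset", "--hard"])
        return SKIP
    status = run(test_cmd)
    # Drop the temporary patch so bisect can check out the next candidate.
    run(["git", "reset", "--hard"])
    if status == 0:
        return 0
    return SKIP if status == SKIP else 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   git bisect run python3 bisect_with_fix.py ./build-and-test.sh
    sys.exit(bisect_step(sys.argv[1:]))
```

Skipping revisions where the fix does not apply keeps an unrelated conflict from being misread as a bad commit, which is the same kind of confusion the first bisection ran into.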