Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934747AbdCWUpi (ORCPT );
	Thu, 23 Mar 2017 16:45:38 -0400
Received: from mail.fireflyinternet.com ([109.228.58.192]:53158 "EHLO
	fireflyinternet.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S1752412AbdCWUpY (ORCPT );
	Thu, 23 Mar 2017 16:45:24 -0400
X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138;
Date: Thu, 23 Mar 2017 20:44:52 +0000
From: Chris Wilson
To: Larry Finger
Cc: LKML, Tvrtko Ursulin, intel-gfx@lists.freedesktop.org,
	Jani Nikula, Daniel Vetter, Thorsten Leemhuis
Subject: Re: Regression in i915 for 4.11-rc1 - bisected to commit 69df05e11ab8
Message-ID: <20170323204452.GN27773@nuc-i3427.alporthouse.com>
Mail-Followup-To: Chris Wilson, Larry Finger, LKML, Tvrtko Ursulin,
	intel-gfx@lists.freedesktop.org, Jani Nikula, Daniel Vetter,
	Thorsten Leemhuis
References: <558064cb-f489-a743-79cb-c88fd06f17aa@lwfinger.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <558064cb-f489-a743-79cb-c88fd06f17aa@lwfinger.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3174
Lines: 62

On Thu, Mar 23, 2017 at 01:19:43PM -0500, Larry Finger wrote:
> Since kernel 4.11-rc1, my desktop (Plasma5/KDE) has encountered
> intermittent hangs with the following information in the logs:
>
> linux-4v1g.suse kernel: [drm] GPU HANG: ecode 7:0:0xf3cffffe, in
> plasmashell [1283], reason: Hang on render ring, action: reset
> linux-4v1g.suse kernel: [drm] GPU hangs can indicate a bug anywhere
> in the entire gfx stack, including userspace.
> linux-4v1g.suse kernel: [drm] Please file a _new_ bug report on
> bugs.freedesktop.org against DRI -> DRM/Intel
> linux-4v1g.suse kernel: [drm] drm/i915 developers can then reassign
> to the right component if it's not a kernel issue.
> linux-4v1g.suse kernel: [drm] The gpu crash dump is required to
> analyze gpu hangs, so please always attach it.
> linux-4v1g.suse kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
> linux-4v1g.suse kernel: drm/i915: Resetting chip after gpu hang
>
> This problem was added to
> https://bugs.freedesktop.org/show_bug.cgi?id=99380, but it probably
> is a different bug, as the OP in that report has problems with
> kernel 4.10.x, whereas my problem did not appear until 4.11.

Close. Actually that patch touches code you are not using (oa-perf and
gvt), the real culprit was e8a9c58fcd9a ("drm/i915: Unify active context
tracking between legacy/execlists/guc"). The fix

commit 5d4bac5503fcc67dd7999571e243cee49371aef7
Author: Chris Wilson
Date:   Wed Mar 22 20:59:30 2017 +0000

    drm/i915: Restore marking context objects as dirty on pinning

    Commit e8a9c58fcd9a ("drm/i915: Unify active context tracking between
    legacy/execlists/guc") converted the legacy intel_ringbuffer submission
    to the same context pinning mechanism as execlists - that is to pin the
    context until the subsequent request is retired. Previously it used the
    vma retirement of the context object to keep itself pinned until the
    next request (after i915_vma_move_to_active()). In the conversion, I
    missed that the vma retirement was also responsible for marking the
    object as dirty. Mark the context object as dirty when pinning
    (equivalent to execlists) which ensures that if the context is swapped
    out due to mempressure or suspend/hibernation, when it is loaded back in
    it does so with the previous state (and not all zero).

    Fixes: e8a9c58fcd9a ("drm/i915: Unify active context tracking between legacy/execlists/guc")
    Reported-by: Dennis Gilmore
    Reported-by: Mathieu Marquer
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99993
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100181
    Signed-off-by: Chris Wilson
    Cc: Tvrtko Ursulin
    Cc: # v4.11-rc1
    Link: http://patchwork.freedesktop.org/patch/msgid/20170322205930.12762-1-chris@chris-wilson.co.uk
    Reviewed-by: Tvrtko Ursulin

went in this morning and so will be upstreamed ~next week.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre