Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935624AbdCWSTx (ORCPT ); Thu, 23 Mar 2017 14:19:53 -0400
Received: from mail-oi0-f68.google.com ([209.85.218.68]:34627 "EHLO mail-oi0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932972AbdCWSTv (ORCPT ); Thu, 23 Mar 2017 14:19:51 -0400
To: LKML, Chris Wilson, Tvrtko Ursulin, intel-gfx@lists.freedesktop.org, Jani Nikula, Daniel Vetter
From: Larry Finger
Subject: Regression in i915 for 4.11-rc1 - bisected to commit 69df05e11ab8
Cc: Thorsten Leemhuis
Message-ID: <558064cb-f489-a743-79cb-c88fd06f17aa@lwfinger.net>
Date: Thu, 23 Mar 2017 13:19:43 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1599
Lines: 31

Since kernel 4.11-rc1, my desktop (Plasma5/KDE) has encountered intermittent
hangs with the following information in the logs:

linux-4v1g.suse kernel: [drm] GPU HANG: ecode 7:0:0xf3cffffe, in plasmashell [1283], reason: Hang on render ring, action: reset
linux-4v1g.suse kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
linux-4v1g.suse kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
linux-4v1g.suse kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
linux-4v1g.suse kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
linux-4v1g.suse kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
linux-4v1g.suse kernel: drm/i915: Resetting chip after gpu hang

This problem was added to https://bugs.freedesktop.org/show_bug.cgi?id=99380,
but it probably is a different bug, as the OP in that report has problems with
kernel 4.10.x, whereas my problem did not appear until 4.11.

The problem was bisected to commit 69df05e11ab8 ("drm/i915: Simplify releasing
context reference"). The accuracy of the bisection was tested by reverting that
patch in kernel 4.11-rc3. With that change, my kernel has now run for over 17
hours with no problem. Before the reversion, the longest any affected kernel
would run was ~3 hours until a gpu hang was detected.

I admit that I do not understand this driver, but my guess is that this commit
introduced a race condition in the context put.

Thanks,

Larry
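
P.S. For anyone who wants to check their own logs for the same hang signature, here is a minimal sketch in Python. It assumes the "GPU HANG" line has exactly the layout shown in the log excerpt in this message; the names HANG_RE and parse_gpu_hang are hypothetical and not part of any i915 or DRM tooling.

```python
import re

# Assumed layout, taken from the single "GPU HANG" line quoted in this message.
# Other kernel versions may format the line differently.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG log line as a dict, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])  # the PID is the bracketed number
    return fields


if __name__ == "__main__":
    sample = (
        "linux-4v1g.suse kernel: [drm] GPU HANG: ecode 7:0:0xf3cffffe, "
        "in plasmashell [1283], reason: Hang on render ring, action: reset"
    )
    print(parse_gpu_hang(sample))
```

Feeding each line of the kernel log through parse_gpu_hang and counting the non-None results per boot would give a rough hang frequency to compare before and after a revert.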