Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757063Ab3FKUNc (ORCPT ); Tue, 11 Jun 2013 16:13:32 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:41952 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756695Ab3FKUEB (ORCPT ); Tue, 11 Jun 2013 16:04:01 -0400 From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Chris Wilson , Daniel Vetter , Damien Lespiau Subject: [ 57/79] drm/i915: Fix spurious -EIO/SIGBUS on wedged gpus Date: Tue, 11 Jun 2013 13:03:23 -0700 Message-Id: <20130611195323.607655857@linuxfoundation.org> X-Mailer: git-send-email 1.8.3.254.g5578ad7 In-Reply-To: <20130611195312.352656079@linuxfoundation.org> References: <20130611195312.352656079@linuxfoundation.org> User-Agent: quilt/0.60-5.1.1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2937 Lines: 80 3.9-stable review patch. If anyone has any objections, please let me know. ------------------ From: Daniel Vetter commit 7abb690a0e095717420ba78dcab4309abbbec78a upstream. Chris Wilson noticed that since commit 1f83fee08d625f8d0130f9fe5ef7b17c2e022f3c [v3.9] Author: Daniel Vetter Date: Thu Nov 15 17:17:22 2012 +0100 drm/i915: clear up wedged transitions X can again get -EIO when it does not expect it. And even worse score a SIGBUS when accessing gtt mmaps. The established ABI is that we _only_ return an -EIO from execbuf - all other ioctls should just work. And since the reset code moves all bos out of gpu domains and clears out all the last_seqno/ring tracking there really shouldn't be any reason for non-execbuf code to ever touch the hw and see an -EIO. After some extensive discussions we've noticed that these spurios -EIO are caused by i915_gem_wait_for_error: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg20540.html That is easy to fix by returning 0 instead of -EIO, since grabbing the dev->struct_mutex does not yet mean that we actually want to touch the hw. And so there is no reason at all to fail with -EIO. But that's not the entire since, since often (at least it's easily googleable) dmesg indicates that the reset fails and we declare the gpu wedged. Then, quite a bit later X wakes up with the "Timed out waiting for the gpu reset to complete" DRM_ERROR message in wait_for_errror and brings down the desktop with an -EIO/SIGBUS. So clearly we're missing a wakeup somewhere, since the gpu reset just doesn't take 10 seconds to complete. And indeed we're do handle the terminally wedged state wrong. Fix this all up. Reviewed-by: Chris Wilson Cc: Daniel Vetter Cc: Damien Lespiau Signed-off-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/i915_gem.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -91,14 +91,11 @@ i915_gem_wait_for_error(struct i915_gpu_ { int ret; -#define EXIT_COND (!i915_reset_in_progress(error)) +#define EXIT_COND (!i915_reset_in_progress(error) || \ + i915_terminally_wedged(error)) if (EXIT_COND) return 0; - /* GPU is already declared terminally dead, give up. */ - if (i915_terminally_wedged(error)) - return -EIO; - /* * Only wait 10 seconds for the gpu reset to complete to avoid hanging * userspace. If it takes that long something really bad is going on and -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/