Subject: Re: [Intel-gfx] [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock
From: Joonas Lahtinen
To: Andrea Arcangeli, Martin Kepplinger, Thorsten Leemhuis, daniel.vetter@intel.com, Dave Airlie, Chris Wilson
Cc: intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org
Date: Fri, 07 Apr 2017 12:05:22 +0300
Message-ID: <1491555922.3493.18.camel@linux.intel.com>
In-Reply-To: <20170406232347.988-2-aarcange@redhat.com>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

On Fri, 2017-04-07 at 01:23 +0200, Andrea Arcangeli wrote:
> synchronize_rcu/synchronize_sched/synchronize_rcu_expedited() will
> hang until its own workqueues are run. The i915 gem workqueues will
> wait on the struct_mutex to be released. So we cannot wait for a
> quiescent state using those rcu primitives while holding the
> struct_mutex or it creates a circular lock dependency resulting in
> kernel hangs (which is reproducible but goes undetected by lockdep).
>
> This started in commit 3d3d18f086cdda72ee18a454db70ca72c6e3246c and
> lockdep didn't detect it apparently.

The right format is:

Fixes: 3d3d18f086cd ("drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker)")

> @@ -324,6 +320,16 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
>  	if (unlock)
>  		mutex_unlock(&dev->struct_mutex);
>  
> +	if (likely(__mutex_owner(&dev->struct_mutex) != current))

This check can be dropped, and synchronize_rcu_expedited() should be
embedded directly into the if (unlock) branch, as that is functionally
equivalent. This can be applied to all the unlock cases, not just this
one. That should be the correct way to avoid the deadlock.

I've sent a patch to do this (Cc'd you). Can you verify that it gets
rid of the problem for you?

> +		/*
> +		 * If reclaim was invoked by an allocation done while
> +		 * holding the struct mutex, we cannot call
> +		 * synchronize_rcu_expedited() as it depends on
> +		 * workqueues to run but the running workqueue may be
> +		 * blocked waiting on us to release struct_mutex.
> +		 */
> +		synchronize_rcu_expedited();
> +
>  	return freed;
>  }
>  

-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
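To illustrate why moving synchronize_rcu_expedited() into the if (unlock) branch is equivalent to the __mutex_owner() check, here is a minimal userspace model in C. It is not kernel code: `struct owned_mutex`, `trylock_recursive()`, `model_synchronize()` and `shrinker_scan_model()` are hypothetical stand-ins for struct_mutex, the shrinker's lock helper, synchronize_rcu_expedited() and i915_gem_shrinker_scan(). It assumes the lock helper reports whether it took the mutex itself or found it already held by the calling task, which is what the `unlock` flag encodes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a mutex that records its owning task (0 = free). */
struct owned_mutex {
	int owner;
};

enum trylock_result { TRYLOCK_FAILED, TRYLOCK_SUCCESS, TRYLOCK_RECURSIVE };

/* Take the mutex if free; report RECURSIVE if the caller already owns it. */
static enum trylock_result trylock_recursive(struct owned_mutex *m, int self)
{
	if (m->owner == 0) {
		m->owner = self;
		return TRYLOCK_SUCCESS;
	}
	return m->owner == self ? TRYLOCK_RECURSIVE : TRYLOCK_FAILED;
}

static void model_unlock(struct owned_mutex *m)
{
	assert(m->owner != 0);	/* must only unlock a held mutex */
	m->owner = 0;
}

/* Counts how many times the blocking grace-period wait would have run. */
static int sync_calls;

/* Stand-in for synchronize_rcu_expedited(); it only records the call. */
static void model_synchronize(void)
{
	sync_calls++;
}

/* Model of the scan path: returns -1 if the lock could not be taken. */
long shrinker_scan_model(struct owned_mutex *m, int self)
{
	bool unlock = false;
	long freed = 0;

	switch (trylock_recursive(m, self)) {
	case TRYLOCK_FAILED:
		return -1;
	case TRYLOCK_SUCCESS:
		unlock = true;	/* we took the lock, so we must drop it */
		break;
	case TRYLOCK_RECURSIVE:
		unlock = false;	/* caller already holds it: leave it held */
		break;
	}

	/* ... objects would be freed here while holding the lock ... */

	if (unlock) {
		model_unlock(m);
		/*
		 * Only reached when this call acquired the mutex itself, so
		 * after unlocking the current task cannot be the owner. This
		 * is the same condition the owner != current test expressed.
		 */
		model_synchronize();
	}
	return freed;
}
```

In the recursive case (reclaim entered from an allocation made while holding the mutex) the wait is skipped, and when another task holds the lock the scan bails out before reaching it. Tying the wait to `unlock` relies on what the function itself did with the lock rather than re-reading the lock's owner afterwards.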