Date: Thu, 7 Dec 2017 20:57:09 +0100
From: Peter Zijlstra
To: LKML, DRI Development, Intel Graphics Development, Tvrtko Ursulin,
    Marta Lofstedt, Byungchul Park, Ingo Molnar, Tejun Heo, Kees Cook,
    Thomas Gleixner, Shaohua Li, Andrew Morton, Jens Axboe,
    Greg Kroah-Hartman, Jonathan Corbet, Oleg Nesterov, Daniel Vetter
Subject: Re: [PATCH] kthread: finer-grained lockdep/cross-release completion
Message-ID: <20171207195709.cxcil57boc6czdot@hirez.programming.kicks-ass.net>
References: <20171207100849.407-1-daniel.vetter@ffwll.ch>
 <20171207122255.zi5ishny24k66ik7@hirez.programming.kicks-ass.net>
 <20171207145828.z52kjbrrlcl75m7m@phenom.ffwll.local>
In-Reply-To: <20171207145828.z52kjbrrlcl75m7m@phenom.ffwll.local>
User-Agent: NeoMutt/20170609 (1.8.3)

On Thu, Dec 07, 2017 at 03:58:28PM +0100, Daniel Vetter wrote:
> [ 85.069417] gem_exec_captur/2810 is trying to acquire lock:
> [ 85.069419] ((completion)&self->parked){+.+.}, at: [] kthread_park+0x3d/0x50
> [ 85.069426]
> but task is already holding lock:
> [ 85.069428] (&dev->struct_mutex){+.+.}, at: [] i915_reset_device+0x1bd/0x230 [i915]
> [ 85.069448]
> which lock already depends on the new lock.
>
> [ 85.069451]
> the existing dependency chain (in reverse order) is:
> [ 85.069454]
> -> #3 (&dev->struct_mutex){+.+.}:
> [ 85.069460] __mutex_lock+0x81/0x9b0
> [ 85.069481] i915_mutex_lock_interruptible+0x47/0x130 [i915]
> [ 85.069502] i915_gem_fault+0x201/0x760 [i915]
> [ 85.069507] __do_fault+0x15/0x70
> [ 85.069509] __handle_mm_fault+0x7bf/0xda0
> [ 85.069512] handle_mm_fault+0x14f/0x2f0
> [ 85.069515] __do_page_fault+0x2d1/0x560
> [ 85.069518] page_fault+0x22/0x30
> [ 85.069520]
> -> #2 (&mm->mmap_sem){++++}:
> [ 85.069525] __might_fault+0x63/0x90
> [ 85.069529] _copy_to_user+0x1e/0x70
> [ 85.069532] perf_read+0x21d/0x290
> [ 85.069534] __vfs_read+0x1e/0x120
> [ 85.069536] vfs_read+0xa1/0x150
> [ 85.069539] SyS_read+0x40/0xa0
> [ 85.069541] entry_SYSCALL_64_fastpath+0x1c/0x89
> -> #0 ((completion)&self->parked){+.+.}:
> [ 85.069692] Chain exists of:
>   (completion)&self->parked --> &mm->mmap_sem --> &dev->struct_mutex
> [ 85.069718] 3 locks held by gem_exec_captur/2810:
> [ 85.069732] #2: (&dev->struct_mutex){+.+.}, at: [] i915_reset_device+0x1bd/0x230 [i915]
> stack backtrace:
> [ 85.069788] lock_acquire+0xaf/0x200
> [ 85.069793] wait_for_common+0x54/0x210
> [ 85.069807] kthread_park+0x3d/0x50
> [ 85.069827] i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
> [ 85.069849] i915_gem_reset_prepare+0x2c/0x60 [i915]
> [ 85.069865] i915_reset+0x66/0x230 [i915]
> [ 85.069881] i915_reset_device+0x1cb/0x230 [i915]
> [ 85.069919] i915_handle_error+0x2d3/0x430 [i915]
> [ 85.069951] i915_wedged_set+0x79/0xc0 [i915]
> [ 85.069955] simple_attr_write+0xab/0xc0
> [ 85.069959] full_proxy_write+0x4b/0x70
> [ 85.069961] __vfs_write+0x1e/0x130
> [ 85.069976] vfs_write+0xc0/0x1b0
> [ 85.069979] SyS_write+0x40/0xa0
> [ 85.069982] entry_SYSCALL_64_fastpath+0x1c/0x89
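The "(completion)&self->parked" above is the kthread parking handshake.
Heavily simplified from kernel/kthread.c of that era (error handling and
the KTHREAD_IS_PARKED bookkeeping trimmed, so this is a sketch rather
than the exact code), it looks roughly like this:

/*
 * Simplified sketch: only the wait/complete pair that lockdep is
 * tracking above is kept.
 */
int kthread_park(struct task_struct *k)
{
	struct kthread *kthread = to_kthread(k);

	set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
	if (k != current) {
		wake_up_process(k);
		/* the "(completion)&self->parked" acquire in the splat */
		wait_for_completion(&kthread->parked);
	}
	return 0;
}

/*
 * Reached via kthread_parkme(), i.e. only from the kthread's own main
 * loop; nothing else ever completes ->parked.
 */
static void __kthread_parkme(struct kthread *self)
{
	__set_current_state(TASK_PARKED);
	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
		complete(&self->parked);	/* the release side */
		schedule();
		__set_current_state(TASK_PARKED);
	}
	__set_current_state(TASK_RUNNING);
}

So kthread_park() only makes progress once the target thread gets back to
its own loop; a kthread stuck anywhere else (say, in a fault) never gets
there.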
Thread-A                        k-Thread

i915_reset_device
#3 mutex_lock(&dev->struct_mutex)
i915_reset()
  i915_gem_reset_prepare()
    i915_gem_reset_prepare_engine()
      kthread_park()

                                __do_page_fault()
                                #2 down_read(&mm->mmap_sem);
                                handle_mm_fault()
                                  __handle_mm_fault()
                                    __do_fault()
                                      i915_gem_fault()
                                        i915_mutex_lock_interruptible()
                                        #3 mutex_lock(&dev->struct_mutex)

                                /* twiddles thumbs forever more */

#0 wait_for_common()

                                #0 complete()

Is what it says I suppose. Now I don't know enough about that i915 code to
say if that breadcrumbs_signal thread can ever trigger a fault or not. I
got properly lost in that dma_fence callback maze. You're saying not?

(also, that comment near need_resched() doesn't make sense to me)
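For the shape of that cycle without any i915 in the picture, here is a
minimal userspace model of the two columns above. Names are purely
illustrative and only mirror the report; it is a toy, not kernel code.
Whether a given run actually hangs depends on which thread gets
"struct_mutex" first, which is exactly why having the cycle reported
from the dependency chains alone, without winning that race, is useful.

/*
 * Toy model only, not i915 code.  "struct_mutex" stands in for
 * &dev->struct_mutex, the condvar+flag pair stands in for the
 * kthread->parked completion.  Build with: cc -pthread park-model.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t struct_mutex = PTHREAD_MUTEX_INITIALIZER;

/* poor man's completion: mutex + condvar + flag */
static pthread_mutex_t park_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  park_cond = PTHREAD_COND_INITIALIZER;
static int parked;

static void *worker(void *arg)
{
	(void)arg;

	/* k-Thread column: the fault path wants struct_mutex first ... */
	pthread_mutex_lock(&struct_mutex);

	/* ... and only then would it reach the park check-in */
	pthread_mutex_lock(&park_lock);
	parked = 1;
	pthread_cond_signal(&park_cond);	/* complete(&self->parked) analogue */
	pthread_mutex_unlock(&park_lock);

	pthread_mutex_unlock(&struct_mutex);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker, NULL);

	/* Thread-A column: i915_reset_device() ... kthread_park() */
	pthread_mutex_lock(&struct_mutex);

	pthread_mutex_lock(&park_lock);
	while (!parked)				/* wait_for_completion() analogue */
		pthread_cond_wait(&park_cond, &park_lock);
	pthread_mutex_unlock(&park_lock);

	pthread_mutex_unlock(&struct_mutex);
	pthread_join(t, NULL);
	printf("raced past the inversion this time\n");
	return 0;
}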