Date: Thu, 7 Dec 2017 15:58:28 +0100
From: Daniel Vetter <daniel@ffwll.ch>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
        LKML <linux-kernel@vger.kernel.org>,
        DRI Development <dri-devel@lists.freedesktop.org>,
        Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
        Tvrtko Ursulin <tvrtko.ursulin@intel.com>,
        Marta Lofstedt <marta.lofstedt@intel.com>,
        Byungchul Park <byungchul.park@lge.com>,
        Ingo Molnar <mingo@kernel.org>, Tejun Heo <tj@kernel.org>,
        Kees Cook <keescook@chromium.org>,
        Thomas Gleixner <tglx@linutronix.de>, Shaohua Li <shli@fb.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Jens Axboe <axboe@kernel.dk>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Jonathan Corbet <corbet@lwn.net>, Oleg Nesterov <oleg@redhat.com>,
        Daniel Vetter <daniel.vetter@intel.com>
Subject: Re: [PATCH] kthread: finer-grained lockdep/cross-release completion
Message-ID: <20171207145828.z52kjbrrlcl75m7m@phenom.ffwll.local>
Mail-Followup-To: Peter Zijlstra <peterz@infradead.org>,
        LKML <linux-kernel@vger.kernel.org>,
        DRI Development <dri-devel@lists.freedesktop.org>,
        Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
        Tvrtko Ursulin <tvrtko.ursulin@intel.com>,
        Marta Lofstedt <marta.lofstedt@intel.com>,
        Byungchul Park <byungchul.park@lge.com>,
        Ingo Molnar <mingo@kernel.org>, Tejun Heo <tj@kernel.org>,
        Kees Cook <keescook@chromium.org>,
        Thomas Gleixner <tglx@linutronix.de>, Shaohua Li <shli@fb.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Jens Axboe <axboe@kernel.dk>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Jonathan Corbet <corbet@lwn.net>, Oleg Nesterov <oleg@redhat.com>,
        Daniel Vetter <daniel.vetter@intel.com>
References: <20171207100849.407-1-daniel.vetter@ffwll.ch>
 <20171207122255.zi5ishny24k66ik7@hirez.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171207122255.zi5ishny24k66ik7@hirez.programming.kicks-ass.net>
User-Agent: NeoMutt/20170609 (1.8.3)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 7651
Lines: 155

On Thu, Dec 07, 2017 at 01:22:56PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 07, 2017 at 11:08:49AM +0100, Daniel Vetter wrote:
> > Since -rc1 we're hitting a bunch of lockdep splats using the new
> > cross-release stuff around the 2 kthread completions. In all cases
> > they are because totally independent uses of kthread are mixed up by
> > lockdep into the same locking class, creating artificial deadlocks.
> > 
> > Fix this by converting kthread code in the same way as e.g.
> > alloc_workqueue already works: Use macros for the public api so we can
> > have a callsite specific lockdep key, then pass that through the
> > entire callchain. Due to the many entry points this is slightly
> > tedious.
> 
> Do you have a splat somewhere? I'm having trouble seeing how all this
> comes together. That is, I've no real idea what you're actual problem is
> and if this is the right solution.

The bugzilla link is full of them. One below as example - it ties entirely
unrelated locking chains from i915 memory management together with other
stuff, with the only common bit that the kthread completions are somewhere
in there. But neither can i915 code ever get at the cpu kthread nor the
other way round, so it's flat out impossible to ever hit a deadlock this
way. Same reasons why not all workqueues share their lockdep map either.
-Daniel

[   85.062523] Setting dangerous option reset - tainting kernel
[   85.068934] i915 0000:00:02.0: Resetting chip after gpu hang

[   85.069408] ======================================================
[   85.069410] WARNING: possible circular locking dependency detected
[   85.069413] 4.15.0-rc1-CI-CI_DRM_3397+ #1 Tainted: G     U          
[   85.069415] ------------------------------------------------------
[   85.069417] gem_exec_captur/2810 is trying to acquire lock:
[   85.069419]  ((completion)&self->parked){+.+.}, at: [<ffffffff8109d69d>] kthread_park+0x3d/0x50
[   85.069426] 
               but task is already holding lock:
[   85.069428]  (&dev->struct_mutex){+.+.}, at: [<ffffffffa039b13d>] i915_reset_device+0x1bd/0x230 [i915]
[   85.069448] 
               which lock already depends on the new lock.

[   85.069451] 
               the existing dependency chain (in reverse order) is:
[   85.069454] 
               -> #3 (&dev->struct_mutex){+.+.}:
[   85.069460]        __mutex_lock+0x81/0x9b0
[   85.069481]        i915_mutex_lock_interruptible+0x47/0x130 [i915]
[   85.069502]        i915_gem_fault+0x201/0x760 [i915]
[   85.069507]        __do_fault+0x15/0x70
[   85.069509]        __handle_mm_fault+0x7bf/0xda0
[   85.069512]        handle_mm_fault+0x14f/0x2f0
[   85.069515]        __do_page_fault+0x2d1/0x560
[   85.069518]        page_fault+0x22/0x30
[   85.069520] 
               -> #2 (&mm->mmap_sem){++++}:
[   85.069525]        __might_fault+0x63/0x90
[   85.069529]        _copy_to_user+0x1e/0x70
[   85.069532]        perf_read+0x21d/0x290
[   85.069534]        __vfs_read+0x1e/0x120
[   85.069536]        vfs_read+0xa1/0x150
[   85.069539]        SyS_read+0x40/0xa0
[   85.069541]        entry_SYSCALL_64_fastpath+0x1c/0x89
[   85.069543] 
               -> #1 (&cpuctx_mutex){+.+.}:
[   85.069547]        perf_event_ctx_lock_nested+0xbc/0x1d0
[   85.069549] 
               -> #0 ((completion)&self->parked){+.+.}:
[   85.069555]        lock_acquire+0xaf/0x200
[   85.069558]        wait_for_common+0x54/0x210
[   85.069560]        kthread_park+0x3d/0x50
[   85.069579]        i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
[   85.069600]        i915_gem_reset_prepare+0x2c/0x60 [i915]
[   85.069617]        i915_reset+0x66/0x230 [i915]
[   85.069635]        i915_reset_device+0x1cb/0x230 [i915]
[   85.069651]        i915_handle_error+0x2d3/0x430 [i915]
[   85.069670]        i915_wedged_set+0x79/0xc0 [i915]
[   85.069673]        simple_attr_write+0xab/0xc0
[   85.069677]        full_proxy_write+0x4b/0x70
[   85.069679]        __vfs_write+0x1e/0x130
[   85.069682]        vfs_write+0xc0/0x1b0
[   85.069684]        SyS_write+0x40/0xa0
[   85.069686]        entry_SYSCALL_64_fastpath+0x1c/0x89
[   85.069688] 
               other info that might help us debug this:

[   85.069692] Chain exists of:
                 (completion)&self->parked --> &mm->mmap_sem --> &dev->struct_mutex

[   85.069698]  Possible unsafe locking scenario:

[   85.069701]        CPU0                    CPU1
[   85.069703]        ----                    ----
[   85.069705]   lock(&dev->struct_mutex);
[   85.069707]                                lock(&mm->mmap_sem);
[   85.069710]                                lock(&dev->struct_mutex);
[   85.069713]   lock((completion)&self->parked);
[   85.069715] 
                *** DEADLOCK ***

[   85.069718] 3 locks held by gem_exec_captur/2810:
[   85.069722]  #0:  (sb_writers#10){.+.+}, at: [<ffffffff811ff05e>] vfs_write+0x15e/0x1b0
[   85.069727]  #1:  (&attr->mutex){+.+.}, at: [<ffffffff8122a866>] simple_attr_write+0x36/0xc0
[   85.069732]  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa039b13d>] i915_reset_device+0x1bd/0x230 [i915]
[   85.069751] 
               stack backtrace:
[   85.069754] CPU: 2 PID: 2810 Comm: gem_exec_captur Tainted: G     U           4.15.0-rc1-CI-CI_DRM_3397+ #1
[   85.069758] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[   85.069760] Call Trace:
[   85.069765]  dump_stack+0x5f/0x86
[   85.069769]  print_circular_bug+0x230/0x3b0
[   85.069772]  check_prev_add+0x439/0x7b0
[   85.069775]  ? lockdep_init_map_crosslock+0x20/0x20
[   85.069778]  ? _raw_spin_unlock_irq+0x24/0x50
[   85.069782]  ? __lock_acquire+0x1385/0x15a0
[   85.069784]  __lock_acquire+0x1385/0x15a0
[   85.069788]  lock_acquire+0xaf/0x200
[   85.069790]  ? kthread_park+0x3d/0x50
[   85.069793]  wait_for_common+0x54/0x210
[   85.069797]  ? kthread_park+0x3d/0x50
[   85.069799]  ? _raw_spin_unlock_irqrestore+0x55/0x60
[   85.069802]  ? try_to_wake_up+0x2a2/0x600
[   85.069804]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
[   85.069807]  kthread_park+0x3d/0x50
[   85.069827]  i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
[   85.069849]  i915_gem_reset_prepare+0x2c/0x60 [i915]
[   85.069865]  i915_reset+0x66/0x230 [i915]
[   85.069881]  i915_reset_device+0x1cb/0x230 [i915]
[   85.069898]  ? gen8_gt_irq_ack+0x160/0x160 [i915]
[   85.069902]  ? work_on_cpu_safe+0x50/0x50
[   85.069919]  i915_handle_error+0x2d3/0x430 [i915]
[   85.069923]  ? __mutex_lock+0x81/0x9b0
[   85.069926]  ? __mutex_lock+0x432/0x9b0
[   85.069929]  ? __might_fault+0x39/0x90
[   85.069933]  ? __might_fault+0x39/0x90
[   85.069951]  i915_wedged_set+0x79/0xc0 [i915]
[   85.069955]  simple_attr_write+0xab/0xc0
[   85.069959]  full_proxy_write+0x4b/0x70
[   85.069961]  __vfs_write+0x1e/0x130
[   85.069965]  ? rcu_read_lock_sched_held+0x6f/0x80
[   85.069968]  ? rcu_sync_lockdep_assert+0x25/0x50
[   85.069971]  ? __sb_start_write+0xd5/0x1f0
[   85.069973]  ? __sb_start_write+0xef/0x1f0
[   85.069976]  vfs_write+0xc0/0x1b0
[   85.069979]  SyS_write+0x40/0xa0
[   85.069982]  entry_SYSCALL_64_fastpath+0x1c/0x89
[   85.069984] RIP: 0033:0x7f22dd739670
[   85.069986] RSP: 002b:00007ffe3f6bc7d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   85.069990] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f22dd739670
[   85.069994] RDX: 0000000000000002 RSI: 000055f260b5c376 RDI: 0000000000000007
[   85.069996] RBP: 0000000000000001 R08: 000055f2614ebed0 R09: 0000000000000000
[   85.069999] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f22df019000
[   85.070002] R13: 0000000000000007 R14: 00007f22df018000 R15: 0000000000000003
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch