Date: Thu, 24 Aug 2017 11:18:40 +0900
From: Byungchul Park
To: Peter Zijlstra
Cc: mingo@kernel.org, tj@kernel.org, boqun.feng@gmail.com, david@fromorbit.com, johannes@sipsolutions.net, oleg@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] lockdep: Fix workqueue crossrelease annotation
Message-ID: <20170824021840.GC6772@X58A-UD3R>
References: <20170823115843.662056844@infradead.org> <20170823121432.990701317@infradead.org>
In-Reply-To: <20170823121432.990701317@infradead.org>

On Wed, Aug 23, 2017 at 01:58:47PM +0200, Peter Zijlstra wrote:
> The new completion/crossrelease annotations interact unfavourable with
> the extant flush_work()/flush_workqueue() annotations.
>
> The problem is that when a single work class does:
>
>   wait_for_completion(&C)
>
> and
>
>   complete(&C)
>
> in different executions, we'll build dependencies like:
>
>   lock_map_acquire(W)
>   complete_acquire(C)
>
> and
>
>   lock_map_acquire(W)
>   complete_release(C)
>
> which results in the dependency chain: W->C->W, which lockdep thinks
> spells deadlock, even though there is no deadlock potential since
> works are ran concurrently.
>
> One possibility would be to change the work 'lock' to recursive-read,

I'm not sure this solves the issue completely, but in any case it should
become the recursive version once lockdep is fixed, regardless of this
issue.

> but that would mean hitting a lockdep limitation on recursive locks.

For now, a work-around might be needed.

> Also, unconditinoally switching to recursive-read here would fail to
> detect the actual deadlock on single-threaded workqueues, which do

Do you mean that would still be true even after lockdep has been fixed
properly? If so, could you explain why? IMHO, it would not be.

> @@ -4751,15 +4751,31 @@ static inline void invalidate_xhlock(str
>   * The same is true for system-calls, once a system call is completed (we've
>   * returned to userspace) the next system call does not depend on the lock
>   * history of the previous system call.
> + *
> + * They key property for independence, this invariant state, is that it must be
> + * a point where we hold no locks and have no history. Because if we were to
> + * hold locks, the restore at _end() would not necessarily recover it's history
> + * entry. Similarly, independence per-definition means it does not depend on
> + * prior state.
>   */
> -void crossrelease_hist_start(enum xhlock_context_t c)
> +void crossrelease_hist_start(enum xhlock_context_t c, bool force)
>  {
>  	struct task_struct *cur = current;
>
> -	if (cur->xhlocks) {
> -		cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> -		cur->hist_id_save[c] = cur->hist_id;
> +	if (!cur->xhlocks)
> +		return;
> +
> +	/*
> +	 * We call this at an invariant point, no current state, no history.
> +	 */

This work-around code _must_ be removed once the recursive-read handling
in lockdep has been fixed. I think it would be better to add a tag
(comment) saying so.

> +	if (c == XHLOCK_PROC) {
> +		/* verified the former, ensure the latter */
> +		WARN_ON_ONCE(!force && cur->lockdep_depth);
> +		invalidate_xhlock(&xhlock(cur->xhlock_idx));
>  	}
> +
> +	cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> +	cur->hist_id_save[c] = cur->hist_id;
>  }
>
>  void crossrelease_hist_end(enum xhlock_context_t c)