Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp365611pxb; Thu, 12 Nov 2020 05:59:28 -0800 (PST) X-Google-Smtp-Source: ABdhPJx42XCxJHHAqw8g4AFNVHliMe7C+ILRyqXduP3wYNZaGc7NraVMz0WTv0TwwizljS3mEUAp X-Received: by 2002:a50:bc02:: with SMTP id j2mr5479949edh.317.1605189567907; Thu, 12 Nov 2020 05:59:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1605189567; cv=none; d=google.com; s=arc-20160816; b=nbe6W7gh5/nV92fX0iD/0QVon7IPPJgiKVYdVpqeZfBivvJQEydnWcLTiePLfwLcJj MdBszhibM8YLUXPRe2A9/WramWRJWN+Fo02cKq9/ZoN2UsXdAIhcMaE+A/r4rurvlngC LvRMBl+jyXMt3Qq4dMfGcgzxEzZl5r50m2zlxovvMQ4UgIJjQBkvLICHUZ2IuS1mKjrV 2PxMsiPv6biK66TN048exnXQ9l+NvcxQqIbj76XD7GzdjXtPWG/egRfcRTPKNYCE414P wNKbY/IdUd/+AQwUbh4sf0IvG4FBbD4V2vXTM7J6cBiiG0HM/En7FWT0p8Yw0kyWcC3a TpXQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=gg4hBcITMmHsAQ3VH+S26lqbosyuJB6Bks+9rbgVpbY=; b=T6hwvw9VyIUAAgHrFOKrYVkOGUhG39ML+4FtLNyL/GoP+RJ1X+3UADO0jADZRNkLzD sC0KJARMKfE1xp0RBInORNUz9/pNuATkyVA7YjR4GsVlZVY0BoLvk8vO8XVAcL5Pt3Yh DvmjFakqHCp4KQDl2EVpf/JXqfTbXmn6fc3SPSoIhxR67hY+vXlLpdO8EUughRxTJeKF sP6HpoeURVGiLz8lkM+i0WoM8C4UUtsZGz1Eas0eyGr4TDuyFrZDcocpoDH5sbkWCQaK zkstBFgjepJR+V33iD0e8QtQGgZI94tk2eKb0zrkbJKkUQqzoBTkm/YHkM7kOsRmPMDI Mbdg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=G5xlvt9c; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id o16si3536660eja.345.2020.11.12.05.59.04; Thu, 12 Nov 2020 05:59:27 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=G5xlvt9c; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728414AbgKLN5C (ORCPT + 99 others); Thu, 12 Nov 2020 08:57:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54060 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727796AbgKLN5B (ORCPT ); Thu, 12 Nov 2020 08:57:01 -0500 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 85741C0613D1 for ; Thu, 12 Nov 2020 05:57:01 -0800 (PST) Received: by mail-ot1-x342.google.com with SMTP id 79so5612930otc.7 for ; Thu, 12 Nov 2020 05:57:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=gg4hBcITMmHsAQ3VH+S26lqbosyuJB6Bks+9rbgVpbY=; b=G5xlvt9cfX/zvnBRJyzlApPN6l5CLwfEJfQAyCMK4FXqX6GGpGgXsyQDsBU4nNTMNV PDZFy/XABwwVwKHEriR/Q4AZ/s5iLnt7jllyyWjdxV1ZMT4mJ7MGdR8f41kysgv6TZiM 9aaOpYMC+LW2SoJ4K+NXmSP3AClcplKs+Fnak= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=gg4hBcITMmHsAQ3VH+S26lqbosyuJB6Bks+9rbgVpbY=; b=djrFS9kMpLVnOFAPcOnkzDzixbS8sg3eV5GNE9ZTFB3V0aOPxynV9Dm0xs95w9+SRI dnSzFARW9H/H/EJGxB4tdqg/Bpg00qAFgWSaawTjS09oVqrGJ88t1Ia2Be+X2zjcK6+P E/bMkhVoLKN56MyT1FKFhUiij8ga8ffmlw0FOU8p3A8Yjb1HuR0W/uiPSDq+RN7czCeZ jHWMNIIVhQRm8Rf2FCU69iiidxv/PQW1Fn5+UPN04+H8lNLmnioqc8Gjz7mdVI+epLli Jkfrub9J/XiIXf/pG6TXI64JKTvBvicP6D9O4eLjStJEbZPWpIUHHFj6ytbKXNSjpzI1 IfBw== X-Gm-Message-State: AOAM531yVOJRxY1kNSvo2RPc/Ne14g/UdufQuU4Su3p08zvmH0/Ck5kv r0YskF/9ITOGpMA5NY4s+C53YUjf3Jsa5ylpXQLT8w== X-Received: by 2002:a05:6830:3155:: with SMTP id c21mr17186478ots.281.1605189420933; Thu, 12 Nov 2020 05:57:00 -0800 (PST) MIME-Version: 1.0 References: <20201111050559.GA24438@X58A-UD3R> <20201111105441.GA78848@gmail.com> <20201111093609.1bd2b637@gandalf.local.home> <20201112103225.GE14554@X58A-UD3R> In-Reply-To: <20201112103225.GE14554@X58A-UD3R> From: Daniel Vetter Date: Thu, 12 Nov 2020 14:56:49 +0100 Message-ID: Subject: Re: [RFC] Are you good with Lockdep? To: Byungchul Park Cc: Steven Rostedt , Ingo Molnar , Linus Torvalds , Peter Zijlstra , Ingo Molnar , Will Deacon , Linux Kernel Mailing List , Thomas Gleixner , Joel Fernandes , Sasha Levin , "Wilson, Chris" , duyuyang@gmail.com, Johannes Berg , Tejun Heo , "Theodore Ts'o" , Matthew Wilcox , Dave Chinner , Amir Goldstein , "J. Bruce Fields" , Greg KH , kernel-team@lge.com Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 12, 2020 at 11:33 AM Byungchul Park wrote: > > On Wed, Nov 11, 2020 at 09:36:09AM -0500, Steven Rostedt wrote: > > And this is especially true with lockdep, because lockdep only detects the > > deadlock, it doesn't tell you which lock was the incorrect locking. > > > > For example. If we have a locking chain of: > > > > A -> B -> D > > > > A -> C -> D > > > > Which on a correct system looks like this: > > > > lock(A) > > lock(B) > > unlock(B) > > unlock(A) > > > > lock(B) > > lock(D) > > unlock(D) > > unlock(B) > > > > lock(A) > > lock(C) > > unlock(C) > > unlock(A) > > > > lock(C) > > lock(D) > > unlock(D) > > unlock(C) > > > > which creates the above chains in that order. > > > > But, lets say we have a bug and the system boots up doing: > > > > lock(D) > > lock(A) > > unlock(A) > > unlock(D) > > > > which creates the incorrect chain. > > > > D -> A > > > > > > Now you do the correct locking: > > > > lock(A) > > lock(B) > > > > Creates A -> B > > > > lock(A) > > lock(C) > > > > Creates A -> C > > > > lock(B) > > lock(D) > > > > Creates B -> D and lockdep detects: > > > > D -> A -> B -> D > > > > and gives us the lockdep splat!!! > > > > But we don't disable lockdep. We let it continue... > > > > lock(C) > > lock(D) > > > > Which creates C -> D > > > > Now it explodes with D -> A -> C -> D > > It would be better to check both so that we can choose either > breaking a single D -> A chain or both breaking A -> B -> D and > A -> C -> D. > > > Which it already reported. And it can be much more complex when dealing > > with interrupt contexts and longer chains. That is, perhaps a different > > IRQ context is much much worse than longer chains. I understand what you > try to explain. > > > chain had a missing irq disable, now you might get 5 or 6 more lockdep > > splats because of that one bug. > > > > The point I'm making is that the lockdep splats after the first one may > > just be another version of the same bug and not a new one. Worse, if you > > only look at the later lockdep splats, it may be much more difficult to > > find the original bug than if you just had the first one. Believe me, I've > > If the later lockdep splats make us more difficult to fix, then we can > look at the first one. If it's more informative, then we can check the > all splats. Anyway it's up to us. > > > been down that road too many times! > > > > And it can be very difficult to know if new lockdep splats are not the same > > bug, and this will waste a lot of developers time! > > Again, we don't have to waste time. We can go with the first one. > > > This is why the decision to disable lockdep after the first splat was made. > > There were times I wanted to check locking somewhere, but is was using > > linux-next which had a lockdep splat that I didn't care about. So I > > made it not disable lockdep. And then I hit this exact scenario, that the > > one incorrect chain was causing reports all over the place. To solve it, I > > had to patch the incorrect chain to do raw locking to have lockdep ignore > > it ;-) Then I was able to test the code I was interested in. > > It's not a problem of whether it's single-reporting or multi-reporting > but it's the problem of the lock creating the incorrect chain and making > you felt hard to handle. > > Even if you were using single-reporting Lockdep, you anyway had to > continue to ignore locks in the same way until you got to the intest. > > > I think I understand it. For things like completions and other "wait for > > events" we have lockdep annotation, but it is rather awkward to implement. > > Having something that says "lockdep_wait_event()" and > > "lockdep_exec_event()" wrappers would be useful. > > Yes. It's a problem of lack of APIs. It can be done by reverting revert > of cross-release without big change. ;-) +1 on lockdep-native support for this. For another use case I've added annotations for dma_fence_wait, and they're not entirely correct unfortunately. But the false positives is along the lines of "you really shouldn't do this, even if it's in theory deadlock free". See commit 5fbff813a4a328b730cb117027c43a4ae9d8b6c0 Author: Daniel Vetter Date: Tue Jul 7 22:12:05 2020 +0200 dma-fence: basic lockdep annotations for fairly lengthy discussion of the problem and what I ended up with. Thanks, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch