Received: by 10.223.185.116 with SMTP id b49csp526475wrg; Tue, 20 Feb 2018 03:34:34 -0800 (PST) X-Google-Smtp-Source: AH8x224jmkngA3ghtPlg21M1kBR461ZW9AXTAATvq7ymjlHXKTZlJL9JByY+y/3g49ZDHJSKwLXD X-Received: by 10.98.110.202 with SMTP id j193mr17831279pfc.19.1519126474605; Tue, 20 Feb 2018 03:34:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519126474; cv=none; d=google.com; s=arc-20160816; b=Tu8QOy9w4bxqLJZDoNvRbasY9vurnjcy8BOGsA47beuBjdUJpxRugtELuWf6Ox3p6x WqZ4q+MMgzEYE/jBAYj0+p660zPIDgIEflTl8cGdILE9tkdOr3PMi1aOYxGj/Z7Mo/Q9 F1SERoDkTvE5JhuBfN3ozl3ZKW1sj+Osh2VWmElT0nwuX+90hj2SevHIT20BkKWHB5h4 UYAnv7YlXXiJVPaMgAhsaPR+b/VwuY28Iqqd2VJRfcem8VWHcsaC0+TR7xi++fZB9kBg yPkpMSZtZ2DC8nrzRyfLZ/p0ccWKRq+t9Uo5fwUxW+cO+aqZ5gx1mMpsUFf0g4ELYyb4 4I+g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:mail-followup-to:message-id:subject:cc:to:from:date :dkim-signature:arc-authentication-results; bh=RvnDLyE6fd2H2EkUQyI6YvsMFDK0PlkagygqdU1vsX0=; b=sVoIYATNA/XXdS00ennqkh93On/Ao1LgqgPz6fJvp2tbyU/QOibuxs8Bk+lmTSAY8O XujVlUuqhry/5KxmJGv9tn9U6Pyz39j8kY0mPr2b1O+MU0INz4BYoL+F/aOuyqY/qLUL AE87Yh5pwY4QgJQBjIy0FwnnqXTkEBWJJFbZ4unpc4jKKgwT++mjZOcFULv4SBhvpj6d CPqr/LOjqw0v+s1rWAOYwrVUMag9OVDH09F6uPLyFhvQOGg9Rf4UAzH+ozZizEJIvsoO DFCr3dVQyr5ubMWvJAO9msHoe0mDkVv5+MrmKfgkc2+e7hjsITQWgCZU1V9Alho/vpqa Eydw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@ffwll.ch header.s=google header.b=LsYC6CrA; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k33-v6si2158257pld.303.2018.02.20.03.34.19; Tue, 20 Feb 2018 03:34:34 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@ffwll.ch header.s=google header.b=LsYC6CrA; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751475AbeBTLdY (ORCPT + 99 others); Tue, 20 Feb 2018 06:33:24 -0500 Received: from mail-wr0-f169.google.com ([209.85.128.169]:41855 "EHLO mail-wr0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751348AbeBTLdW (ORCPT ); Tue, 20 Feb 2018 06:33:22 -0500 Received: by mail-wr0-f169.google.com with SMTP id f14so8948174wre.8 for ; Tue, 20 Feb 2018 03:33:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=RvnDLyE6fd2H2EkUQyI6YvsMFDK0PlkagygqdU1vsX0=; b=LsYC6CrARX2imMZgf06mlTsQS2tuRhkGS24IalNeGQ8b/5wa4nOtVE78179Ius1sWX qfvj/3po0Lv6BL5cEhW6Zfox1v2d0kdxglSkkDXi4Pd2ZfZxCPpEZJ6MHmX5LChyHnZ5 ggdBv+qSuK7q/7Ln/YBt/uP0/h2FABihGlQjc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=RvnDLyE6fd2H2EkUQyI6YvsMFDK0PlkagygqdU1vsX0=; b=KUtf7ZsoVsrUUe48WS9/iqCZRW9XILhSGY6ogBH6wmyF8hMF8RvbsiDGnctaBH42n2 mJ1t2tFo4zeTw7f1HO101qc/00lF3xFk1tx2fQZ5lTqdLJVA0z/TNhHz/m5GF25i7ncF ZychTBuyHgFvLcUafaSMVC0cT+tfnAwo+TovijaPG32MmYO8p4HHKjr1MwFG53bdUtyl WQYiohioqlR7XW2jwlB2kmv1PKYaE74itT/bIyP5CcUPTOs6Bf8FMb8KEr4oFVM0x+TW bG4lpHHy/o2UJ3XSuzJJwlpQB+v5mmQKQpE1kyn2bYsNrY5U/m+NtKywadT3G/GmfEVy j62A== X-Gm-Message-State: APf1xPAH7NvxbbInfYodm3YyxfPNOHPKAFdPDNJtRa/c2um2CIjHAgF6 Y9vX61JxjA0Zr48Hk/MhavNDzBD1 X-Received: by 10.80.179.200 with SMTP id t8mr24377172edd.81.1519126401343; Tue, 20 Feb 2018 03:33:21 -0800 (PST) Received: from phenom.ffwll.local (212-51-149-109.fiber7.init7.net. [212.51.149.109]) by smtp.gmail.com with ESMTPSA id r7sm20585798edc.23.2018.02.20.03.33.20 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 20 Feb 2018 03:33:20 -0800 (PST) Date: Tue, 20 Feb 2018 12:33:18 +0100 From: Daniel Vetter To: christian.koenig@amd.com Cc: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/3] locking/ww_mutex: cleanup lock->ctx usage in amdgpu Message-ID: <20180220113318.GN22199@phenom.ffwll.local> Mail-Followup-To: christian.koenig@amd.com, dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org References: <20180215141944.4332-1-christian.koenig@amd.com> <20180219152421.GQ22199@phenom.ffwll.local> <20180219161544.GY22199@phenom.ffwll.local> <2e696981-6b4a-e292-c16a-0f3477dbf8ce@gmail.com> <20180219164354.GB22199@phenom.ffwll.local> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Operating-System: Linux phenom 4.14.0-3-amd64 User-Agent: Mutt/1.9.3 (2018-01-21) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 20, 2018 at 10:43:48AM +0100, Christian K?nig wrote: > Am 19.02.2018 um 17:43 schrieb Daniel Vetter: > > On Mon, Feb 19, 2018 at 05:29:46PM +0100, Christian K?nig wrote: > > > [SNIP] > > > Well that is not a problem at all. See we don't nest trylock with normal > > > lock acquiring, cause that would indeed bypass the whole deadlock detection. > > > > > > Instead we first use ww_mutex_acquire to lock all BOs which are needed for a > > > command submission, including the deadlock detection. > > > > > > Then all additional BOs which needed to be evicted to fulfill the current > > > request are trylocked. > > Yeah it's the lock; trylock; lock combo that deadlocks. If lockdep indeed > > catches that one (and not some other recursion combo) then I think we > > don't have to worry about holding tons of trylock'ed locks. > > Well I haven't explicitly tested the lock; trylock; lock case, but you get a > warning anyway in the lock; ... anything ...; lock case. > > See the first and the second lock can't use the same acquire context, > because that isn't known down the call stack and lockdep warns about that > quite intensively. Ah, so the ttm_ctx I've spotted was something entirely different and doesn't contain the ww_acquire_ctx (I didn't check)? I'd assume you have the same ctx passed around to everything in ttm, but if that doesn't exist then we can indeed not annotate ww_mutex_trylock_ctx with the right ctx. > What is a problem is that lockdep sometimes runs out of space to keep track > of all the trylocked mutexes, but that could have happened before as well if > I'm not completely mistaken. > > > > > If this is really what you want to do, then we need a > > > > ww_mutex_trylock_ctx, which also fills out the ctx value (so that other > > > > threads can correctly resolve deadlocks when you hold that lock while > > > > trying to grab additional locks). In which case you really don't need the > > > > task pointer. > > > Actually considered that as well, but it turned out that this is exactly > > > what I don't want. > > > > > > Cause then we wouldn't be able to distinct ww_mutex locked with a context > > > (because they are part of the submission) and without (because TTM trylocked > > > them). > > Out of curiosity, why do you need to know that? > > The control flow in TTM is that when you trylocked a BO you start to evict > it. > > Now sometimes it happens that we evict it from VRAM to GTT, but then find > that we don't have enough GTT space and need to evict something from there > to the SYSTEM domain. > > The problem is now since the reservation object is trylocked because of the > VRAM to GTT eviction we can't lock it again because of the GTT to the SYSTEM > domain. > > > > > Yes it's a disappointment that lockdep doesn't correctly track trylocks, > > > > it just does basic sanity checks, but then drops them on the floor wrt > > > > depency tracking. Just in case you wonder why you're not getting a > > > > lockdeps splat for this. Unfortunately I don't understand lockdep enough > > > > to be able to fix this gap. > > > Sorry to disappoint you, but lockdep is indeed capable to correctly track > > > those trylocked BOs. > > > > > > Got quite a bunch of warning before I was able to resolve to this solution. > > Hm, I thought it didn't detect a lock; trylock; lock combo because the > > trylock didn't show up in the dependency stack. Maybe that got fixed > > meanwhile. > > Yeah I can confirm that this indeed got fixed. > > > Assuming that we indeed need both, could we split up the two use-cases for > > clarity? I.e. ww_mutex_is_owned_by_ctx, which only takes the ctx (and > > forgoes checking for a task, since that's implied) and requires a non-NULL > > ctx. And ww_mutex_is_owned_by_task, which only takes the task (and maybe > > we should limit it to trylock and _single locks, i.e. WARN_ON(lock->ctx) > > if ww-mutex debugging is enabled). > > > > Or does that hit another requirement of your use-case? > > Well we could add two tests, one which only checks the context and one which > checks that the context is NULL and then checks the mutex owner. > > But to me it actually looks more like that makes it unnecessary complicated. > The use case in amdgpu which could only check the context isn't performance > critical. Oh I'm not worried about the runtime overhead at all, I'm worried about conceptual clarity of this stuff. If you have a ctx there's no need to also look at ->owner. Another idea: We drop the task argument from functions and go with the following logic: ww_mutex_is_owner(lock, ctx) { if (ctx) return lock->ctx == ctx; else return lock->owner == current; } I think that would solve your use case, and gives us the neat interface I'm aiming for. Kerneldoc can then explain what's happening for a NULL ctx. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch