Received: by 10.223.185.116 with SMTP id b49csp432421wrg; Tue, 20 Feb 2018 01:45:06 -0800 (PST) X-Google-Smtp-Source: AH8x226kWP5PoS92icbmi7bUJa1Pa8iUKbZTZOj92qEOjElZWN0p4aKKvcmnNIuL7u4A6xHT9Nfk X-Received: by 10.99.43.73 with SMTP id r70mr15023939pgr.316.1519119906049; Tue, 20 Feb 2018 01:45:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519119906; cv=none; d=google.com; s=arc-20160816; b=FpdDoDYe292+tI9oqH4+7wEMbRd1BObcs0xhWBiO7DnUAE4444zLcoZBk3Ilvmoope JJjJC1YuwhJyIqg19tWxscZicWShUx34kuFccc1B9B9Xvt0IiNEXyzvYKoLN+KBKrrwy PPVtOeACGeX994yAMrYfrZIFonz9V24yoH7oXBDvrsV3WSdu8PL2CeFW7SCn19zabl6G 3MPGNQ91bo3VtS7scHG6L1e8XgUP8LE3MvUd5gZ9/14615PCGAc6ZIshVuuOZjbd69fO DSMAWG8ke4KPCSP2hsd5PlapSzsAGtpYZWTVBkVJk5ciWeMzjRraydJQ9mAj6SjqBmvE KMog== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:to:subject:reply-to:dkim-signature :arc-authentication-results; bh=KtKdeQzycYu4n/deTzPakF0XUbIPABoPazPTmyvnBuI=; b=cxv3BpfEEqLcM/qjLaGPt2RuUPGCA61IC3atX8KrKmkiX7tPr80qjKQIWzWaAt7RIC wAgeEHekgf0P76uMwOl9N8BktGth7TdevAN0YjpLws6RXXoI52ur8MuKe9sypUNtDpuV p0VuAtFwGiagY55J3gLuhGiL1gTZ5i5QDZ143g6aYqKxVBH6RAvuPgaGYdptA+1c5CJT a7+tRF/NeKnR6EbX9SgHICZt7VO2Iax2WHpCF581lU6XAMPl3KGkG0KEXLebI0eUx5Sx Hjvrg4RJGRM8dv3IuUrO5FiX+s2N6Z9evFIVuQJjDnUKz/eWpiWpo3GMRRec8n/aMW4u y2Tw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=MVENOzL7; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y6-v6si694452plk.705.2018.02.20.01.44.51; Tue, 20 Feb 2018 01:45:06 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=MVENOzL7; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751417AbeBTJny (ORCPT + 99 others); Tue, 20 Feb 2018 04:43:54 -0500 Received: from mail-wr0-f171.google.com ([209.85.128.171]:41938 "EHLO mail-wr0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750858AbeBTJnv (ORCPT ); Tue, 20 Feb 2018 04:43:51 -0500 Received: by mail-wr0-f171.google.com with SMTP id f14so7995793wre.8 for ; Tue, 20 Feb 2018 01:43:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=reply-to:subject:to:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=KtKdeQzycYu4n/deTzPakF0XUbIPABoPazPTmyvnBuI=; b=MVENOzL7eQG+AJGjaLyw5ysXO0XwU42dEX5ULqe1da7lCRA7xc6GGHFgZ8TvMpb7E7 8FWJwHgYI+hn5c5OqHhjmGH9PnyC4Mj8d0XDVU6+AnN8GYjGFXODAklYvJ/GZ1/uR2PV rX+7zPlazrqkng4bGPARWSOzGtxDlZUVpMslUizsZU81FGtf20Gp3PzZrZMOwnpkThp6 mfslgN7CDKEZtvn65RfOl6PBRb4aJAdNiHUeoxpJzbHBxReNq4vOFVvMkaK4/G7OTG3S ZRDcffn/DG5X2vu9ermYT6w5z1oVle4r7Qij1IY8BWeGQAiW/o/lN8uexXtKZkFE+xHn LYiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:reply-to:subject:to:references:from:message-id :date:user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=KtKdeQzycYu4n/deTzPakF0XUbIPABoPazPTmyvnBuI=; b=UTXxogbE2gB2IsRufF05+73ZehVaUnsVVg0tL3wwm1ph6yk5bQclwumqf1CCmH6XiT 1Pt6lpWiQTERXy1ygHj2IS2ASXsovobUDu20AW4CCzkZAc1cPSt7TuVhm6ns05z0fBY2 FHig/FyMboBnJEfIFH8/pnafAMI0Ipjp0m0amXLJEsAx87ixvpFxtWxeMdi32YNd4BZy /YeFavKUXnza+24wi5CDSSd3r36h7QMyosYLqnubX7mkKosL0+D9idpgfmocadDehtga KQTc3rM9Y2iiNbRR73miq6JCvGgaIuOWucjnNoX6gGT7s85h5gryOtLqX6W6mL0WA5++ /VNQ== X-Gm-Message-State: APf1xPCpNxqJKvgBzy72qNru+jiFHhf+DgC6bb9iRqewXwWtNDjhNZ8l BbJSAB8eItXCnT3HHk6XInvTZSXp X-Received: by 10.28.7.68 with SMTP id 65mr12536149wmh.9.1519119829357; Tue, 20 Feb 2018 01:43:49 -0800 (PST) Received: from ?IPv6:2a02:908:1251:8fc0:4c6d:7233:b7e1:3b88? ([2a02:908:1251:8fc0:4c6d:7233:b7e1:3b88]) by smtp.gmail.com with ESMTPSA id y64sm10337218wrb.56.2018.02.20.01.43.48 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 20 Feb 2018 01:43:48 -0800 (PST) Reply-To: christian.koenig@amd.com Subject: Re: [PATCH 1/3] locking/ww_mutex: cleanup lock->ctx usage in amdgpu To: christian.koenig@amd.com, dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org References: <20180215141944.4332-1-christian.koenig@amd.com> <20180219152421.GQ22199@phenom.ffwll.local> <20180219161544.GY22199@phenom.ffwll.local> <2e696981-6b4a-e292-c16a-0f3477dbf8ce@gmail.com> <20180219164354.GB22199@phenom.ffwll.local> From: =?UTF-8?Q?Christian_K=c3=b6nig?= Message-ID: Date: Tue, 20 Feb 2018 10:43:48 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <20180219164354.GB22199@phenom.ffwll.local> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Am 19.02.2018 um 17:43 schrieb Daniel Vetter: > On Mon, Feb 19, 2018 at 05:29:46PM +0100, Christian König wrote: >> [SNIP] >> Well that is not a problem at all. See we don't nest trylock with normal >> lock acquiring, cause that would indeed bypass the whole deadlock detection. >> >> Instead we first use ww_mutex_acquire to lock all BOs which are needed for a >> command submission, including the deadlock detection. >> >> Then all additional BOs which needed to be evicted to fulfill the current >> request are trylocked. > Yeah it's the lock; trylock; lock combo that deadlocks. If lockdep indeed > catches that one (and not some other recursion combo) then I think we > don't have to worry about holding tons of trylock'ed locks. Well I haven't explicitly tested the lock; trylock; lock case, but you get a warning anyway in the lock; ... anything ...; lock case. See the first and the second lock can't use the same acquire context, because that isn't known down the call stack and lockdep warns about that quite intensively. What is a problem is that lockdep sometimes runs out of space to keep track of all the trylocked mutexes, but that could have happened before as well if I'm not completely mistaken. >>> If this is really what you want to do, then we need a >>> ww_mutex_trylock_ctx, which also fills out the ctx value (so that other >>> threads can correctly resolve deadlocks when you hold that lock while >>> trying to grab additional locks). In which case you really don't need the >>> task pointer. >> Actually considered that as well, but it turned out that this is exactly >> what I don't want. >> >> Cause then we wouldn't be able to distinct ww_mutex locked with a context >> (because they are part of the submission) and without (because TTM trylocked >> them). > Out of curiosity, why do you need to know that? The control flow in TTM is that when you trylocked a BO you start to evict it. Now sometimes it happens that we evict it from VRAM to GTT, but then find that we don't have enough GTT space and need to evict something from there to the SYSTEM domain. The problem is now since the reservation object is trylocked because of the VRAM to GTT eviction we can't lock it again because of the GTT to the SYSTEM domain. >>> Yes it's a disappointment that lockdep doesn't correctly track trylocks, >>> it just does basic sanity checks, but then drops them on the floor wrt >>> depency tracking. Just in case you wonder why you're not getting a >>> lockdeps splat for this. Unfortunately I don't understand lockdep enough >>> to be able to fix this gap. >> Sorry to disappoint you, but lockdep is indeed capable to correctly track >> those trylocked BOs. >> >> Got quite a bunch of warning before I was able to resolve to this solution. > Hm, I thought it didn't detect a lock; trylock; lock combo because the > trylock didn't show up in the dependency stack. Maybe that got fixed > meanwhile. Yeah I can confirm that this indeed got fixed. > Assuming that we indeed need both, could we split up the two use-cases for > clarity? I.e. ww_mutex_is_owned_by_ctx, which only takes the ctx (and > forgoes checking for a task, since that's implied) and requires a non-NULL > ctx. And ww_mutex_is_owned_by_task, which only takes the task (and maybe > we should limit it to trylock and _single locks, i.e. WARN_ON(lock->ctx) > if ww-mutex debugging is enabled). > > Or does that hit another requirement of your use-case? Well we could add two tests, one which only checks the context and one which checks that the context is NULL and then checks the mutex owner. But to me it actually looks more like that makes it unnecessary complicated. The use case in amdgpu which could only check the context isn't performance critical. Christian. > -Daniel >