From: Danilo Krummrich
To: Thomas Hellström
Cc: matthew.brost@intel.com, sarah.walker@imgtec.com,
    nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org,
    dri-devel@lists.freedesktop.org, boris.brezillon@collabora.com,
    donald.robson@imgtec.com, daniel@ffwll.ch, christian.koenig@amd.com,
    faith.ekstrand@collabora.com
Subject: Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Date: Mon, 18 Sep 2023 13:21:25 +0200
Organization: RedHat
References: <20230909153125.30032-1-dakr@redhat.com>
 <20230909153125.30032-7-dakr@redhat.com>
 <701dfead-e240-b3fb-422c-d49fc7e04595@linux.intel.com>
 <7c8c606dbf515bfe9aa3dc3ed6c23ae00d84ef9d.camel@linux.intel.com>
 <2c4bbd8c-ec2c-91ad-9f27-5476b7e65b4f@redhat.com>
In-Reply-To: <2c4bbd8c-ec2c-91ad-9f27-5476b7e65b4f@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/14/23 19:15, Danilo Krummrich wrote:
> On 9/14/23 19:13, Thomas Hellström wrote:
>> On Thu, 2023-09-14 at
>> 17:27 +0200, Danilo Krummrich wrote:
>>> On 9/14/23 13:32, Thomas Hellström wrote:
>>>>
>>>> On 9/14/23 12:57, Danilo Krummrich wrote:
>>>>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>>>>
>>>>>>>> And validate() can remove it while still holding all dma-resv
>>>>>>>> locks, neat!
>>>>>>>> However, what if two tasks are trying to lock the VA space
>>>>>>>> concurrently? What do we do when the drm_gpuvm_bo's refcount
>>>>>>>> drops to zero in drm_gpuva_unlink()? Are we guaranteed that at
>>>>>>>> this point of time the drm_gpuvm_bo is not on the evicted list?
>>>>>>>> Because otherwise we would call drm_gpuvm_bo_destroy() with the
>>>>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>>>>> drm_gpuvm_bo_destroy() might drop the last reference to the
>>>>>>>> drm_gem_object and hence we'd potentially free the dma-resv lock
>>>>>>>> while holding it, at least if it's an external object.
>>>>>>>
>>>>>>> Easiest way in this scheme is to think of the lists as being
>>>>>>> protected by the vm's resv lock. That means anybody calling
>>>>>>> unlink() must also hold the vm's resv lock. (Which is OK from a
>>>>>>> UAF point of view, but perhaps not from a locking inversion POV
>>>>>>> from an async list update.)
>>>>>>
>>>>>> This would mean that on unlink() we'd need to hold the VM's resv
>>>>>> lock and the corresponding GEM's resv lock (in case they're not
>>>>>> the same anyway), because the VM's resv lock would protect the
>>>>>> external / evicted object lists and the GEM object's resv lock
>>>>>> protects the GEM's list of drm_gpuvm_bos and the drm_gpuvm_bo's
>>>>>> list of drm_gpuvas.
>>>>>
>>>>> As mentioned below, the same applies to drm_gpuvm_bo_put(), since
>>>>> it might destroy the vm_bo, which includes removing the vm_bo from
>>>>> the external / evicted object lists and the GEM's list of vm_bos.
>>>>>
>>>>> As mentioned, if the GEM's dma-resv is different from the VM's
>>>>> dma-resv, we need to take both locks. Ultimately, this would mean
>>>>> we need a drm_exec loop, because we can't know the order in which
>>>>> to take these locks. Doing a full drm_exec loop just to put() a
>>>>> vm_bo doesn't sound reasonable to me.
>>>>>
>>>>> Can we instead just have an internal mutex for locking the lists,
>>>>> such that we avoid taking and dropping the spinlocks, which we use
>>>>> currently, in a loop?
>>>>
>>>> You'd have the same locking inversion problem with a mutex, right?
>>>> Since in the eviction path you have resv->mutex, from exec you have
>>>> resv->mutex->resv because validate would attempt to grab resv.
>>>
>>> Both lists, evict and extobj, would need to have a separate mutex,
>>> not a common one. We'd also need a dedicated GEM gpuva lock. Then the
>>> only rule would be that you can't hold the dma-resv lock when calling
>>> put(). Which I admit is not that nice.
>>>
>>> With the current spinlock solution drivers wouldn't need to worry
>>> about anything locking related, though. So maybe I come back to your
>>> proposal of having a switch for external locking with dma-resv locks
>>> entirely, such that with external dma-resv locking I skip all the
>>> spinlocks and add lockdep checks instead.
>>>
>>> I think that makes the most sense in terms of taking advantage of
>>> external dma-resv locking where possible and, on the other hand,
>>> having a self-contained solution if not. This should get all concerns
>>> out of the way: yours, Christian's and Boris'.
>>
>> If we need additional locks, yes, I'd prefer the opt-in/opt-out
>> spinlock solution, and check back after a while to see if we can
>> remove either option once most pitfalls are hit.
>
> Sounds good, I'll prepare this for a V4.

I was considering getting rid of the spinlocks using srcu for both
external and evicted objects instead.
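Roughly, what I have in mind (pure sketch; field and function names are
made up and nothing of this exists in the series as posted, untested):

```c
/* Sketch only: assumes the VM carries an srcu_struct for its object
 * lists plus a spinlock that merely serializes writers.
 */
struct drm_gpuvm {
	/* ... */
	struct srcu_struct list_srcu;	/* protects list walks */
	spinlock_t list_lock;		/* serializes list updates */
	struct list_head extobj_list;
	struct list_head evict_list;
};

static void gpuvm_extobj_add(struct drm_gpuvm *gpuvm,
			     struct drm_gpuvm_bo *vm_bo)
{
	spin_lock(&gpuvm->list_lock);
	list_add_rcu(&vm_bo->list.entry.extobj, &gpuvm->extobj_list);
	spin_unlock(&gpuvm->list_lock);
}

static void gpuvm_extobj_walk(struct drm_gpuvm *gpuvm)
{
	struct drm_gpuvm_bo *vm_bo;
	int idx;

	/* A single lock/unlock per walk instead of one per iteration. */
	idx = srcu_read_lock(&gpuvm->list_srcu);
	list_for_each_entry_rcu(vm_bo, &gpuvm->extobj_list,
				list.entry.extobj) {
		/* ... */
	}
	srcu_read_unlock(&gpuvm->list_srcu, idx);
}
```

(Removal via list_del_rcu() would analogously take list_lock, and
actually freeing the vm_bo would need to wait for a grace period, e.g.
via call_srcu().)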
This would get us rid of taking/dropping the spinlock in every iteration
step of the lists, limiting it to a single srcu_read_{lock,unlock} call
per list walk, plus, obviously, the list_add_rcu() and list_del_rcu()
variants as accessors. The accessors would probably still need a
spinlock to protect against concurrent list_add_rcu() / list_del_rcu()
calls, but I think those are not a concern. Any concerns from your side
with this variant?

>
> - Danilo
>
>>
>> Thanks,
>> /Thomas
>>
>>>
>>>>
>>>> That said, xe currently indeed does the vm+bo exec dance on vma put.
>>>>
>>>> One reason why that seemingly horrible construct is good is that
>>>> when evicting an extobj and you need to access individual vmas to
>>>> zap page table entries or TLB flush, those VMAs are not allowed to
>>>> go away (we're not refcounting them). Holding the bo resv on gpuva
>>>> put prevents that from happening. Possibly one could use another
>>>> mutex to protect the gem->vm_bo list to achieve the same, but we'd
>>>> need to hold it on gpuva put.
>>>>
>>>> /Thomas
>>>>
>>>>> - Danilo
>>>>>
>>>>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>>>>> really would not like to add even more complexity just to get
>>>>>>>>>> the spinlock out of the way in case the driver already has an
>>>>>>>>>> outer lock protecting this path.
>>>>>>>>>
>>>>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>>>>> pretty costly, and as discussed earlier this type of locking
>>>>>>>>> was the reason (at least according to the commit message) that
>>>>>>>>> made Christian drop the XArray use in drm_exec for the same set
>>>>>>>>> of objects: "The locking overhead is unecessary and
>>>>>>>>> measurable". IMHO the spinlock is the added complexity, and a
>>>>>>>>> single wide lock following the drm locking guidelines set out
>>>>>>>>> by Daniel and David should really be the default choice, with
>>>>>>>>> an opt-in for a spinlock if needed for async and pushing out to
>>>>>>>>> a wq is not an option.
>>>>>>>>
>>>>>>>> For the external object list an outer lock would work as long as
>>>>>>>> it's not the dma-resv lock of the corresponding GEM object,
>>>>>>>> since here we actually need to remove the list entry from the
>>>>>>>> external object list on drm_gpuvm_bo_destroy(). It's just a bit
>>>>>>>> weird design wise that drivers would need to take this outer
>>>>>>>> lock on:
>>>>>>>>
>>>>>>>> - drm_gpuvm_bo_extobj_add()
>>>>>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>>>>>> - drm_gpuva_unlink()            (because it needs to call drm_gpuvm_bo_put())
>>>>>>>> - drm_gpuvm_exec_lock()
>>>>>>>> - drm_gpuvm_exec_lock_array()
>>>>>>>> - drm_gpuvm_prepare_range()
>>>>>>>>
>>>>>>>> Given that, it seems reasonable to do all the required locking
>>>>>>>> internally.
>>>>>>>
>>>>>>> From a design POV, there has been a clear direction in Xe to make
>>>>>>> things similar to mmap() / munmap(), so this outer lock, which in
>>>>>>> Xe is an rwsem, is used in a similar way as the mmap_lock. It's
>>>>>>> protecting the page-table structures and vma rb tree, the userptr
>>>>>>> structures and the extobj list. Basically it's taken early in the
>>>>>>> exec IOCTL, the VM_BIND ioctl, the compute rebind worker and the
>>>>>>> pagefault handler, so all of the above are just asserting that it
>>>>>>> is taken in the correct mode.
>>>>>>>
>>>>>>> But strictly, with this scheme one could also use the vm's
>>>>>>> dma_resv for the extobj list, since with drm_exec it's locked
>>>>>>> before traversing the list.
>>>>>>>
>>>>>>> The whole point of this scheme is to rely on locks that you
>>>>>>> already are supposed to be holding for various reasons, and it is
>>>>>>> simple to comprehend.
>>>>>>
>>>>>> I don't agree that we're supposed to hold the VM's resv lock
>>>>>> anyway for functions like drm_gpuvm_bo_put() or
>>>>>> drm_gpuva_unlink(), but I'm fine using it for that purpose
>>>>>> nevertheless.
>>>>>>
>>>>>>>
>>>>>>>> In order to at least place lockdep checks, the driver would need
>>>>>>>> to supply the corresponding lock's lockdep_map, because the
>>>>>>>> GPUVM otherwise doesn't know about the lock.
>>>>>>>
>>>>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>>>>
>>>>>> I'd really like to avoid that, especially now that everything got
>>>>>> simpler. We should define the actual locks to take instead.
>>>>>>
>>>>>>>
>>>>>>>> Out of curiosity, what is the overhead of a spin_lock() that
>>>>>>>> doesn't need to spin?
>>>>>>>
>>>>>>> I guess it's hard to tell exactly, but it is much lower on modern
>>>>>>> x86 than what it used to be. Not sure about ARM, which is the
>>>>>>> other architecture important to us. I figure if there is little
>>>>>>> cache-line bouncing, the main overhead comes from the implied
>>>>>>> barriers.
>>>>>>>
>>>>>>>>
>>>>>>>>> A pretty simple way that would not add much code would be
>>>>>>>>>
>>>>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>>>>                                  spinlock_t *lock)
>>>>>>>>> {
>>>>>>>>>       if (!gpuvm->resv_protected_lists)
>>>>>>>>>           spin_lock(lock);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>>> For such drivers, that would require anybody calling unlink()
>>>>>>>>>>> to hold the vm's resv, though.
>>>>>>>>>>
>>>>>>>>>> In V4 I want to go back to having a dedicated lock for the
>>>>>>>>>> GEM's gpuva list (or VM_BO list, to be more precise). We can't
>>>>>>>>>> just use the dma-resv lock for that with VM_BO abstractions,
>>>>>>>>>> because on destruction of a VM_BO we otherwise wouldn't be
>>>>>>>>>> allowed to already hold the dma-resv lock. That's the fix I
>>>>>>>>>> was referring to earlier.
>>>>>>>>>
>>>>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's
>>>>>>>>> gpuva list, but holding the vm's dma-resv lock across the
>>>>>>>>> unlink shouldn't be a problem. We may free the object and a
>>>>>>>>> pointer to the vm's resv during unlink, but we don't free the
>>>>>>>>> vm's resv. It'd be a matter of ensuring that any calls to
>>>>>>>>> unlink from *within* drm_gpuvm allow it to be held.
>>>>>>>>
>>>>>>>> Drivers calling unlink() from the fence signaling path can't use
>>>>>>>> the VM's dma-resv lock.
>>>>>>>
>>>>>>> Yes, that made me a bit curious, because in the current version
>>>>>>> the code required the object's dma_resv for unlink(), which can't
>>>>>>> be grabbed either from the fence signaling path. So are there any
>>>>>>> drivers actually wanting to do that? If so, they will either need
>>>>>>> to resort to the current spinlock solution or they will need to
>>>>>>> call unlink from a workqueue item.
>>>>>>
>>>>>> As Boris already mentioned, we have the dma-resv lock by default
>>>>>> or a driver-specific GEM gpuva lock as opt-in. Now, we can get rid
>>>>>> of the latter.
>>>>>>
>>>>>>>
>>>>>>>> Also, what if the object is an external object? We can't use the
>>>>>>>> VM's dma-resv lock here.
>>>>>>>
>>>>>>> Why? Typically (sync) unlink is only ever called from an
>>>>>>> unbind-like operation where it should be trivial to grab the vm's
>>>>>>> resv. Or, for that matter, any outer lock protecting the extobj
>>>>>>> list. Rule would be that drm_gpuvm_bo::entry::extobj and
>>>>>>> drm_gpuvm_bo::entry::evict would be protected by either the vm's
>>>>>>> dma_resv (or possibly an outer lock in the case of the extobj
>>>>>>> list).
>>>>>>
>>>>>> Outer lock wouldn't have been working for updates in the async
>>>>>> path, but shouldn't be relevant anymore. We could use the VM's
>>>>>> resv for that.
>>>>>>
>>>>>>>
>>>>>>>> And we can't have the GEM object's dma-resv lock held when
>>>>>>>> calling unlink(), since unlink() calls drm_gpuvm_bo_put(),
>>>>>>>> which, if the refcount drops to zero, calls
>>>>>>>> drm_gpuvm_bo_destroy(), and drm_gpuvm_bo_destroy() might drop
>>>>>>>> the last reference of the GEM object.
>>>>>>>
>>>>>>> Yes, but this is a different problem as to what exactly protects
>>>>>>> drm_gpuvm_bo::entry::gem. Either, as you suggest, an internal
>>>>>>> per-bo list lock, or, if we want to keep the bo's dma_resv, we
>>>>>>> need to ensure that the caller of dma_resv_unlock(obj->resv)
>>>>>>> actually refcounts its obj pointer and doesn't implicitly rely on
>>>>>>> the gpuvm_bo's refcount (I know Boris didn't like that, but
>>>>>>> requiring an explicit refcount for a pointer you dereference,
>>>>>>> unless you're under a lock that ensures keeping the object alive,
>>>>>>> is pretty much required?) But anyway, for the
>>>>>>> drm_gpuvm_bo::entry::gem list protection (bo resv or internal
>>>>>>> spinlock) I don't have a strong preference.
>>>>>>
>>>>>> We can keep the GEM object's dma-resv lock, however, as mentioned
>>>>>> above, drm_gpuva_unlink() and drm_gpuvm_bo_put() then require both
>>>>>> the VM's resv lock and the GEM's resv lock in case they differ.
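PS: To make the opt-in concrete, the lockdep-check variant of the
conditional locking helper quoted above could look roughly like this
(just a sketch; the resv_protected_lists flag and the drm_gpuvm_resv()
accessor are placeholders, nothing final):

```c
/* Sketch, not final: with external dma-resv locking opted in, skip the
 * spinlock and instead assert that the VM's resv lock is actually held;
 * drm_gpuvm_resv() stands in for however the VM's resv is reached.
 */
static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
				 spinlock_t *lock)
{
	if (gpuvm->resv_protected_lists)
		dma_resv_assert_held(drm_gpuvm_resv(gpuvm));
	else
		spin_lock(lock);
}

static void gpuvm_cond_spin_unlock(const struct drm_gpuvm *gpuvm,
				   spinlock_t *lock)
{
	if (!gpuvm->resv_protected_lists)
		spin_unlock(lock);
}
```

That way drivers using external dma-resv locking pay no per-iteration
cost and still get lockdep coverage, while everyone else keeps the
self-contained spinlock behavior.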