Date: Thu, 14 Sep 2023 17:27:51 +0200
Subject: Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
From: Danilo Krummrich
Organization: RedHat
To: Thomas Hellström
Cc: matthew.brost@intel.com, sarah.walker@imgtec.com,
    nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org,
    dri-devel@lists.freedesktop.org, boris.brezillon@collabora.com,
    donald.robson@imgtec.com, daniel@ffwll.ch, christian.koenig@amd.com,
    faith.ekstrand@collabora.com
References: <20230909153125.30032-1-dakr@redhat.com>
 <20230909153125.30032-7-dakr@redhat.com>
 <701dfead-e240-b3fb-422c-d49fc7e04595@linux.intel.com>

On 9/14/23 13:32, Thomas Hellström wrote:
>
> On 9/14/23 12:57, Danilo Krummrich wrote:
>> On 9/13/23 14:16, Danilo Krummrich wrote:
>>
>>>>> And validate() can remove it while still holding all dma-resv locks,
>>>>> neat! However, what if two tasks are trying to lock the VA space
>>>>> concurrently? What do we do when the drm_gpuvm_bo's refcount drops
>>>>> to zero in drm_gpuva_unlink()? Are we guaranteed that at this point
>>>>> of time the drm_gpuvm_bo is not on the evicted list?
>>>>> Because otherwise we would call drm_gpuvm_bo_destroy() with the
>>>>> dma-resv lock held, which wouldn't be allowed, since
>>>>> drm_gpuvm_bo_destroy() might drop the last reference to the
>>>>> drm_gem_object and hence we'd potentially free the dma-resv lock
>>>>> while holding it, at least if it's an external object.
>>>>
>>>> Easiest way in this scheme is to think of the lists as being
>>>> protected by the vm's resv lock. That means anybody calling unlink()
>>>> must also hold the vm's resv lock. (Which is OK from a UAF point of
>>>> view, but perhaps not from a locking inversion POV from an async
>>>> list update.)
>>>
>>> This would mean that on unlink() we'd need to hold the VM's resv lock
>>> and the corresponding GEM's resv lock (in case they're not the same
>>> anyway), because the VM's resv lock would protect the external /
>>> evicted object lists and the GEM object's resv lock protects the
>>> GEM's list of drm_gpuvm_bos and the drm_gpuvm_bo's list of
>>> drm_gpuvas.
>>
>> As mentioned below, the same applies to drm_gpuvm_bo_put(), since it
>> might destroy the vm_bo, which includes removing the vm_bo from the
>> external / evicted object lists and the GEM's list of vm_bos.
>>
>> As mentioned, if the GEM's dma-resv is different from the VM's
>> dma-resv, we need to take both locks. Ultimately, this would mean we
>> need a drm_exec loop, because we can't know the order in which to take
>> these locks. Doing a full drm_exec loop just to put() a vm_bo doesn't
>> sound reasonable to me.
>>
>> Can we instead just have an internal mutex for locking the lists, such
>> that we avoid taking and dropping the spinlocks, which we use
>> currently, in a loop?
>
> You'd have the same locking inversion problem with a mutex, right?

Since in the eviction path you have resv->mutex, from exec you have
resv->mutex->resv, because validate would attempt to grab resv. Both
lists, evict and extobj, would need to have a separate mutex, not a
common one.
We'd also need a dedicated GEM gpuva lock. Then the only rule would be
that you can't hold the dma-resv lock when calling put(). Which I admit
is not that nice.

With the current spinlock solution drivers wouldn't need to worry about
anything locking related, though. So maybe I come back to your proposal
of having a switch for external locking with dma-resv locks entirely,
such that with external dma-resv locking I skip all the spinlocks and
add lockdep checks instead.

I think that makes the most sense in terms of taking advantage of
external dma-resv locking where possible, and on the other hand having a
self-contained solution if not. This should get all concerns out of the
way: yours, Christian's and Boris'.

> That said, xe currently indeed does the vm+bo exec dance on vma put.
>
> One reason why that seemingly horrible construct is good is that when
> evicting an extobj and you need to access individual vmas to zap page
> table entries or TLB flush, those VMAs are not allowed to go away
> (we're not refcounting them). Holding the bo resv on gpuva put prevents
> that from happening. Possibly one could use another mutex to protect
> the gem->vm_bo list to achieve the same, but we'd need to hold it on
> gpuva put.
>
> /Thomas
>
>> - Danilo
>>
>>>>>>> For extobjs an outer lock would be enough in case of Xe, but I
>>>>>>> really would not like to add even more complexity just to get the
>>>>>>> spinlock out of the way in case the driver already has an outer
>>>>>>> lock protecting this path.
>>>>>>
>>>>>> I must disagree here. These spinlocks and atomic operations are
>>>>>> pretty costly, and as discussed earlier this type of locking was
>>>>>> the reason (at least according to the commit message) that made
>>>>>> Christian drop the XArray use in drm_exec for the same set of
>>>>>> objects: "The locking overhead is unecessary and measurable".
>>>>>> IMHO the spinlock is the added complexity, and a single wide lock
>>>>>> following the drm locking guidelines set out by Daniel and David
>>>>>> should really be the default choice, with an opt-in for a spinlock
>>>>>> if needed for async and pushing out to a wq is not an option.
>>>>>
>>>>> For the external object list an outer lock would work as long as
>>>>> it's not the dma-resv lock of the corresponding GEM object, since
>>>>> here we actually need to remove the list entry from the external
>>>>> object list on drm_gpuvm_bo_destroy(). It's just a bit weird
>>>>> design-wise that drivers would need to take this outer lock on:
>>>>>
>>>>> - drm_gpuvm_bo_extobj_add()
>>>>> - drm_gpuvm_bo_destroy()        (and hence also drm_gpuvm_bo_put())
>>>>> - drm_gpuva_unlink()            (because it needs to call drm_gpuvm_bo_put())
>>>>> - drm_gpuvm_exec_lock()
>>>>> - drm_gpuvm_exec_lock_array()
>>>>> - drm_gpuvm_prepare_range()
>>>>>
>>>>> Given that, it seems reasonable to do all the required locking
>>>>> internally.
>>>>
>>>> From a design POV, there has been a clear direction in Xe to make
>>>> things similar to mmap() / munmap(), so this outer lock, which in Xe
>>>> is an rwsem, is used in a similar way as the mmap_lock. It's
>>>> protecting the page-table structures and vma rb tree, the userptr
>>>> structures and the extobj list. Basically it's taken early in the
>>>> exec IOCTL, the VM_BIND ioctl, the compute rebind worker and the
>>>> pagefault handler, so all of the above are just asserting that it is
>>>> taken in the correct mode.
>>>>
>>>> But strictly, with this scheme one could also use the vm's dma_resv
>>>> for the extobj list, since with drm_exec it's locked before
>>>> traversing the list.
>>>>
>>>> The whole point of this scheme is to rely on locks that you already
>>>> are supposed to be holding for various reasons, and is simple to
>>>> comprehend.
>>>
>>> I don't agree that we're supposed to hold the VM's resv lock anyway
>>> for functions like drm_gpuvm_bo_put() or drm_gpuva_unlink(), but I'm
>>> fine using it for that purpose nevertheless.
>>>
>>>>> In order to at least place lockdep checks, the driver would need to
>>>>> supply the corresponding lock's lockdep_map, because the GPUVM
>>>>> otherwise doesn't know about the lock.
>>>>
>>>> Yes, that sounds reasonable. One lockdep map per list.
>>>
>>> I'd really like to avoid that, especially now that everything got
>>> simpler. We should define the actual locks to take instead.
>>>
>>>>> Out of curiosity, what is the overhead of a spin_lock() that
>>>>> doesn't need to spin?
>>>>
>>>> I guess it's hard to tell exactly, but it is much lower on modern
>>>> x86 than what it used to be. Not sure about ARM, which is the other
>>>> architecture important to us. I figure if there is little cache-line
>>>> bouncing, the main overhead comes from the implied barriers.
>>>>
>>>>>> A pretty simple way that would not add much code would be
>>>>>>
>>>>>> static void gpuvm_cond_spin_lock(const struct drm_gpuvm *gpuvm,
>>>>>>                                  spinlock_t *lock)
>>>>>> {
>>>>>>     if (!gpuvm->resv_protected_lists)
>>>>>>         spin_lock(lock);
>>>>>> }
>>>>>>
>>>>>>>> For such drivers, that would require anybody calling unlink to
>>>>>>>> hold the vm's resv, though.
>>>>>>>
>>>>>>> In V4 I want to go back to having a dedicated lock for the GEM's
>>>>>>> gpuva list (or VM_BO list to be more precise). We can't just use
>>>>>>> the dma-resv lock for that with VM_BO abstractions, because on
>>>>>>> destruction of a VM_BO we otherwise wouldn't be allowed to
>>>>>>> already hold the dma-resv lock. That's the fix I was referring to
>>>>>>> earlier.
>>>>>>
>>>>>> Yeah, I can see the need for a dedicated lock for the GEM's gpuva
>>>>>> list, but holding the vm's dma-resv lock across the unlink
>>>>>> shouldn't be a problem. We may free the object and a pointer to
>>>>>> the vm's resv during unlink, but we don't free the vm's resv. It'd
>>>>>> be a matter of ensuring that any calls to unlink from *within*
>>>>>> drm_gpuvm allow it to be held.
>>>>>
>>>>> Drivers calling unlink() from the fence signaling path can't use
>>>>> the VM's dma-resv lock.
>>>>
>>>> Yes, that made me a bit curious, because in the current version the
>>>> code required the object's dma_resv for unlink(), which can't be
>>>> grabbed either from the fence signaling path. So are there any
>>>> drivers actually wanting to do that? If so, they will either need to
>>>> resort to the current spinlock solution or they will need to call
>>>> unlink from a workqueue item.
>>>
>>> As Boris already mentioned, we have the dma-resv lock by default or a
>>> driver-specific GEM gpuva lock as opt-in. Now, we can get rid of the
>>> latter.
>>>
>>>>> Also, what if the object is an external object? We can't use the
>>>>> VM's dma-resv lock here.
>>>>
>>>> Why? Typically (sync) unlink is only ever called from an unbind-like
>>>> operation where it should be trivial to grab the vm's resv. Or, for
>>>> that matter, any outer lock protecting the extobj list. The rule
>>>> would be that drm_gpuvm_bo::entry::extobj and
>>>> drm_gpuvm_bo::entry::evict would be protected by either the vm's
>>>> dma_resv (or possibly an outer lock in the case of the extobj list).
>>>
>>> An outer lock wouldn't have been working for updates in the async
>>> path, but that shouldn't be relevant anymore. We could use the VM's
>>> resv for that.
>>>
>>>>> And we can't have the GEM object's dma-resv lock held when calling
>>>>> unlink(), since unlink() calls drm_gpuvm_bo_put(), which if the
>>>>> refcount drops to zero calls drm_gpuvm_bo_destroy(), and
>>>>> drm_gpuvm_bo_destroy() might drop the last reference of the GEM
>>>>> object.
>>>>
>>>> Yes, but this is a different problem as to what exactly protects
>>>> drm_gpuvm_bo::entry::gem. Either, as you suggest, an internal per-bo
>>>> list lock, or, if we want to keep the bo's dma_resv, we need to
>>>> ensure that the caller of dma_resv_unlock(obj->resv) actually
>>>> refcounts its obj pointer and doesn't implicitly rely on the
>>>> gpuvm_bo's refcount. (I know Boris didn't like that, but requiring
>>>> an explicit refcount for a pointer you dereference, unless you're
>>>> under a lock that ensures keeping the object alive, is pretty much
>>>> required?) But anyway, for the drm_gpuvm_bo::entry::gem list
>>>> protection (bo resv or internal spinlock) I don't have a strong
>>>> preference.
>>>
>>> We can keep the GEM object's dma-resv lock; however, as mentioned
>>> above, drm_gpuva_unlink() and drm_gpuvm_bo_put() then require both
>>> the VM's resv lock and the GEM's resv lock in case they differ.