Date: Thu, 16 Nov 2023 15:47:08 +0100
From: Boris Brezillon
To: Thomas Hellström
Cc: intel-xe@lists.freedesktop.org, Rodrigo Vivi, Matthew Brost,
 Danilo Krummrich, Joonas Lahtinen, Oak Zeng, Daniel Vetter,
 Maarten Lankhorst, Francois Dugast, dri-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] Documentation/gpu: VM_BIND locking document
Message-ID: <20231116154708.148ce241@collabora.com>
In-Reply-To: <656d5f890de7ba3af05616122a9bd399badd35bc.camel@linux.intel.com>
References: <20231115124937.6740-1-thomas.hellstrom@linux.intel.com>
 <20231116104851.114bdb08@collabora.com>
 <0850281b667c4b88163dab60737dbc945ad742fd.camel@linux.intel.com>
 <20231116142707.044aeec2@collabora.com>
 <656d5f890de7ba3af05616122a9bd399badd35bc.camel@linux.intel.com>
Organization: Collabora
On Thu, 16 Nov 2023 14:53:50 +0100
Thomas Hellström wrote:

> On Thu, 2023-11-16 at 14:27 +0100, Boris Brezillon wrote:
> > On Thu, 16 Nov 2023 12:48:45 +0100
> > Thomas Hellström wrote:
> >
> > > Hi, Boris,
> > >
> > > Thanks for reviewing. Some comments below:
> > >
> > > On Thu, 2023-11-16 at 10:48 +0100, Boris Brezillon wrote:
> > > > Hi Thomas,
> > > >
> > > > On Wed, 15 Nov 2023 13:49:37 +0100
> > > > Thomas Hellström wrote:
> > > >
> > > > > Add the first version of the VM_BIND locking document, which is
> > > > > intended to be part of the xe driver upstreaming agreement.
> > > > >
> > > > > The document describes and discusses the locking used during
> > > > > exec functions, eviction and for userptr gpu-vmas. The intention
> > > > > is to use the same nomenclature as drm-vm-bind-async.rst.
> > > > >
> > > > > v2:
> > > > > - s/gvm/gpu_vm/g (Rodrigo Vivi)
> > > > > - Clarify the userptr seqlock with a pointer to mm/mmu_notifier.c
> > > > >   (Rodrigo Vivi)
> > > > > - Adjust commit message accordingly.
> > > > > - Add SPDX license header.
> > > > >
> > > > > v3:
> > > > > - Large update to align with the drm_gpuvm manager locking
> > > > > - Add "Efficient userptr gpu_vma exec function iteration" section
> > > > > - Add "Locking at bind- and unbind time" section.
> > > > >
> > > > > v4:
> > > > > - Fix tabs vs space errors by untabifying (Rodrigo Vivi)
> > > > > - Minor style fixes and typos (Rodrigo Vivi)
> > > > > - Clarify situations where stale GPU mappings occur and how
> > > > >   access through these mappings is blocked. (Rodrigo Vivi)
> > > > > - Insert into the toctree in implementation_guidelines.rst
> > > > >
> > > > > Cc: Rodrigo Vivi
> > > > > Signed-off-by: Thomas Hellström
> > > > > ---
> > > > >  Documentation/gpu/drm-vm-bind-locking.rst     | 503 ++++++++++++++++++
> > > > >  .../gpu/implementation_guidelines.rst         |   1 +
> > > > >  2 files changed, 504 insertions(+)
> > > > >  create mode 100644 Documentation/gpu/drm-vm-bind-locking.rst
> > > > >
> > > > > diff --git a/Documentation/gpu/drm-vm-bind-locking.rst b/Documentation/gpu/drm-vm-bind-locking.rst
> > > > > new file mode 100644
> > > > > index 000000000000..bc701157cb34
> > > > > --- /dev/null
> > > > > +++ b/Documentation/gpu/drm-vm-bind-locking.rst
> > > > > @@ -0,0 +1,503 @@
> > > > > +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> > > > > +
> > > > > +===============
> > > > > +VM_BIND locking
> > > > > +===============
> > > > > +
> > > > > +This document attempts to describe what's needed to get VM_BIND
> > > > > +locking right, including the userptr mmu_notifier locking, and it
> > > > > +will also discuss some optimizations to get rid of the looping
> > > > > +through of all userptr mappings and external / shared object
> > > > > +mappings that is needed in the simplest implementation. It will
> > > > > +also discuss some implications for faulting gpu_vms.
> > > > > +
> > > > > +Nomenclature
> > > > > +============
> > > > > +
> > > > > +* ``Context``: GPU execution context.
> > > > > +* ``gpu_vm``: Abstraction of a virtual GPU address space with
> > > > > +  meta-data. Typically one per client (DRM file-private), or one
> > > > > +  per context.
> > > >
> > > > Should we mention that it's a driver object, likely inheriting
> > > > from drm_gpuvm?
> > >
> > > Yes, I can try to be a bit more drm_gpuvm-centric throughout the
> > > document, although I want to avoid being too specific due to the
> > > current rapid drm_gpuvm rate of change, which might also affect this
> > > document. I guess I will have to commit for now to update the
> > > document on each gpuvm series we land...
> >
> > Well, I'd suggest that the one doing changes to drm_gpuvm gets to
> > update this document along the way, or even better, make this
> > documentation part of the drm_gpuvm doc, so there's no excuse to not
> > update it when drm_gpuvm is extended.
>
> Sure, although I think the initial merge will be as-is, and then
> merging with drm_gpuvm will come at a later point.

Sure, I have no problem with that.
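Just to be clear about what I mean by "inheriting": the usual embedding
pattern, roughly like the sketch below. struct drm_gpuvm and
container_of() are real; the driver-side names are made up, and the
drm_gpuvm API is still moving, so take this as illustration only.

#include <drm/drm_gpuvm.h>

struct driver_vm {
        struct drm_gpuvm gpuvm;         /* generic manager state */
        /* driver-specific state, e.g. page-table bookkeeping */
};

static inline struct driver_vm *to_driver_vm(struct drm_gpuvm *gpuvm)
{
        return container_of(gpuvm, struct driver_vm, gpuvm);
}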
> > > > > +   add_dma_fence(&obj->resv, job_dma_fence);
> > > > > +
> > > > > +   dma_resv_unlock(&obj->resv);
> > > > > +   put_object(obj);
> > > > > +
> > > > > +Note that since the object is local to the gpu_vm, it will share
> > > > > +the gpu_vm's dma_resv lock so that ``obj->resv == gpu_vm->resv``.
> > > > > +The gpu_vm_bos marked for eviction are put on the gpu_vm's evict
> > > > > +list, which is protected by ``gpu_vm->resv``, that is always
> > > > > +locked while evicting, due to the above equality.
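To put the resv-sharing argument in code form, a minimal sketch; all
struct, field and helper names here are hypothetical:

#include <linux/dma-resv.h>
#include <linux/list.h>

struct gpu_vm {
        struct dma_resv *resv;          /* shared by all local objects */
        struct list_head evict_list;    /* protected by @resv */
};

struct gpu_vm_bo {
        struct gpu_vm *vm;
        struct list_head evict_link;
};

static void gpu_vm_bo_mark_evicted(struct gpu_vm_bo *vm_bo)
{
        struct gpu_vm *gpu_vm = vm_bo->vm;

        /*
         * For local objects obj->resv == gpu_vm->resv, and that lock is
         * always held while evicting, so it also protects the evict list.
         */
        dma_resv_assert_held(gpu_vm->resv);
        list_move_tail(&vm_bo->evict_link, &gpu_vm->evict_list);
}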
> > > > > +For VM_BIND gpu_vms, gpu_vmas don't need to be unbound before
> > > > > +eviction, since the driver must ensure that the eviction blit or
> > > > > +copy will wait for GPU idle or depend on all previous GPU
> > > > > +activity. Furthermore, any subsequent attempt by the GPU to
> > > > > +access freed memory through the gpu_vma will be preceded by a new
> > > > > +exec function, with a revalidation section which will make sure
> > > > > +all gpu_vmas are rebound. The eviction code holding the object's
> > > > > +dma_resv while revalidating will ensure a new exec function may
> > > > > +not race with the eviction. Note that this will not hold true,
> > > > > +however, if only a subsets of vmas are, due to the driver
> > > > > +implementation, selected for rebinding the next exec function.
> > > >
> > > > This last sentence is hard to follow.
> > > >
> > > > "
> > > > Note that this will not hold true if only a subset of vmas
> > > > are selected for rebinding during the next exec call (for instance,
> > > > due to some driver decision to only partially restore VMAs).
> > > > "
> > > >
> > > > > Then all vmas *not* selected for rebinding needs to be
> > > > > +properly unbound before re-enabling GPU access to the VM.
> > > >
> > > > I think I get the problem, but can we have a use case where partial
> > > > VMA restoration is useful? I mean, if some VMAs are not needed, we
> > > > might be able to save MMU page table allocation/setup-time, but
> > > > given the mess it then is to track those non-live VMAs, I'm
> > > > wondering if it's worth leaving the door open for that, unless
> > > > there's a good reason to do it.
> > >
> > > This is the use-case Christian has been flagging for OpenGL and
> > > media, where he warns that the single-vm-memory-overcommit case
> > > would otherwise make the app crawl.
> >
> > IIUC, the partial VMA restore is about not restoring all VMAs
> > attached to a vm_bo, but as soon as you restore one, this makes the
> > BO resident, so all you're saving here is the extra page table for
> > non-restored VMAs. I don't think that would significantly help the
> > overcommit use case, unless you have so many VMAs attached to a
> > single vm_bo that the amount of extra page tables becomes
> > non-negligible compared to the BO size itself.
> >
> > What would really help the overcommit use case is not restoring all
> > evicted BOs, if some of them are known to be in a range that's not
> > accessed by a GPU job. In that situation, you can decide to leave
> > vm_bos in the evicted list if none of their VMAs overlap any of the
> > VM ranges used by a job.
>
> Yes, this section here is the key: the vmas belonging to evicted bos
> not restored would be the ones not "selected for rebind".

Okay, but then I don't see the problem if you leave such vm_bos in the
evicted list. Those will still be considered for 'rebind' next time the
evicted list is walked (basically on the next exec), right?
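In sketch form, reusing the hypothetical structs from my earlier sketch
(the overlap and rebind helpers are made up as well), I'm thinking of
something like:

static int gpu_vm_validate_for_job(struct gpu_vm *gpu_vm,
                                   struct gpu_job *job)
{
        struct gpu_vm_bo *vm_bo, *next;
        int ret;

        dma_resv_assert_held(gpu_vm->resv);

        list_for_each_entry_safe(vm_bo, next, &gpu_vm->evict_list,
                                 evict_link) {
                /*
                 * Not touched by this job: leave it on the evict list,
                 * where the next exec will reconsider it.
                 */
                if (!job_overlaps_vm_bo(job, vm_bo))
                        continue;

                ret = gpu_vm_bo_validate_and_rebind(vm_bo);
                if (ret)
                        return ret;

                list_del_init(&vm_bo->evict_link);
        }

        return 0;
}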
> > > Generalized, one might want to think of these as groups of (or
> > > perhaps virtual ranges of) vmas that need to be resident for a
> > > single job submission. Currently we assume the residency-group <->
> > > vm mapping is 1:1, allowing the unbind-before-eviction to be
> > > ignored, but I figure that moving forward, when addressing
> > > performance problems of real-world applications, that may not
> > > always be true.
> >
> > Maybe I don't understand what unbind-before-eviction means. To me
> > it's the operation of tearing down all VM mappings (unbind) before
> > returning BO pages to the system (evict). I don't see a situation
> > where the eviction of a BO doesn't imply killing all VM mappings
> > pointing to this BO.
>
> It's the point of teardown that matters here. You can return the pages
> to the system without tearing down the GPU mappings, as long as you
> tear them down before there is any GPU activity on that vm again. In
> xe we tear down the old mappings as part of the rebind process, after
> we've validated the evicted bo again (but before submitting the GPU
> job). The point is that the stale mappings can be left as long as
> there is no gpu job accessing them.

I see. As long as you're sure the VM is completely inactive, and such
mappings are destroyed before the VM is used again, that's fine I
guess. Might be worth a detailed explanation of the different
scenarios, though.
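If it helps, the ordering I understand from the above, as a rough
sketch (hypothetical helpers, not the actual xe code):

static int exec_revalidate_and_submit(struct gpu_vm *gpu_vm,
                                      struct gpu_job *job)
{
        int ret;

        /*
         * 1. Make the evicted bos resident again. Stale GPU mappings
         *    may still point at the old pages, but no job has run on
         *    this vm since eviction, so nothing can access them.
         */
        ret = gpu_vm_validate_evicted(gpu_vm);
        if (ret)
                return ret;

        /*
         * 2. Tear down the stale mappings and rebind, still before any
         *    new GPU activity on the vm.
         */
        ret = gpu_vm_rebind(gpu_vm);
        if (ret)
                return ret;

        /* 3. Only now submit work that may access the vm. */
        return gpu_job_submit(job);
}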