Date: Tue, 3 Oct 2023 12:05:54 +0200
From: Boris Brezillon
To: Thomas Hellström
Cc: Danilo Krummrich, airlied@gmail.com, daniel@ffwll.ch,
 matthew.brost@intel.com, sarah.walker@imgtec.com, donald.robson@imgtec.com,
 christian.koenig@amd.com, faith@gfxstrand.net,
 dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
Message-ID: <20231003120554.547090bc@collabora.com>
In-Reply-To:
References: <20230928191624.13703-1-dakr@redhat.com>
 <20230928191624.13703-5-dakr@redhat.com>
Organization: Collabora

Hello Thomas,

On Tue, 3 Oct 2023 10:36:10 +0200
Thomas Hellström wrote:

> > +/**
> > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already iterated items
> > + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
> > + *
> > + * This helper is here to provide lockless list iteration. Lockless as in, the
> > + * iterator releases the lock immediately after picking the first element from
> > + * the list, so list insertion and deletion can happen concurrently.
> > + *
> > + * Elements popped from the original list are kept in a local list, so removal
> > + * and is_empty checks can still happen while we're iterating the list.
> > + */
> > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo)	\
> > +	({										\
> > +		struct drm_gpuvm_bo *__vm_bo = NULL;					\
> > +											\
> > +		drm_gpuvm_bo_put(__prev_vm_bo);						\
> > +											\
> > +		spin_lock(&(__gpuvm)->__list_name.lock);				\
> 
> Here we unconditionally take the spinlocks while iterating, and the main 
> point of DRM_GPUVM_RESV_PROTECTED was really to avoid that?
> 
> > +		if (!(__gpuvm)->__list_name.local_list)					\
> > +			(__gpuvm)->__list_name.local_list = __local_list;		\
> > +		else									\
> > +			WARN_ON((__gpuvm)->__list_name.local_list != __local_list);	\
> > +											\
> > +		while (!list_empty(&(__gpuvm)->__list_name.list)) {			\
> > +			__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list,	\
> > +						   struct drm_gpuvm_bo,			\
> > +						   list.entry.__list_name);		\
> > +			if (kref_get_unless_zero(&__vm_bo->kref)) {			\
> 
> And unnecessarily grab a reference in the RESV_PROTECTED case.
> 
> > +				list_move_tail(&(__vm_bo)->list.entry.__list_name,	\
> > +					       __local_list);				\
> > +				break;							\
> > +			} else {							\
> > +				list_del_init(&(__vm_bo)->list.entry.__list_name);	\
> > +				__vm_bo = NULL;						\
> > +			}								\
> > +		}									\
> > +		spin_unlock(&(__gpuvm)->__list_name.lock);				\
> > +											\
> > +		__vm_bo;								\
> > +	})
> 
> IMHO this lockless list iteration looks very complex and should be 
> pretty difficult to maintain while moving forward, also since it pulls 
> the gpuvm_bos off the list, list iteration needs to be protected by an 
> outer lock anyway.

As someone partly responsible for this convoluted list iterator, I must
say I agree with you. There are so many ways this can go wrong if the
user doesn't call it the right way, or doesn't protect concurrent list
iterations with a separate lock (luckily, this is a private iterator).
I mean, it works, so there's certainly a way to get it right, but gosh,
this is so far from the simple API I had hoped for.

> Also from what I understand from Boris, the extobj 
> list would typically not need the fine-grained locking; only the evict 
> list?

Right, I'm adding the gpuvm_bo to the extobj list in the ioctl path,
when the GEM and VM resvs are held, and I'm deferring the
drm_gpuvm_bo_put() call to a work that's not in the dma-signalling
path. This being said, I'm still not comfortable with the

	gem = drm_gem_object_get(vm_bo->gem);
	dma_resv_lock(gem->resv);
	drm_gpuvm_bo_put(vm_bo);
	dma_resv_unlock(gem->resv);
	drm_gem_object_put(gem);

dance that's needed to avoid a UAF when the gpuvm_bo is the last GEM
owner, not to mention that drm_gpuva_unlink() calls drm_gpuvm_bo_put()
after making sure the GEM gpuvm_list lock is held, but this lock might
differ from the resv lock (custom locking so we can call
gpuvm_unlink() in the dma-signalling path).

So we now have paths where drm_gpuvm_bo_put() is called with the resv
lock held, and others where it is not, and that only works because
we're relying on the fact that those drm_gpuvm_bo_put() calls won't
make the refcount drop to zero, because the deferred vm_bo_put() work
still owns a vm_bo ref.

All these tiny details add to the overall complexity of this common
layer, and to me, that's not any better than the
get_next_vm_bo_from_list() complexity you were complaining about (it
might even be worse, because this sort of thing leaks to users).

Having an internal lock partly solves that, in that the locking of the
extobj list is now entirely orthogonal to the GEM that's being removed
from this list, and we can lock/unlock internally without forcing the
caller to take weird actions to make sure things don't explode.

Don't get me wrong, I get that this locking overhead is not acceptable
for Xe, but I feel like we're turning drm_gpuvm into a white elephant
that only a few people will get right. This is just my personal view
on this, and I certainly don't want to block or delay the merging of
this patchset, but I thought I'd share my concerns. As someone who's
been following the evolution of this drm_gpuva/vm series for weeks,
and who's still sometimes getting lost, I can't imagine how new
drm_gpuvm users would feel...

> Also it seems that if we are to maintain two modes here, for 
> reasonably clean code we'd need two separate instances of 
> get_next_bo_from_list().
> 
> For the !RESV_PROTECTED case, perhaps one would want to consider the 
> solution used currently in xe, where the VM maintains two evict lists. 
> One protected by a spinlock and one protected by the VM resv. When the 
> VM resv is locked to begin list traversal, the spinlock is locked 
> *once* and the spinlock-protected list is looped over and copied into 
> the resv protected one. For traversal, the resv protected one is used.

Oh, so you do have the same sort of trick where you move the entire
list to another list, such that you can let other paths update the
list while you're iterating your own snapshot. That's interesting...

Regards,

Boris