Date: Thu, 14 Sep 2023 13:58:40 +0200
From: Boris Brezillon
To: Dmitry Osipenko
Cc: David Airlie, Gerd Hoffmann, Gurchetan Singh, Chia-I Wu,
    Daniel Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
    Christian König, Qiang Yu, Steven Price, Emma Anholt, Melissa Wen,
    dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
    kernel@collabora.com, virtualization@lists.linux-foundation.org
Subject: Re: [PATCH v16 15/20] drm/shmem-helper: Add memory shrinker
Message-ID: <20230914135840.5e0e11fe@collabora.com>
Organization: Collabora

On Thu, 14 Sep 2023 14:36:23 +0300
Dmitry Osipenko wrote:

> On 9/14/23 11:27, Boris Brezillon wrote:
> > On Thu, 14 Sep 2023 10:50:32 +0300
> > Dmitry Osipenko wrote:
> >
> >> On 9/14/23 10:36, Boris Brezillon wrote:
> >>> On Thu, 14 Sep 2023 07:02:52 +0300
> >>> Dmitry Osipenko wrote:
> >>>
> >>>> On 9/13/23 10:48, Boris Brezillon wrote:
> >>>>> On Wed, 13 Sep 2023 03:56:14 +0300
> >>>>> Dmitry Osipenko wrote:
> >>>>>
> >>>>>> On 9/5/23 11:03, Boris Brezillon wrote:
> >>>>>>>> * But
> >>>>>>>> + * acquiring the obj lock in drm_gem_shmem_release_pages_locked() can
> >>>>>>>> + * cause a locking order inversion between reservation_ww_class_mutex
> >>>>>>>> + * and fs_reclaim.
> >>>>>>>> + *
> >>>>>>>> + * This deadlock is not actually possible, because no one should
> >>>>>>>> + * be already holding the lock when drm_gem_shmem_free() is called.
> >>>>>>>> + * Unfortunately lockdep is not aware of this detail. So when the
> >>>>>>>> + * refcount drops to zero, don't touch the reservation lock.
> >>>>>>>> + */
> >>>>>>>> + if (shmem->got_pages_sgt &&
> >>>>>>>> +     refcount_dec_and_test(&shmem->pages_use_count)) {
> >>>>>>>> +         drm_gem_shmem_do_release_pages_locked(shmem);
> >>>>>>>> +         shmem->got_pages_sgt = false;
> >>>>>>>> }
> >>>>>>> Leaking memory is the right thing to do if pages_use_count > 1 (it's
> >>>>>>> better to leak than having someone access memory it no longer owns), but
> >>>>>>> I think it's worth mentioning in the above comment.
> >>>>>>
> >>>>>> It's unlikely that it will be only a leak without a following up
> >>>>>> use-after-free. Neither is acceptable.
> >>>>>
> >>>>> Not necessarily, if you have a page leak, it could be that the GPU has
> >>>>> access to those pages, but doesn't need the GEM object anymore
> >>>>> (pages are mapped by the iommu, which doesn't need shmem->sgt or
> >>>>> shmem->pages after the mapping is created). Without a WARN_ON(), this
> >>>>> can go unnoticed and lead to memory corruptions/information leaks.
> >>>>>
> >>>>>>
> >>>>>> The drm_gem_shmem_free() could be changed such that kernel won't blow up
> >>>>>> on a refcnt bug, but that's not worthwhile doing because drivers
> >>>>>> shouldn't have silly bugs.
> >>>>>
> >>>>> We definitely don't want to fix that, but we want to complain loudly
> >>>>> (WARN_ON()), and make sure the risk is limited (preventing memory from
> >>>>> being re-assigned to someone else by not freeing it).
> >>>>
> >>>> That's what the code did and continues to do here. Not exactly sure what
> >>>> you're trying to say. I'm going to relocate the comment in v17 to
> >>>> put_pages(), we can continue discussing it there if I'm missing your point.
> >>>>
> >>>
> >>> I'm just saying it would be worth mentioning that we're intentionally
> >>> leaking memory if shmem->pages_use_count > 1. Something like:
> >>>
> >>> /**
> >>>  * shmem->pages_use_count should be 1 when ->sgt != NULL and
> >>>  * zero otherwise. If some users still hold a pages reference
> >>>  * that's a bug, and we intentionally leak the pages so they
> >>>  * can't be re-allocated to someone else while the GPU/CPU
> >>>  * still have access to it.
> >>>  */
> >>> drm_WARN_ON(drm,
> >>>             refcount_read(&shmem->pages_use_count) != (shmem->sgt ? 1 : 0));
> >>> if (shmem->sgt && refcount_dec_and_test(&shmem->pages_use_count))
> >>>         drm_gem_shmem_free_pages(shmem);
> >>
> >> That may be acceptable, but only once there will be a driver using this
> >> feature.
> >
> > Which feature? That's not related to a specific feature, that's just
> > how drm_gem_shmem_get_pages_sgt() works, it takes a pages ref that can
> > only be released in drm_gem_shmem_free(), because sgt users are not
> > refcounted and the sgt stays around until the GEM object is freed or
> > its pages are evicted. The only valid cases we have at the moment are:
> >
> > - pages_use_count == 1 && sgt != NULL
> > - pages_use_count == 0
> >
> > any other situations are buggy.
>
> sgt may belong to dma-buf for which pages_use_count=0, this can't be
> done until sgt mess is sorted out

No it can't, not in that path, because the code you're adding is in the
if (!obj->import_attach) branch:

	if (obj->import_attach) {
		drm_prime_gem_destroy(obj, shmem->sgt);
	} else {
		...
		// Your changes are here.
		...
	}

>
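
To make the rule discussed above concrete, here is a minimal userspace
sketch of it. It is an illustration only, not kernel code: struct
model_shmem, enum model_free_result and model_shmem_free() are hypothetical
names invented for this example, a plain counter stands in for refcount_t,
and fprintf() stands in for drm_WARN_ON(). It assumes the two valid states
listed in the thread (pages_use_count == 1 with an sgt, or
pages_use_count == 0) and that imported objects never reach the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the shmem GEM object; not the kernel struct. */
struct model_shmem {
	bool imported;                /* models obj->import_attach != NULL */
	bool has_sgt;                 /* models shmem->sgt != NULL */
	unsigned int pages_use_count; /* plain counter instead of refcount_t */
	bool pages_freed;             /* set once the model releases the pages */
};

enum model_free_result {
	MODEL_FREE_IMPORTED, /* dma-buf import: sgt is handled by the prime path */
	MODEL_FREE_OK,       /* refcount matched one of the two valid states */
	MODEL_FREE_LEAKED,   /* refcount bug: warned, pages intentionally kept */
};

static enum model_free_result model_shmem_free(struct model_shmem *shmem)
{
	assert(shmem != NULL);

	/* Imported objects take the other branch and never reach the check. */
	if (shmem->imported)
		return MODEL_FREE_IMPORTED;

	/* Valid states: count == 1 with an sgt, count == 0 without one. */
	unsigned int expected = shmem->has_sgt ? 1 : 0;
	bool mismatch = shmem->pages_use_count != expected;

	if (mismatch)
		fprintf(stderr, "WARN: pages_use_count=%u, expected %u\n",
			shmem->pages_use_count, expected);

	/* Drop the sgt's reference; free the pages only if it was the last. */
	if (shmem->has_sgt && shmem->pages_use_count > 0 &&
	    --shmem->pages_use_count == 0)
		shmem->pages_freed = true;

	return mismatch ? MODEL_FREE_LEAKED : MODEL_FREE_OK;
}
```

With pages_use_count == 2 and an sgt present, the model warns, drops one
reference and leaves the pages allocated, which is the "leak rather than
hand the memory to someone else" behaviour argued for in the thread.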