Received: by 2002:a05:6358:45e:b0:b5:b6eb:e1f9 with SMTP id 30csp4137708rwe; Tue, 30 Aug 2022 05:24:04 -0700 (PDT) X-Google-Smtp-Source: AA6agR6AxNEdCa97591z456UzC3cg61IyvcWCabE+YZo5wC2G/vPpmuxRwXFeVgw1qq4t2COFsrf X-Received: by 2002:a17:90b:1d0d:b0:1fd:d706:b53c with SMTP id on13-20020a17090b1d0d00b001fdd706b53cmr9034891pjb.192.1661862243943; Tue, 30 Aug 2022 05:24:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661862243; cv=none; d=google.com; s=arc-20160816; b=ARnzF1B1C+sjJzXWNzR9KZic+dXWNMN4GNw4mNQfWMWA7sjnX1H6T0ClKM5NKtlEj2 8/QDkT0P0HxnABdNRclrQv8H3xZHiORnS4ZpGz3ongCkSG51WelVEzPqwq4WnCgKKF+j PJZPHK6xvjEY8edS1J666BW1RqFHFrSuAEYaPs8unqDwr1dSqd5LbHLoh7lr+uRw+JGy jo76gOp9n3IlpWeNRhabxSfjtGVAujM+GlaW05/7yShuUwTzviwRW1Mypia2oWfqxbXJ xuXhm/wo7RJt/yBVBiq9zH4vACgm47Hc04rtmAe9ja+MvuLngfuz+n766nj04K8xiV7l zBfg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :organization:from:references:cc:to:content-language:user-agent :mime-version:date:message-id:dkim-signature; bh=wStWHojG+mqSkx5w9odtf8yhDsAk+/Zs+ge/ktP/P14=; b=t/RdfjtoFQALjiijQ9rTalNx2kaMvnC6xhOqhZ9h6f9h55aYkY5n1z+Bpai2A90qaN WULYJcDkCdItGFBHV/563YiBjshCYrs1A3kyi6o232Vx0rJwieEWbp7pnn/IGKlXEJDb KS7hd5ch45tdTrUN3N2JA5bkmQyqHioS/dRqfoAk8VtG6foJCFk8/nZz/hat3OEqq2a+ vLOs2r/YzquRJkusQiFUYJVfZiboNcw4+O6wuhBpxmWqj1kPKEwKnZn2KC8w6MAOgXOr T8ygMWV3rijDRa6QpaKfYl0RDN4TnJvGIIcseT+VH4M8rs6VK39L3zhTZbWHZCciuK3z ZpVg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=gUknbzr4; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r12-20020a63440c000000b0042a96804a6dsi2043206pga.423.2022.08.30.05.23.41; Tue, 30 Aug 2022 05:24:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=gUknbzr4; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230291AbiH3MRw (ORCPT + 99 others); Tue, 30 Aug 2022 08:17:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43056 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230241AbiH3MRu (ORCPT ); Tue, 30 Aug 2022 08:17:50 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D035B531A for ; Tue, 30 Aug 2022 05:17:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1661861868; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wStWHojG+mqSkx5w9odtf8yhDsAk+/Zs+ge/ktP/P14=; b=gUknbzr40ySmQrJTnRyYtFl6KXzqRHlIMia/kSNPkvEUmGrJz6wz/3aek16LVQmUKBSYG8 9bMjt4ksHrjp3Hlv1nr3lgrq+zs4YIzC4B6TlcYPkokZUDRnrG3L2FAREKYNpI5HGOgu9i 5joC+JnkEWt6mBt38dvX+R1LmeSyoLc= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-26-BzmcMgtcOneqME9iuJnZSQ-1; Tue, 30 Aug 2022 08:17:47 -0400 X-MC-Unique: BzmcMgtcOneqME9iuJnZSQ-1 Received: by mail-wm1-f71.google.com with SMTP id n1-20020a7bc5c1000000b003a682987306so1927298wmk.1 for ; Tue, 30 Aug 2022 05:17:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:subject:organization:from :references:cc:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc; bh=wStWHojG+mqSkx5w9odtf8yhDsAk+/Zs+ge/ktP/P14=; b=lferv8Tz2hGBWKcHNJMBD3/3TUUPa9BTAEMzigWEVZtwirizrM/KRI/wH5POc8YuGP s4umZutF+hTT5/+HNzKA4BLZT0AESD9VzYZQZsR7tmzsfqBgg1Mpu2qtnsZroEC2hRRz F2rJPD81JDfHXwHqXjnarhRy0Vkatl+hEIUuJj4qr9DkJdBC8sHxwrkvk0ww+BF+gu5g QtinKr59/7h2wge2XGrkQu/N8quiWH4CYc6cPBaFUCfGQlgPwV2cmYxRFakXvn9WOmhb cIqvuooOglUx/YEdbFx+fV+KSaekObMcSe1owMynmWVXnZUSEyFFd+lMGF9gpvrJa1vR wP9w== X-Gm-Message-State: ACgBeo3rZalYUTwf/+T1I956TqBua4TOPi+6MLpwO7SyBWC38bjhtkQf 28PB6nwE5Ifdd7UJL202G9KoXTjOluAQQ34Zh3kq/A2uBY1UGR3RZ1iE/uPcL1t2w3UQFJYLKUv 4spnPjIjhlp+KdP1HXDl9 X-Received: by 2002:adf:ed50:0:b0:225:4c37:5346 with SMTP id u16-20020adfed50000000b002254c375346mr8810795wro.207.1661861866420; Tue, 30 Aug 2022 05:17:46 -0700 (PDT) X-Received: by 2002:adf:ed50:0:b0:225:4c37:5346 with SMTP id u16-20020adfed50000000b002254c375346mr8810749wro.207.1661861865960; Tue, 30 Aug 2022 05:17:45 -0700 (PDT) Received: from ?IPV6:2003:cb:c70a:1000:ecb4:919b:e3d3:e20b? (p200300cbc70a1000ecb4919be3d3e20b.dip0.t-ipconnect.de. [2003:cb:c70a:1000:ecb4:919b:e3d3:e20b]) by smtp.gmail.com with ESMTPSA id z20-20020a05600c0a1400b003a5e9337967sm12853676wmp.13.2022.08.30.05.17.44 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 30 Aug 2022 05:17:45 -0700 (PDT) Message-ID: <5aa08b4f-251e-a63d-c36c-324a04ba24f4@redhat.com> Date: Tue, 30 Aug 2022 14:17:44 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.12.0 Content-Language: en-US To: John Hubbard , Andrew Morton Cc: Jens Axboe , Alexander Viro , Miklos Szeredi , Christoph Hellwig , "Darrick J . Wong" , Trond Myklebust , Anna Schumaker , Jan Kara , Logan Gunthorpe , linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, LKML References: <20220827083607.2345453-1-jhubbard@nvidia.com> <20220827083607.2345453-2-jhubbard@nvidia.com> <10a9d33a-58a3-10b3-690b-53100d4e5440@redhat.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH 1/6] mm/gup: introduce pin_user_page() In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On 29.08.22 21:33, John Hubbard wrote: > On 8/29/22 05:07, David Hildenbrand wrote: >>> +/** >>> + * pin_user_page() - apply a FOLL_PIN reference to a page >>> + * >>> + * @page: the page to be pinned. >>> + * >>> + * This is similar to get_user_pages(), except that the page's refcount is >>> + * elevated using FOLL_PIN, instead of FOLL_GET. > > Actually, my commit log has a more useful documentation of this routine, > and given the questions below, I think I'll change to that: > > * pin_user_page() is an externally-usable version of try_grab_page(), but with > * semantics that match get_page(), so that it can act as a drop-in replacement > * for get_page(). > * > * pin_user_page() elevates a page's refcount using FOLL_PIN rules. This means > * that the caller must release the page via unpin_user_page(). Some thoughts: a) Can we generalize such that pages with a dedicated pincount (multi-page folios) are also covered? Maybe avoiding the refcount terminology would be best. b) Should we directly work on folios? c) Would it be valid to pass in a tail page right now? > >>> + * >>> + * IMPORTANT: The caller must release the page via unpin_user_page(). >>> + * >>> + */ >>> +void pin_user_page(struct page *page) >>> +{ >>> + struct folio *folio = page_folio(page); >>> + >>> + WARN_ON_ONCE(folio_ref_count(folio) <= 0); >>> + >> >> We should warn if the page is anon and !exclusive. > > That would be sort of OK, because pin_user_page() is being created > specifically for file system (O_DIRECT cases) use, and so the pages > should mostly be file-backed, rather than anon. Although I'm a little > vague about whether all of these iov_iter cases are really always > file-backed pages, especially for cases such as splice(2) to an > O_DIRECT-opened file, that Al Viro mentioned [1]. If we can, we should document that this interface is not for anonymous pages and WARN if pinning an anonymous page via this interface. The only reasonable way to obtain a pin on an anonymous page is via the page table. Here, FOLL_PIN should be used to do the right thing -- for example, unshare first (break COW) instead of pinning a shared anonymous page. Nothing would speak against duplicating such a pin using this interface (we'd have to sanity check that the page we're pinning may already be pinned), but I assume the pages we pin here are *not* necessarily obtained via GUP FOLL_PIN. I would be curious under which scenarios we could end up here with an anonymous page and how we obtained that reference (if not via GUP). > > Can you walk me through the reasoning for why we need to keep out > anon shared pages? We make sure to only pin anonymous pages that are exclusive and check when unpinning -- see sanity_check_pinned_pages(), there is also a comment in there -- that pinned anonymous pages are in fact still exclusive, otherwise we might have a BUG lurking somewhere that can result in memory corruptions or leaking information between processes. For example, once we'd pinned an anonymous pages that are not marked exclusive (!PageAnonExclusive), or we'd be sharing a page that is pinned, the next write fault would replace the page in the user page table due to breaking COW, and the GUP pin would point at a different page than the page table. Disallowing pinning of anon pages that may be shared in any case (FOLL_LONGTERM or not) simplifies GUP handling and allows for such sanity checks. (side note: after recent VM_BUG_ON discussions we might want to convert the VM_BUG_ON_PAGE in sanity_check_pinned_pages()) > >> >> I assume the intend is to use pin_user_page() only to duplicate pins, right? >> > > Well, yes or no, depending on your use of the term "pin": > > pin_user_page() is used on a page that already has a refcount >= 1 (so > no worries about speculative pinning should apply here), but the page > does not necessarily have any FOLL_PIN's applied to it yet (so it's not > "pinned" in the FOLL_PIN sense). Okay, then we should really figure out if/how anonymous pages could end up here. I assume they can't, really. But let's see :) -- Thanks, David / dhildenb