Received: by 2002:a25:7ec1:0:0:0:0:0 with SMTP id z184csp2257282ybc; Wed, 13 Nov 2019 11:24:55 -0800 (PST) X-Google-Smtp-Source: APXvYqxbhfWwHqnfA3c0xkeMTfD0c/4CRF2mYzp4V0qMwg8/7zR9mOAWZFq87Tpx8nxDullvE+bP X-Received: by 2002:a17:906:68d9:: with SMTP id y25mr4590967ejr.136.1573673095266; Wed, 13 Nov 2019 11:24:55 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573673095; cv=none; d=google.com; s=arc-20160816; b=U0poH9Yu0Zf0z6s3A3FVQrcS8f6yvtL6hx6rs7KLUi3Iml4vQl6+6uYLhwhOoI9Ths /5GlEGF2HXPTIXFgQPOzMu0CtB6euwnSBK/SfKDUnDD9ihGlLtUDhZy5J/DCBTUCfngR UqU9e6+XV8zPE2N68B68CrOrUUkhFegamzyFdIL35Af5FHFdX3WJNisUn8u38An67KFm OL1YuEew4uxZjqw07nRVFHQk1JGTjntZYxFc6BND76+rgST9sAmkFs+JyDAWXSeyks2k U+YafbBA0DhNW5FN7eAkzRnv7t2TrsIKPNuAwDsh7T9QCm8jfexRs4BDQtUvFNjbt7jT 85iw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=cT4Ynp7AABnbetkKbYtY9L84HJMiir4x9aUvgO3qIkA=; b=aGkn2vOfz13ESQTNp1tHo4fos18RGyK70xvGFANU985AHAffV/yXpjzUIest6lYtGj KlMY2kXLKdAwfJKaeiZkk+arvwjuF4xoFiseTxzHpZwA4zPYXngTx2j86HCQtO8ZaWeA uso2hCL+Do2CDAk5MWYGvupmeUHKjnNSffBn8XEocUt94jPO/SGNF1Ydt0sbE8h/k/yG 0VqyacBJyY0EyMzrMGMDY7B+ohMXWg80V482QjURDyNzO/jI2Gx9ruYBZUeMBkUqeUp9 iVu4AAepdyqqZwcsTkaMK0adqtEBu/9gYO91Ifh0CLRi2sAWDRpiaAs8pwikfsQUpCmm 7bng== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel-com.20150623.gappssmtp.com header.s=20150623 header.b=a7A+dcs4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d6si2270684eda.262.2019.11.13.11.24.30; Wed, 13 Nov 2019 11:24:55 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@intel-com.20150623.gappssmtp.com header.s=20150623 header.b=a7A+dcs4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728367AbfKMSqC (ORCPT + 99 others); Wed, 13 Nov 2019 13:46:02 -0500 Received: from mail-ot1-f66.google.com ([209.85.210.66]:37306 "EHLO mail-ot1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726409AbfKMSqB (ORCPT ); Wed, 13 Nov 2019 13:46:01 -0500 Received: by mail-ot1-f66.google.com with SMTP id d5so2554818otp.4 for ; Wed, 13 Nov 2019 10:46:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=cT4Ynp7AABnbetkKbYtY9L84HJMiir4x9aUvgO3qIkA=; b=a7A+dcs4qLyRwykFz5P8aF/5gHL9KK36xtIfXVr/kBk3uPlOEckoeys38y+nnoealE xzZU97hBkbVqSDW3gJxcm8meBZzrkQaKb0fOpplJk89VtBOEN/d/rvjF/p1FfJJqTTXv 4RFlHCPjmONBQGpTT3WPud0MHuI1arLibsaS9qkCPzQA4NQZsJVxDT/hq2PSgxZUbK6j hmavUnr3qxAC5dpB11ue+Kc8RjftT0go7LTB3CRWT06X6W6/Vr35u8YiRNHJv46YGe1d 7YKSAbhOh/ee5b8uqRA8DDWS3/IOwVUmAGMREeaTodHgoCOvb7Na9VVd6Z5oKT3OMDr8 TTJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=cT4Ynp7AABnbetkKbYtY9L84HJMiir4x9aUvgO3qIkA=; b=gyzX8SDruSxIdOKS7HwFR338K2fTcvTfMznVywzB38w42xqAtSHtjgXTkPdK4/TGeH zt1kkmmzoCpMtIDOZ4ZYvpiNEqEpQ5ZYQuouA9bdqoqZtFZ0cRPHVEZZiGynoNKw+zw6 wmkYMfy4kzX0l2MKRoPCv/1ElAcYaWdF5YkTw2ZzKuW6d7Am1SIf7Dq9mlbBm6WCOENf QxDNKR45Dyx4VTpMSa/Xl7SdpiAG6sAvB9G05EFMI3ny/CJu2PwHTD6doJ5pI4tRCUKa g6SFi4+nSFU6GSEJtiJqHwepToR1LGyVNkKwrjH2+hVb6DNt5XxTAEwilxDMU0kjFXHB j9Aw== X-Gm-Message-State: APjAAAWscXzE2aVx3BUeUTTvDsfISAoZoRhHKi7a6shKxQ8Zt/iSzYrB Zwa4v+NdzBxwXkHEgtr96fk0BHt2eztbxlmCiT6ucg== X-Received: by 2002:a05:6830:1af7:: with SMTP id c23mr4066831otd.247.1573670761142; Wed, 13 Nov 2019 10:46:01 -0800 (PST) MIME-Version: 1.0 References: <20191113042710.3997854-1-jhubbard@nvidia.com> <20191113042710.3997854-10-jhubbard@nvidia.com> <20191113104308.GE6367@quack2.suse.cz> In-Reply-To: <20191113104308.GE6367@quack2.suse.cz> From: Dan Williams Date: Wed, 13 Nov 2019 10:45:49 -0800 Message-ID: Subject: Re: [PATCH v4 09/23] mm/gup: introduce pin_user_pages*() and FOLL_PIN To: Jan Kara Cc: John Hubbard , Andrew Morton , Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= , Christoph Hellwig , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jason Gunthorpe , Jens Axboe , Jonathan Corbet , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , Mauro Carvalho Chehab , Michael Ellerman , Michal Hocko , Mike Kravetz , Paul Mackerras , Shuah Khan , Vlastimil Babka , bpf@vger.kernel.org, Maling list - DRI developers , KVM list , linux-block@vger.kernel.org, Linux Doc Mailing List , linux-fsdevel , linux-kselftest@vger.kernel.org, "Linux-media@vger.kernel.org" , linux-rdma , linuxppc-dev , Netdev , Linux MM , LKML , Mike Rapoport Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 13, 2019 at 2:43 AM Jan Kara wrote: > > On Tue 12-11-19 20:26:56, John Hubbard wrote: > > Introduce pin_user_pages*() variations of get_user_pages*() calls, > > and also pin_longterm_pages*() variations. > > > > These variants all set FOLL_PIN, which is also introduced, and > > thoroughly documented. > > > > The pin_longterm*() variants also set FOLL_LONGTERM, in addition > > to FOLL_PIN: > > > > pin_user_pages() > > pin_user_pages_remote() > > pin_user_pages_fast() > > > > pin_longterm_pages() > > pin_longterm_pages_remote() > > pin_longterm_pages_fast() > > > > All pages that are pinned via the above calls, must be unpinned via > > put_user_page(). > > > > The underlying rules are: > > > > * These are gup-internal flags, so the call sites should not directly > > set FOLL_PIN nor FOLL_LONGTERM. That behavior is enforced with > > assertions, for the new FOLL_PIN flag. However, for the pre-existing > > FOLL_LONGTERM flag, which has some call sites that still directly > > set FOLL_LONGTERM, there is no assertion yet. > > > > * Call sites that want to indicate that they are going to do DirectIO > > ("DIO") or something with similar characteristics, should call a > > get_user_pages()-like wrapper call that sets FOLL_PIN. These wrappers > > will: > > * Start with "pin_user_pages" instead of "get_user_pages". That > > makes it easy to find and audit the call sites. > > * Set FOLL_PIN > > > > * For pages that are received via FOLL_PIN, those pages must be returne= d > > via put_user_page(). > > > > Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases > > in this documentation. (I've reworded it and expanded upon it.) > > > > Reviewed-by: Mike Rapoport # Documentation > > Reviewed-by: J=C3=A9r=C3=B4me Glisse > > Cc: Jonathan Corbet > > Cc: Ira Weiny > > Signed-off-by: John Hubbard > > Thanks for the documentation. It looks great! > > > diff --git a/mm/gup.c b/mm/gup.c > > index 83702b2e86c8..4409e84dff51 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -201,6 +201,10 @@ static struct page *follow_page_pte(struct vm_area= _struct *vma, > > spinlock_t *ptl; > > pte_t *ptep, pte; > > > > + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ > > + if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) =3D=3D > > + (FOLL_PIN | FOLL_GET))) > > + return ERR_PTR(-EINVAL); > > retry: > > if (unlikely(pmd_bad(*pmd))) > > return no_page_table(vma, flags); > > How does FOLL_PIN result in grabbing (at least normal, for now) page refe= rence? > I didn't find that anywhere in this patch but it is a prerequisite to > converting any user to pin_user_pages() interface, right? > > > +/** > > + * pin_user_pages_fast() - pin user pages in memory without taking loc= ks > > + * > > + * Nearly the same as get_user_pages_fast(), except that FOLL_PIN is s= et. See > > + * get_user_pages_fast() for documentation on the function arguments, = because > > + * the arguments here are identical. > > + * > > + * FOLL_PIN means that the pages must be released via put_user_page().= Please > > + * see Documentation/vm/pin_user_pages.rst for further details. > > + * > > + * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_page= s.rst. It > > + * is NOT intended for Case 2 (RDMA: long-term pins). > > + */ > > +int pin_user_pages_fast(unsigned long start, int nr_pages, > > + unsigned int gup_flags, struct page **pages) > > +{ > > + /* FOLL_GET and FOLL_PIN are mutually exclusive. */ > > + if (WARN_ON_ONCE(gup_flags & FOLL_GET)) > > + return -EINVAL; > > + > > + gup_flags |=3D FOLL_PIN; > > + return internal_get_user_pages_fast(start, nr_pages, gup_flags, p= ages); > > +} > > +EXPORT_SYMBOL_GPL(pin_user_pages_fast); > > I was somewhat wondering about the number of functions you add here. So w= e > have: > > pin_user_pages() > pin_user_pages_fast() > pin_user_pages_remote() > > and then longterm variants: > > pin_longterm_pages() > pin_longterm_pages_fast() > pin_longterm_pages_remote() > > and obviously we have gup like: > get_user_pages() > get_user_pages_fast() > get_user_pages_remote() > ... and some other gup variants ... > > I think we really should have pin_* vs get_* variants as they are very > different in terms of guarantees and after conversion, any use of get_* > variant in non-mm code should be closely scrutinized. OTOH pin_longterm_* > don't look *that* useful to me and just using pin_* instead with > FOLL_LONGTERM flag would look OK to me and somewhat reduce the number of > functions which is already large enough? What do people think? I don't fe= el > too strongly about this but wanted to bring this up. I'd vote for FOLL_LONGTERM should obviate the need for {get,pin}_user_pages_longterm(). It's a property that is passed by the call site, not an internal flag.