Date: Thu, 1 Sep 2022 01:38:21 +0100
From: Al Viro
To: Jan Kara
Cc: John Hubbard, Andrew Morton, Jens Axboe, Miklos Szeredi, Christoph Hellwig,
    Darrick J. Wong, Trond Myklebust, Anna Schumaker, Logan Gunthorpe,
    linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-xfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, LKML
Subject: Re: [PATCH 5/6] NFS: direct-io: convert to FOLL_PIN pages
In-Reply-To: <20220831094349.boln4jjajkdtykx3@quack3>

On Wed, Aug 31, 2022 at 11:43:49AM +0200, Jan Kara wrote:
> So after looking into that a bit more, I think a clean approach would be to
> provide iov_iter_pin_pages2() and iov_iter_pages_alloc2(), under the hood
> in __iov_iter_get_pages_alloc() make sure we use pin_user_page() instead of
> get_page() in all the cases (using this in pipe_get_pages() and
> iter_xarray_get_pages() is easy) and then make all bio handling use the
> pinning variants for iters. I think at least iov_iter_is_pipe() case needs
> to be handled as well because as I wrote above, pipe pages can enter direct
> IO code e.g. for splice(2).
>
> Also I think that all iov_iter_get_pages2() (or the _alloc2 variant) users
> actually do want the "pin page" semantics in the end (they are accessing
> page contents) so eventually we should convert them all to
> iov_iter_pin_pages2() and remove iov_iter_get_pages2() altogether. But this
> will take some more conversion work with networking etc. so I'd start with
> converting bios only.

Not sure, TBH...  FWIW, quite a few of the callers of iov_iter_get_pages2()
do *NOT* need to grab any references for the BVEC/XARRAY/PIPE cases.  What's
more, it would be bloody useful to have a variant that doesn't grab
references for the !iter->user_backed case - that could be usable for KVEC
as well, simplifying several callers.

Requirements:

* recipients of those struct page * should have a way to make dropping the
page refs conditional (obviously); bio machinery can be told to do so
(a rough sketch of that follows below).

* callers should *NOT* do something like "set up an ITER_BVEC iter with page
references grabbed and stashed in the bio_vec array, call async read_iter()
and drop the references in the array - the refs we grab in dio will serve".
Note that for sync IO that pattern is fine regardless of whether we grab/drop
anything inside read_iter(); for async we could defer depopulating the
bio_vec array to IO completion or downstream of that.

* the code dealing with the references returned by iov_iter_..._pages should
*NOT* play silly buggers with refcounts - something like "I'll grab a
reference, start DMA and report success; the page will stay around until I
get around to dropping the ref and callers don't need to wait for that" deep
in the bowels of the infinibad stack (or something equally tasteful) is
seriously asking for trouble.
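
To make the first point concrete, here is a rough completion-side sketch of
what "bio machinery can be told to do so" could look like.  This is not the
in-tree bio_release_pages(); BIO_PAGE_PINNED is a made-up flag name used
purely for illustration, while BIO_NO_PAGE_REF is the existing "submitter
keeps ownership of the references" opt-out:

#include <linux/bio.h>
#include <linux/mm.h>

/* sketch only: drop page references at completion, but only if told to */
static void sketch_bio_release_pages(struct bio *bio)
{
	struct bio_vec *bvec;
	struct bvec_iter_all iter_all;

	/* submitter kept ownership of the references (e.g. ITER_BVEC) */
	if (bio_flagged(bio, BIO_NO_PAGE_REF))
		return;

	bio_for_each_segment_all(bvec, bio, iter_all) {
		if (bio_flagged(bio, BIO_PAGE_PINNED))	/* assumed flag */
			unpin_user_page(bvec->bv_page);	/* FOLL_PIN reference */
		else
			put_page(bvec->bv_page);	/* FOLL_GET reference */
	}
}

The point of the sketch is only that the "do we drop anything, and how"
decision lives in one place and is driven by per-bio state set at submission
time.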
Future plans from the last cycle included iov_iter_find_pages{,_alloc}() that would *not* grab references on anything other than IOVEC and UBUF (would advance the iterator, same as iov_iter_get_pages2(), though). Then iov_iter_get_...() would become a wrapper for that. After that - look into switching the users of ..._get_... to ..._find_.... Hadn't done much in that direction yet, though - need to redo the analysis first. That primitive might very well do FOLL_PIN instead of FOLL_GET for IOVEC and UBUF...
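
To illustrate the direction (tentative - none of these names exist in-tree):
with such a ..._find_... primitive, references exist only when the iterator
was user-backed, so the caller's release path after IO becomes conditional,
e.g.:

#include <linux/mm.h>

/*
 * Sketch under the assumption that the eventual iov_iter_find_pages{,_alloc}()
 * takes (FOLL_PIN) references only for IOVEC/UBUF and returns
 * BVEC/KVEC/XARRAY/PIPE pages without touching their refcounts.
 */
static void release_found_pages(struct page **pages, unsigned int nr,
				bool iter_was_user_backed)
{
	unsigned int i;

	if (!iter_was_user_backed)
		return;				/* nothing was grabbed */

	for (i = 0; i < nr; i++)
		unpin_user_page(pages[i]);	/* FOLL_PIN, per the above */
}

That way the only state a consumer needs to carry to completion is whether
the iterator was user-backed.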