Date: Wed, 31 Aug 2022 11:43:49 +0200
From: Jan Kara
To: John Hubbard
Cc: Jan Kara, Al Viro, Andrew Morton, Jens Axboe, Miklos Szeredi,
    Christoph Hellwig, "Darrick J. Wong", Trond Myklebust, Anna Schumaker,
    Logan Gunthorpe, linux-block@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-mm@kvack.org, LKML
Subject: Re: [PATCH 5/6] NFS: direct-io: convert to FOLL_PIN pages
Message-ID: <20220831094349.boln4jjajkdtykx3@quack3>
References: <20220827083607.2345453-1-jhubbard@nvidia.com>
    <20220827083607.2345453-6-jhubbard@nvidia.com>
    <353f18ac-0792-2cb7-6675-868d0bd41d3d@nvidia.com>
    <217b4a17-1355-06c5-291e-7980c0d3cea6@nvidia.com>
    <20220829160808.rwkkiuelipr3huxk@quack3>
X-Mailing-List: linux-nfs@vger.kernel.org

On Mon 29-08-22 12:59:26, John Hubbard wrote:
> On 8/29/22 09:08, Jan Kara wrote:
> >> However, the core block/bio conversion in patch 4 still does depend upon
> >> a key assumption, which I got from a 2019 email discussion with
> >> Christoph Hellwig and others here [1], which says:
> >>
> >> "All pages released by bio_release_pages should come from
> >> get_get_user_pages...".
> >>
> >> I really hope that still holds true. Otherwise this whole thing is in
> >> trouble.
> >>
> >> [1] https://lore.kernel.org/kvm/20190724053053.GA18330@infradead.org/
> >
> > Well as far as I've checked that discussion, Christoph was aware of pipe
> > pages etc. (i.e., bvecs) entering direct IO code. But he had some patches
> > [2] which enabled GUP to work for bvecs as well (using the kernel mapping
> > under the hood AFAICT from a quick glance at the series).
> > I suppose we could also handle this in __iov_iter_get_pages_alloc() by
> > grabbing pin reference instead of plain get_page() for the case of bvec
> > iter. That way we should have only pinned pages in bio_release_pages()
> > even for the bvec case.
>
> OK, thanks, that looks viable. So, that approach assumes that the
> remaining two cases in __iov_iter_get_pages_alloc() will never end up
> being released via bio_release_pages():
>
>     iov_iter_is_pipe(i)
>     iov_iter_is_xarray(i)
>
> I'm actually a little worried about ITER_XARRAY, which is a recent addition.
> It seems to be used in ways that are similar to ITER_BVEC, and cephfs is
> using it. It's probably OK for now, for this series, which doesn't yet
> convert cephfs.

So after looking into that a bit more, I think a clean approach would be to
provide iov_iter_pin_pages2() and iov_iter_pages_alloc2() and, under the
hood in __iov_iter_get_pages_alloc(), make sure we use pin_user_page()
instead of get_page() in all the cases (using this in pipe_get_pages() and
iter_xarray_get_pages() is easy). Then we can make all bio handling use the
pinning variants for iters. I think at least the iov_iter_is_pipe() case
needs to be handled as well because, as I wrote above, pipe pages can enter
direct IO code, e.g. for splice(2).

Also I think that all iov_iter_get_pages2() (or the _alloc2 variant) users
actually do want the "pin page" semantics in the end (they are accessing
page contents), so eventually we should convert them all to
iov_iter_pin_pages2() and remove iov_iter_get_pages2() altogether. But this
will take some more conversion work with networking etc., so I'd start with
converting bios only.

								Honza
-- 
Jan Kara
SUSE Labs, CR
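
To illustrate why the release path wants every page to be pinned rather than plainly referenced, here is a small userspace toy model, not kernel code. All toy_* names are hypothetical and exist only for this sketch. The one value borrowed from the kernel is GUP_PIN_COUNTING_BIAS (1024 for small pages). The model assumes a pin adds that bias to the refcount while get_page() adds 1, and that a bio-style release always unpins.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value the kernel uses for small (order-0) pages. */
#define GUP_PIN_COUNTING_BIAS 1024

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	int refcount;
};

enum toy_iter_type {
	TOY_ITER_IOVEC,
	TOY_ITER_BVEC,
	TOY_ITER_PIPE,
	TOY_ITER_XARRAY,
};

/* Plain reference, as get_page() would take. */
void toy_get_page(struct toy_page *p)
{
	p->refcount += 1;
}

/* Pin reference: the page must already be live, then add the bias. */
void toy_pin_user_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount += GUP_PIN_COUNTING_BIAS;
}

/* Drop a pin reference by removing the bias. */
void toy_unpin_user_page(struct toy_page *p)
{
	p->refcount -= GUP_PIN_COUNTING_BIAS;
}

/* Heuristic check: a refcount at or above the bias suggests a pin. */
bool toy_page_maybe_dma_pinned(const struct toy_page *p)
{
	return p->refcount >= GUP_PIN_COUNTING_BIAS;
}

/*
 * Sketch of the proposed acquire side: whatever the iterator type
 * (iovec, bvec, pipe, xarray), the page is pinned, never plainly
 * referenced.
 */
void toy_iter_pin_page(enum toy_iter_type type, struct toy_page *p)
{
	(void)type; /* every type takes the same pin path in this model */
	toy_pin_user_page(p);
}

/* Sketch of the release side: it assumes every page was pinned. */
void toy_bio_release_page(struct toy_page *p)
{
	toy_unpin_user_page(p);
}

/*
 * Acquire one page either with a pin or with a plain reference, release
 * it through the unpinning path, and report whether the refcount
 * returned to its starting value.
 */
bool toy_release_is_balanced(bool acquire_with_pin)
{
	struct toy_page page = { .refcount = 1 }; /* owner's reference */

	if (acquire_with_pin)
		toy_iter_pin_page(TOY_ITER_PIPE, &page);
	else
		toy_get_page(&page);

	toy_bio_release_page(&page);
	return page.refcount == 1;
}
```

In this model, toy_release_is_balanced(true) holds, while toy_release_is_balanced(false) does not: a page acquired with a plain reference and released by unpinning ends up with a corrupted count. That is the mismatch the pinning variants for all iterator types are meant to rule out.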