Date: Fri, 23 Sep 2022 17:13:42 +0100
From: Al Viro
To: Christoph Hellwig
Cc: Jan Kara, John Hubbard, Andrew Morton, Jens Axboe, Miklos Szeredi,
    "Darrick J. Wong", Trond Myklebust, Anna Schumaker, David Hildenbrand,
    Logan Gunthorpe, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-xfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, LKML
Subject: Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines
References: <20220906102106.q23ovgyjyrsnbhkp@quack3>
    <20220914145233.cyeljaku4egeu4x2@quack3>
    <20220915081625.6a72nza6yq4l5etp@quack3>
X-Mailing-List: linux-nfs@vger.kernel.org

On Fri, Sep 23, 2022 at 01:44:33AM -0700, Christoph Hellwig wrote:
> Why would I?  We generally do have or should have the iov_iter around.

Not for async IO.

> And for the common case where we don't (bios) we can carry that
> information in the bio as it needs a special unmap helper anyway.

*Any* async read_iter is like that.
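To make that concrete - a hedged, hypothetical sketch (nothing below is from
this series; all names are made up) of why the iterator cannot be consulted
at completion time: an async ->read_iter returns -EIOCBQUEUED while the
iov_iter still lives on the submitter's stack, so by the time the end_io
callback runs there is no iov_iter left, and anything the completion path
needs must have been stashed in the bio itself:

	#include <linux/bio.h>
	#include <linux/blkdev.h>
	#include <linux/fs.h>
	#include <linux/uio.h>

	static void sketch_end_io(struct bio *bio)
	{
		struct kiocb *iocb = bio->bi_private;

		/* no iov_iter exists here; only what was stashed in the bio */
		bio_release_pages(bio, bio_data_dir(bio) == READ);
		iocb->ki_complete(iocb, blk_status_to_errno(bio->bi_status));
		bio_put(bio);
	}

	static ssize_t sketch_read_iter(struct kiocb *iocb, struct iov_iter *iter)
	{
		struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
		struct bio *bio;
		int ret;

		bio = bio_alloc(bdev, bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS),
				REQ_OP_READ, GFP_KERNEL);
		/*
		 * grabs page references for user-backed iters; for ITER_BVEC
		 * it just borrows the bvec array - the very case discussed
		 * below
		 */
		ret = bio_iov_iter_get_pages(bio, iter);
		if (ret) {
			bio_put(bio);
			return ret;
		}
		bio->bi_iter.bi_sector = iocb->ki_pos >> SECTOR_SHIFT;
		bio->bi_private = iocb;
		bio->bi_end_io = sketch_end_io;
		submit_bio(bio);
		return -EIOCBQUEUED;	/* *iter may be freed from here on */
	}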
> > Where are they getting dropped and what guarantees that IO is
> > complete by that point?
>
> The exact place depends on the exact target frontend of which we have
> a few.  But it happens from the end_io callback that is triggered
> through a call to target_complete_cmd.

OK...

> > The reason I'm asking is that here you have an ITER_BVEC possibly fed to
> > __blkdev_direct_IO_async(), with its
> >	if (iov_iter_is_bvec(iter)) {
> >		/*
> >		 * Users don't rely on the iterator being in any particular
> >		 * state for async I/O returning -EIOCBQUEUED, hence we can
> >		 * avoid expensive iov_iter_advance(). Bypass
> >		 * bio_iov_iter_get_pages() and set the bvec directly.
> >		 */
> >		bio_iov_bvec_set(bio, iter);
> > which does *not* grab the page references.  Sure, bio_release_pages()
> > knows to leave those alone and doesn't drop anything.  However, what is
> > the mechanism preventing the pages getting freed before the IO
> > completion in this case?
>
> The contract that callers of bvec iters need to hold their own
> references as without that doing I/O to them would be unsafe.  If they
> did not hold references the pages could go away before even calling
> bio_iov_iter_get_pages (or this open-coded bio_iov_bvec_set).

You are mixing two issues here - holding references to pages while using
an iov_iter instance is obvious; holding them until async IO is complete,
even though the struct iov_iter might be long gone by that point, is a
different story.  And the originating iov_iter instance really can be long
gone by the time of IO completion - a requirement to keep it around would
be very hard to satisfy.

I've no objections to requiring the pages in ITER_BVEC to be preserved at
least until the IO completion by means independent of whatever
->read_iter/->write_iter does to them, but
	* that needs to be spelled out very clearly and
	* we need to verify that it is, indeed, the case for all existing
iov_iter_bvec callers, preferably with comments next to the non-obvious
ones (something that is followed only by sync IO is obvious).

That goes not just for bio - if we make get_pages *NOT* grab references on
ITER_BVEC (and I'm all for it), we need to make sure that those pages won't
be retained after the original protection runs out.  Which includes the
reference put into struct nfs_request, for example, as well as whatever the
ceph transport is doing, etc.

Another thing that needs to be sorted out is __zerocopy_sg_from_iter() and
its callers - AFAICS, all of those are in ->sendmsg() with MSG_ZEROCOPY in
flags.

It's a non-trivial amount of code audit - we have about 40 iov_iter_bvec()
callers in the tree, and while many are clearly sync-only... the ones that
are not tend to balloon into interesting analyses of call chains, etc.
Don't get me wrong - that analysis needs to be done; just don't expect it
to be trivial.  And it will require quite a bit of cooperation from the
folks familiar with e.g. drivers/target, or with the ceph transport layer,
etc.

FWIW, my preference would be along the lines of

	Backing memory for any non-user-backed iov_iter should be protected
	from reuse by the creator of the iov_iter, and that protection
	should continue through not just all synchronous operations with
	the iov_iter in question - it should last until all async
	operations involving that memory are finished.  That continued
	protection must not depend upon taking extra page references, etc.
	while we are working with the iov_iter.

But getting there will take quite a bit of code audit and probably some
massage as well.
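If it helps, a hedged sketch of what that rule looks like from the side of
an ITER_BVEC creator (hypothetical names throughout; this is not code from
anywhere in the tree): the reference obtained at creation time is dropped
in the completion callback, not when the submission call returns, and
nothing in the IO path is expected to take references on the creator's
behalf.

	#include <linux/fs.h>
	#include <linux/mm.h>
	#include <linux/slab.h>
	#include <linux/uio.h>

	struct sketch_req {
		struct page	*page;	/* protected until completion */
		struct bio_vec	bvec;
		struct kiocb	iocb;
	};

	static void sketch_complete(struct kiocb *iocb, long ret)
	{
		struct sketch_req *req =
			container_of(iocb, struct sketch_req, iocb);

		/* the creator's protection ends here, at IO completion */
		put_page(req->page);
		kfree(req);
	}

	static int sketch_submit(struct file *file, loff_t pos)
	{
		struct sketch_req *req;
		struct iov_iter iter;
		ssize_t ret;

		req = kzalloc(sizeof(*req), GFP_KERNEL);
		if (!req)
			return -ENOMEM;
		req->page = alloc_page(GFP_KERNEL); /* creator's reference */
		if (!req->page) {
			kfree(req);
			return -ENOMEM;
		}
		req->bvec.bv_page = req->page;
		req->bvec.bv_len = PAGE_SIZE;
		req->bvec.bv_offset = 0;

		init_sync_kiocb(&req->iocb, file);
		req->iocb.ki_pos = pos;
		req->iocb.ki_complete = sketch_complete; /* async completion */

		iov_iter_bvec(&iter, READ, &req->bvec, 1, PAGE_SIZE);
		ret = call_read_iter(file, &req->iocb, &iter);
		if (ret != -EIOCBQUEUED)
			sketch_complete(&req->iocb, ret); /* completed sync */
		/* iter is dead either way; req->page stays protected until
		   sketch_complete() has run */
		return 0;
	}

The audit described above then amounts to checking that each of the ~40
iov_iter_bvec() callers follows some variant of that pattern.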