Date: Thu, 22 Sep 2022 03:22:48 +0100
From: Al Viro
To: Jan Kara
Cc: Christoph Hellwig, John Hubbard, Andrew Morton, Jens Axboe,
    Miklos Szeredi, "Darrick J. Wong", Trond Myklebust, Anna Schumaker,
    David Hildenbrand, Logan Gunthorpe, linux-block@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-mm@kvack.org, LKML
Subject: Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines
References: <20220831041843.973026-5-jhubbard@nvidia.com>
    <103fe662-3dc8-35cb-1a68-dda8af95c518@nvidia.com>
    <20220906102106.q23ovgyjyrsnbhkp@quack3>
    <20220914145233.cyeljaku4egeu4x2@quack3>
    <20220915081625.6a72nza6yq4l5etp@quack3>
In-Reply-To: <20220915081625.6a72nza6yq4l5etp@quack3>

On Thu, Sep 15, 2022 at 10:16:25AM +0200, Jan Kara wrote:
> > How would that work?  What protects the area where you want to avoid
> > running into pinned pages from a previously acceptable page getting
> > pinned?  If "they must have been successfully unmapped" is a part of
> > what you are planning, we really do have a problem...
>
> But this is a very good question.  So far the idea was that we lock the
> page, unmap (or write-protect) the page, and then check pincount == 0;
> that is a reliable method for making sure the page data is stable (until
> we unlock the page & release the other locks blocking page faults and
> writes).  But once ordinary page references can suddenly be used to
> create pins, this no longer works.  Hrm.
>
> Just brainstorming ideas now: so we'd either need to obtain the pins
> early, while we still have the virtual address (I guess that is often
> not practical, but it should work e.g. for the normal direct IO path),
> or we need some way to "simulate" the page fault when pinning the page,
> just without mapping it into the page tables at the end.  This simulated
> page fault could perhaps be avoided if an rmap walk shows that the page
> is already mapped somewhere with suitable permissions.

OK.  As far as I can see, the rules are along the lines of

    * the creator of an ITER_BVEC/ITER_XARRAY is responsible for the
      pages being safe.  That includes
        * page known to be locked by the caller
        * page privately allocated and not visible to anyone else
        * iterator being a data source
        * page coming from pin_user_pages(), possibly as the result of
          iov_iter_pin_pages() on ITER_IOVEC/ITER_UBUF
    * ITER_PIPE pages are always safe
    * pages found in an ITER_BVEC/ITER_XARRAY are safe, since the
      iterator had been created with such pages

My preference would be to have iov_iter_get_pages() and friends pin if and
only if we have a data-destination iov_iter that is user-backed.  For a
user-backed data source we only need FOLL_GET, and for all the other
flavours (ITER_BVEC, etc.) we only do get_page(), if we need to grab any
references at all.  (Sketched in code at the end of this mail.)

What I'd like to have is an understanding of the places where we drop the
references acquired by iov_iter_get_pages().  How do we decide whether to
unpin (also sketched below)?  E.g. a pipe_buffer carries a reference to a
page with no way to tell whether it's a pinned one (struct quoted below);
the results of iov_iter_get_pages() on ITER_IOVEC *can* end up there, but
thankfully only from a data-source (== WRITE, aka. ITER_SOURCE) iov_iter.
So for those we don't care.

Then there's nfs_request; AFAICS, we do need to pin the references in
those if they are coming from nfs_direct_read_schedule_iovec(), but not
if they come from readpage_async_filler().  How do we deal with
coalescence, etc.?  It's been a long time since I really looked at that
code...  Christoph, could you give any comments on that one?

Note, BTW, that the nfs_requests coming from readpage_async_filler() have
their pages locked by the caller; the ones from
nfs_direct_read_schedule_iovec() do not, and that's where we want them
pinned.  The resulting page references end up (after quite a trip through
the data structures) stuffed into struct rpc_rqst ->rq_rcv_buf.pages[],
and when a response arrives from the server, they get picked up by
xs_read_bvec() and fed to iov_iter_bvec().  In one case that's safe since
the pages are locked; in the other - since they would have come from
pin_user_pages().  The call chain at the time they are used has nothing
to do with the originator - sunrpc is looking at an arrived response to a
READ that matches an rpc_rqst created by the sender of that request, and
safety is the sender's responsibility.
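
To spell the preference above out in code - a minimal sketch only, not the
actual patch: iov_iter_should_pin() is a made-up name, and it assumes the
user_backed_iter() and iov_iter_rw() helpers from <linux/uio.h> as they
exist in mainline:

    /*
     * Pin only for a user-backed data destination (a READ into user
     * memory, where FOLL_PIN is what makes the page safe to touch).
     * A user-backed data source only needs FOLL_GET, and the
     * kernel-backed flavours (ITER_BVEC etc.) at most get_page().
     */
    static inline bool iov_iter_should_pin(const struct iov_iter *i)
    {
        return user_backed_iter(i) && iov_iter_rw(i) == READ;
    }

IOW, whether the pages behind an iterator were pinned would become
derivable from the iterator itself (flavour + direction), rather than
something every caller has to track separately.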
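
To make the pipe_buffer point concrete, this is the struct as it sits in
mainline include/linux/pipe_fs_i.h; note that nothing in it records how
the page reference was obtained:

    struct pipe_buffer {
        struct page *page;
        unsigned int offset, len;
        const struct pipe_buf_operations *ops;
        unsigned int flags;     /* PIPE_BUF_FLAG_LRU/ATOMIC/GIFT/...;
                                 * no flag says "this page is pinned" */
        unsigned long private;
    };

So the release side can only ever drop a plain reference (put_page() via
->ops->release), which is why pinned pages must never land in a pipe -
and why the data-source-only property above matters.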
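
And the release-side asymmetry behind the "whether to unpin" question,
again as a sketch - iov_iter_unpin_pages() is hypothetical; the two
primitives it dispatches between are the real mainline ones:

    /* Whoever drops the references has to know how they were taken. */
    static void iov_iter_unpin_pages(struct page **pages, unsigned int n,
                                     bool pinned)
    {
        if (pinned) {
            unpin_user_pages(pages, n);     /* FOLL_PIN counterpart */
            return;
        }
        while (n--)
            put_page(pages[n]);             /* FOLL_GET/get_page() refs */
    }

That bool has to come from somewhere, and "somewhere" is exactly what is
missing today in places like pipe_buffer and nfs_request.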