Date: Fri, 8 Sep 2023 10:15:44 +0200
From: Christoph Hellwig
To: David Hildenbrand
Cc: Christoph Hellwig, Jan Kara, David Howells, Peter Xu, Lei Huang,
	miklos@szeredi.hu, Xiubo Li, Ilya Dryomov, Jeff Layton,
	Trond Myklebust, Anna Schumaker, Latchesar Ionkov,
	Dominique Martinet, Christian Schoenebeck, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, John Fastabend,
	Jakub Sitnicki, Boris Pismenny, linux-nfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
	linux-mm@kvack.org, v9fs@lists.linux.dev, netdev@vger.kernel.org
Subject: Re: getting rid of the last memory modifitions through gup(FOLL_GET)
Message-ID: <20230908081544.GB8240@lst.de>
References: <20230905141604.GA27370@lst.de> <0240468f-3cc5-157b-9b10-f0cd7979daf0@redhat.com>
In-Reply-To: <0240468f-3cc5-157b-9b10-f0cd7979daf0@redhat.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On Wed, Sep 06, 2023 at 11:42:33AM +0200, David Hildenbrand wrote:
>> and iov_iter_get_pages_alloc2.  We have three file system direct I/O
>> users of those left: ceph, fuse and nfs.  Lei Huang has sent patches
>> to convert fuse to iov_iter_extract_pages which I'd love to see merged,
>> and we'd need equivalent work for ceph and nfs.
>>
>> The non-file system uses are in the vmsplice code, which only reads
>
> vmsplice really has to be fixed to specify FOLL_PIN|FOLL_LONGTERM for good;
> I recall that David Howells had patches for that at one point. (at least to
> use FOLL_PIN)

Hmm, unless I'm misreading the code, vmsplice is only using
iov_iter_get_pages2 for reading from the user address space anyway.
Or am I missing something?

>> After that we might have to do an audit of the raw get_user_pages APIs,
>> but there probably aren't many that modify file backed memory.
>
> ptrace should apply that ends up doing a FOLL_GET|FOLL_WRITE.

Yes, if that ends up on file backed shared mappings we also need a pin.

> Further, KVM ends up using FOLL_GET|FOLL_WRITE to populate the second-level
> page tables for VMs, and uses MMU notifiers to synchronize the second-level
> page tables with process page table changes. So once a PTE goes from
> writable -> r/o in the process page table, the second level page tables for
> the VM will get updated. Such MMU users are quite different from ordinary
> GUP users.

Can KVM page tables use file backed shared mappings?

> Converting ptrace might not be desired/required as well (the reference is
> dropped immediately after the read/write access).

But the pin is needed to make sure the file system can account for
dirtying the pages.  That is something we fundamentally can't do with
a plain get.

> The end goal as discussed a couple of times would be to limit FOLL_GET
> in general only to a couple of users that can be audited and keep using it
> for a good reason. Arbitrary drivers that perform DMA should stop using it
> (and ideally be prevented from using it) and switch to FOLL_PIN.

Agreed, that's where I'd like to get to.  Preferably with the non-pin
API not even being exported to modules.
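
To make the get-versus-pin point concrete, here is a minimal userspace toy
model, not kernel code.  It assumes the usual small-page scheme in which a
pin is recorded by adding a large bias to the reference count, so that pinned
pages can be told apart from ordinarily referenced ones.  Every toy_* name is
hypothetical and made up for illustration; only the bias value mirrors the
kernel's GUP_PIN_COUNTING_BIAS, and the writeback policy is deliberately
oversimplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: mirrors the value of the kernel's GUP_PIN_COUNTING_BIAS. */
#define TOY_PIN_BIAS 1024

/* Hypothetical stand-in for a page: one shared counter plus a dirty flag. */
struct toy_page {
	int refcount;
	bool dirty;
};

/* FOLL_GET-style reference: looks like any other reference on the page. */
void toy_get(struct toy_page *p) { p->refcount += 1; }
void toy_put(struct toy_page *p)
{
	assert(p->refcount >= 1);
	p->refcount -= 1;
}

/* FOLL_PIN-style reference: the bias makes the holder visible to others. */
void toy_pin(struct toy_page *p) { p->refcount += TOY_PIN_BIAS; }
void toy_unpin(struct toy_page *p)
{
	assert(p->refcount >= TOY_PIN_BIAS);
	p->refcount -= TOY_PIN_BIAS;
}

/* Heuristic check in the spirit of folio_maybe_dma_pinned(). */
bool toy_maybe_pinned(const struct toy_page *p)
{
	return p->refcount >= TOY_PIN_BIAS;
}

/*
 * Simplified "file system writeback": clean the page unless someone may
 * still be writing to it through a pin.  Returns true if it was cleaned.
 */
bool toy_writeback(struct toy_page *p)
{
	if (toy_maybe_pinned(p))
		return false;
	p->dirty = false;
	return true;
}

/*
 * One writer takes a reference, dirties the page, and writeback runs while
 * the reference is still held.  Returns true if writeback cleaned the page
 * underneath the writer.
 */
bool toy_scenario(bool use_pin)
{
	struct toy_page page = { .refcount = 1, .dirty = false };
	bool cleaned;

	if (use_pin)
		toy_pin(&page);
	else
		toy_get(&page);

	page.dirty = true;		/* write through the held reference */
	cleaned = toy_writeback(&page);	/* file system runs concurrently */

	if (use_pin)
		toy_unpin(&page);
	else
		toy_put(&page);
	return cleaned;
}
```

With a plain get, toy_scenario(false) reports that the page was cleaned while
the writer still held it, so any later write through that reference would go
unnoticed by the file system.  With a pin, toy_scenario(true) reports that
writeback could see the holder and back off.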