Date: Sat, 9 Sep 2023 13:18:34 +0200
From: Christoph Hellwig
To: David Hildenbrand
Cc: Christoph Hellwig, Jan Kara, David Howells, Peter Xu, Lei Huang,
    miklos@szeredi.hu, Xiubo Li, Ilya Dryomov, Jeff Layton,
    Trond Myklebust, Anna Schumaker, Latchesar Ionkov,
    Dominique Martinet, Christian Schoenebeck, "David S.
    Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, John Fastabend,
    Jakub Sitnicki, Boris Pismenny, linux-nfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
    linux-mm@kvack.org, v9fs@lists.linux.dev, netdev@vger.kernel.org
Subject: Re: getting rid of the last memory modifitions through gup(FOLL_GET)
Message-ID: <20230909111834.GA11859@lst.de>
References: <20230905141604.GA27370@lst.de> <0240468f-3cc5-157b-9b10-f0cd7979daf0@redhat.com> <20230908081544.GB8240@lst.de> <8698ba1f-fc5d-a82e-842b-100dc8957f2f@redhat.com>
In-Reply-To: <8698ba1f-fc5d-a82e-842b-100dc8957f2f@redhat.com>

On Fri, Sep 08, 2023 at 06:48:05PM +0200, David Hildenbrand wrote:
> vmsplice_to_pipe() -> iter_to_pipe() -> iov_iter_get_pages2()
>
> So it ends up calling get_user_pages_fast()
>
> ... and not using FOLL_PIN|FOLL_LONGTERM
>
> Why FOLL_LONGTERM? Because it's a longterm pin, where unprivileged users
> can grab a reference on a page for all eternity, breaking CMA and memory
> hotunplug (well, and harming compaction).
>
> Why FOLL_PIN? Well FOLL_LONGTERM only applies to FOLL_PIN. But for
> anonymous memory, this will also take care of the last remaining hugetlb
> COW test (trigger COW unsharing) as commented back in:
>
> https://lore.kernel.org/all/02063032-61e7-e1e5-cd51-a50337405159@redhat.com/

Well, I'm not against it. It just isn't required for dealing with the
file system writeback vs GUP modification race this thread was started for.

>> Can KVM page tables use file backed shared mappings?
>
> Yes, usually shmem and hugetlb.
> But with things like emulated
> NVDIMMs/virtio-pmem for VMs, easily also ordinary files.
>
> But it's really not ordinary write access through GUP. It's write access
> via a secondary page table (secondary MMU), that's synchronized to the
> process page table -- just like if the CPU would be writing to the page
> using the process page tables (primary MMU).

Writing through the process page tables takes a write fault when first
writing, which calls into ->page_mkwrite in the file system. Does the
synchronization take care of that? If not, we need to add or emulate it.

> ptrace will find the pagecache page writable in the page table (PTE write
> bit set), if it intends to write to the page (FOLL_WRITE). If it is not
> writable, it will trigger a page fault that informs the file system.

Yes, that case is (mostly) fine.

>
> With an FS that wants writenotify, we will not map a page writable (PTE
> write bit not set) unless it is dirty (PTE dirty bit set) IIRC.
>
> So are we concerned about a race between the filesystem removing the PTE
> write bit (to catch next write access before it gets dirtied again) and
> ptrace marking the page dirty?

Yes. This is the race that we've run into with various GUP users.

> Yes. However, secondary MMU users (like KVM) would need some way to keep
> making use of that; ideally, using a proper separate interface instead of
> (ab)using plain GUP and confusing people :)

I'm all for that.
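
For readers following along, the ordering problem discussed above can be
sketched as a small user-space toy model. This is only an illustration under
stated assumptions: struct toy_page and every toy_* helper are hypothetical
names invented here, not kernel APIs, and the model reduces the page table
and file system state to a few flags. It assumes the write fault path is the
only place the file system gets notified (standing in for ->page_mkwrite),
and that a GUP user holding an old reference dirties the page without going
through that path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types and helpers, not kernel APIs. */
struct toy_page {
    bool pte_write;          /* PTE write bit in the process page table */
    bool dirty;              /* page is marked dirty */
    bool fs_prepared;        /* file system was notified and expects a dirty page */
    int  unnotified_dirties; /* dirtyings the file system never heard about */
};

/* Write fault path: notify the file system, then make the PTE writable. */
static void toy_write_fault(struct toy_page *pg)
{
    assert(pg != NULL);
    pg->fs_prepared = true;  /* stands in for ->page_mkwrite */
    pg->pte_write = true;
}

/* CPU store through the process page tables (primary MMU). */
static void toy_cpu_write(struct toy_page *pg)
{
    if (!pg->pte_write)
        toy_write_fault(pg);
    pg->dirty = true;
}

/* Writeback: clean the page and write-protect it to catch the next write. */
static void toy_writeback(struct toy_page *pg)
{
    pg->dirty = false;
    pg->pte_write = false;
    pg->fs_prepared = false;
}

/* GUP user that took its reference earlier and dirties the page later,
 * without consulting the PTE write bit again. */
static void toy_gup_late_dirty(struct toy_page *pg)
{
    if (!pg->fs_prepared)
        pg->unnotified_dirties++;
    pg->dirty = true;
}

/* Racy ordering: write, writeback, then a late dirty via the old reference. */
int toy_run_race(void)
{
    struct toy_page pg = {0};

    toy_cpu_write(&pg);      /* fault, notify, dirty */
    toy_writeback(&pg);      /* file system cleans and write-protects */
    toy_gup_late_dirty(&pg); /* page dirtied behind the file system's back */
    return pg.unnotified_dirties;
}

/* Same sequence, but the second write goes through the page tables. */
int toy_run_ordered(void)
{
    struct toy_page pg = {0};

    toy_cpu_write(&pg);
    toy_writeback(&pg);
    toy_cpu_write(&pg);      /* faults again, so the file system is notified */
    return pg.unnotified_dirties;
}
```

In this model toy_run_race() counts one dirtying that the file system was
never told about, while toy_run_ordered() counts none, because the second
write takes the fault path again. That is the property the question about
secondary MMU synchronization is asking after: whether a write through that
path triggers the equivalent of the write fault notification.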