Date: Fri, 25 Feb 2022 20:40:36 -0500
From: "Theodore Ts'o"
To: John Hubbard
Cc: Eric Biggers, Lee Jones, linux-ext4@vger.kernel.org,
	Christoph Hellwig, Dave Chinner, Goldwyn Rodrigues,
	"Darrick J. Wong", Bob Peterson, Damien Le Moal,
	Andreas Gruenbacher, Ritesh Harjani, Greg Kroah-Hartman,
	Johannes Thumshirn, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, cluster-devel@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH -v3] ext4: don't BUG if kernel subsystems dirty pages
	without asking ext4 first
In-Reply-To: <303059e6-3a33-99cb-2952-82fe8079fa45@nvidia.com>
X-Mailing-List: linux-ext4@vger.kernel.org

On Fri, Feb 25, 2022 at 04:41:14PM -0800, John Hubbard wrote:
>
> > f2fs and btrfs's compressed file write support, by making things work
> > much like the write(2) system call.  Imagine if we had a
> > "pin_user_pages_local()" which calls write_begin(), and a
> > "unpin_user_pages_local()" which calls write_end(), and the
>
> Right, that would supply the missing connection to the filesystems.
>
> In fact, maybe these names about right:
>
>     pin_user_file_pages()
>     unpin_user_file_pages()
>
> ...and then put them in a filesystem header file, because these are now
> tightly coupled to filesystems, what with the need to call
> .write_begin() and .write_end().

Well, that makes it process_vm_writev()'s problem, in that it needs to
know when to call pin_user_file_pages().  I suspect that for many use
cases --- for example, if this is being used by a debugger to modify a
variable on a stack, or an anonymous page in the program's data
segment --- process_vm_writev() *isn't* actually pinning a file.
So they want some kind of interface that automatically DTRT regardless
of whether the user pages being edited are file-backed or not.  Some
kind of [un]pin_user_pages_local() which will call write_{begin,end}()
if necessary would be the most convenient for users such as
process_vm_writev().

And perhaps it would make sense for pin_user_pages() to optionally (or
by default?) check for file-backed pages, and if it finds any, return
an error or stop pinning pages at that point, so the system call can
return EOPNOTSUPP to the user, instead of silently causing user data
to be lost or corrupted as is currently the case with xfs and btrfs
(and ext4 once I patch it so it doesn't BUG).

I'll note that at least one caller of pin_user_pages(), in
fs/io_uring.c, takes it upon itself to check for file-backed pages,
and returns EOPNOTSUPP if any are found.  Maybe that should be lifted
to pin_user_pages()?

For that matter, maybe pin_user_pages() and friends should take some
new FOLL_ flags to indicate whether file-backed pages should be
rejected, or perhaps callers can promise they will only be holding the
pin for a very short amount of time (FOLL_SHORTTERM?), and then
pin_user_pages() and unpin_user_pages() can automagically call
write_begin() and write_end() if necessary?

I dunno....

					- Ted
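
For illustration only, the two flag ideas floated above can be modeled in
userspace C.  This is a toy sketch under loud assumptions, not kernel code:
struct toy_page, toy_pin_pages(), toy_unpin_pages(), TOY_REJECT_FILE_BACKED
and TOY_SHORT_TERM are all hypothetical names, and nothing here reflects the
real pin_user_pages() signature or any existing FOLL_ flag.  The fs_notified
field merely stands in for a write_begin()/write_end() pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user page; NOT the kernel's struct page. */
struct toy_page {
	bool file_backed;	/* page belongs to a file mapping */
	bool fs_notified;	/* models "write_begin() was called" */
	int  pin_count;
};

/* Hypothetical flags, loosely modeled on the FOLL_ flag idea. */
#define TOY_REJECT_FILE_BACKED	0x1u	/* stop at first file-backed page */
#define TOY_SHORT_TERM		0x2u	/* notify the "filesystem" instead */

/*
 * Pin pages in order.  For a file-backed page:
 *   - with TOY_SHORT_TERM, mark it as notified and pin it;
 *   - else with TOY_REJECT_FILE_BACKED, stop pinning at that page.
 * Returns the number of pages pinned, or -EOPNOTSUPP if the very first
 * page was rejected.
 */
static long toy_pin_pages(struct toy_page *pages, size_t n, unsigned int flags)
{
	size_t i;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pages[i].file_backed) {
			if (flags & TOY_SHORT_TERM)
				pages[i].fs_notified = true;
			else if (flags & TOY_REJECT_FILE_BACKED)
				break;
		}
		pages[i].pin_count++;
	}
	if (n > 0 && i == 0)
		return -EOPNOTSUPP;
	return (long)i;
}

/* Drop one pin per page; the last unpin models the write_end() call. */
static void toy_unpin_pages(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(pages[i].pin_count > 0);
		pages[i].pin_count--;
		if (pages[i].fs_notified && pages[i].pin_count == 0)
			pages[i].fs_notified = false;
	}
}
```

With two anonymous pages followed by a file-backed one,
toy_pin_pages(..., TOY_REJECT_FILE_BACKED) pins the first two and returns 2,
so a caller can report a short write; if the first page is file-backed it
returns -EOPNOTSUPP.  With TOY_SHORT_TERM all three pages are pinned and the
file-backed one is flagged until it is unpinned.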