From: Linus Torvalds
Date: Tue, 21 Dec 2021 10:00:22 -0800
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
To: David Hildenbrand
Cc: Jason Gunthorpe, Nadav Amit, Linux Kernel Mailing List, Andrew Morton,
    Hugh Dickins, David Rientjes, Shakeel Butt, John Hubbard, Mike Kravetz,
    Mike Rapoport, Yang Shi, "Kirill A . Shutemov", Matthew Wilcox,
    Vlastimil Babka, Jann Horn, Michal Hocko, Rik van Riel, Roman Gushchin,
    Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig,
    Oleg Nesterov, Jan Kara, Linux-MM,
    "open list:KERNEL SELFTEST FRAMEWORK", "open list:DOCUMENTATION"
In-Reply-To: <900b7d4a-a5dc-5c7b-a374-c4a8cc149232@redhat.com>

On Tue, Dec 21, 2021 at 9:40 AM David Hildenbrand wrote:
>
> > I do think the existing "maybe_pinned()" logic is fine for that. The
> > "exclusive to this VM" bit can be used to *help* that decision -
> > because only an exclusive page can be pinned - but I don't think it
> > should _replace_ that logic.
>
> The issue is that O_DIRECT uses FOLL_GET and cannot easily be changed to
> FOLL_PIN unfortunately. So I'm *trying* to make it more generic such
> that such corner cases can be handled as well correctly. But yeah, I'll
> see where this goes ... O_DIRECT has to be fixed one way or the other.
>
> John H. mentioned that he wants to look into converting that to
> FOLL_PIN. So maybe that will work eventually.

I'd really prefer that as the plan.

What exactly is the issue with O_DIRECT? Is it purely that it uses
"put_page()" instead of "unpin", or what?

I really think that if people look up pages and expect those pages to
stay coherent with the VM they looked it up for, they _have_ to
actively tell the VM layer - which means using FOLL_PIN.

Note that this is in absolutely no way a "new" issue. It has *always*
been true. If some O_DIRECT path depends on pinning behavior, it has
never worked correctly, and it is entirely on O_DIRECT, and not at all
a VM issue. We've had people doing GUP games forever, and being burnt
by those games not working reliably.

GUP (before we even had the notion of pinning) would always just take
a reference to the page, but it would not guarantee that that exact
page then kept an association with the VM.

Now, in *practice* this all works if:

 (a) the GUP user had always written to the page since the fork
     (either explicitly, or with FOLL_WRITE obviously acting as such)

 (b) the GUP user never forks afterwards until the IO is done

 (c) the GUP user plays no other VM games on that address

and it's also very possible that it has worked by pure luck (ie we've
had a lot of random code that actively mis-used things and it would
work in practice just because COW would happen to cut the right
direction etc).

Is there some particular GUP user you happen to care about more than others?

I think it's a valid option to try to fix things up one by one, even
if you don't perhaps fix _all_ cases.

              Linus
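
[Editorial illustration, not part of the original message.] The point
about a plain GUP reference not keeping the page associated with the
VM, and why conditions (a) and (b) matter, can be sketched with a toy
userspace model. This is not kernel code: every name below (toy_page,
toy_mm, toy_gup, toy_fork, toy_write, toy_io_visible_to_parent) is
hypothetical, each address space has a single page, and the COW rule
is simplified to "copy when the page is mapped more than once", which
is an assumption and not the kernel's actual policy.

```c
/* Toy userspace model of "GUP reference vs. COW". NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_page {
    int refcount;   /* mappings plus GUP-style references */
    int mapcount;   /* number of toy address spaces mapping the page */
    int data;       /* one word standing in for the page contents */
};

struct toy_mm {
    struct toy_page *page;  /* a single-page "address space" */
    bool writable;
};

static struct toy_page *toy_page_alloc(int data)
{
    struct toy_page *p = malloc(sizeof(*p));
    assert(p != NULL);
    p->refcount = 1;
    p->mapcount = 1;
    p->data = data;
    return p;
}

static void toy_put_page(struct toy_page *p)
{
    if (--p->refcount == 0)
        free(p);
}

/* GUP-like lookup: takes a plain reference and nothing else. */
static struct toy_page *toy_gup(struct toy_mm *mm)
{
    mm->page->refcount++;
    return mm->page;
}

/* fork: the child shares the page and both sides lose write access. */
static void toy_fork(struct toy_mm *parent, struct toy_mm *child)
{
    parent->writable = false;
    child->page = parent->page;
    child->writable = false;
    parent->page->refcount++;
    parent->page->mapcount++;
}

/* Write fault: copy if the page is shared (simplified, assumed rule). */
static void toy_write(struct toy_mm *mm, int value)
{
    if (!mm->writable) {
        struct toy_page *old = mm->page;
        if (old->mapcount > 1) {
            mm->page = toy_page_alloc(old->data);
            old->mapcount--;
            toy_put_page(old);
        }
        mm->writable = true;
    }
    mm->page->data = value;
}

static void toy_unmap(struct toy_mm *mm)
{
    if (mm->page) {
        mm->page->mapcount--;
        toy_put_page(mm->page);
        mm->page = NULL;
    }
}

/*
 * Returns true if data written through the GUP reference ends up in
 * the page the parent currently maps.
 */
bool toy_io_visible_to_parent(bool fork_while_io_in_flight)
{
    struct toy_mm parent = { toy_page_alloc(1), true };
    struct toy_mm child = { NULL, false };
    struct toy_page *io_page;

    if (!fork_while_io_in_flight) {
        toy_fork(&parent, &child);
        toy_write(&parent, 2);      /* (a): written since the fork */
    }
    io_page = toy_gup(&parent);
    if (fork_while_io_in_flight) {
        toy_fork(&parent, &child);  /* breaks (b) */
        toy_write(&parent, 2);      /* COW moves the parent elsewhere */
    }
    io_page->data = 42;             /* the "IO" lands in the old page */

    bool visible = (parent.page->data == 42);

    toy_put_page(io_page);
    toy_unmap(&parent);
    toy_unmap(&child);
    return visible;
}
```

In this model, toy_io_visible_to_parent(false) reports that the parent
sees the IO, while toy_io_visible_to_parent(true) reports that it does
not: the reference kept the old page alive, but the parent's mapping
had already moved to a copy.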