From: Linus Torvalds
Date: Sun, 19 Dec 2021 09:44:41 -0800
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
To: Nadav Amit
Cc: David Hildenbrand, Jason Gunthorpe, Linux Kernel Mailing List, Andrew Morton, Hugh Dickins, David Rientjes, Shakeel Butt, John Hubbard, Mike Kravetz, Mike Rapoport, Yang Shi, "Kirill A. Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko, Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara, Linux-MM, "open list:KERNEL SELFTEST FRAMEWORK", "open list:DOCUMENTATION"
X-Mailing-List: linux-kernel@vger.kernel.org

David, you said that you were working on some alternative model. Is it
perhaps along these same lines below?

I was thinking that a bit in the page tables to say "this page is
exclusive to this VM" would be a really simple thing to deal with for
fork() and swapout and friends.

But we don't have such a bit in general, since many architectures have
very limited sets of SW bits, and even when they exist we've spent
them on things like UFFD_WP.

But the more I think about the "bit doesn't even have to be in the
page tables", the more I think maybe that's the solution.

A bit in the 'struct page' itself.

For hugepages, you'd have to distribute said bit when you split the
hugepage.

But other than that it looks quite simple: anybody who does a virtual
copy will inevitably be messing with the page refcount, so clearing
the "exclusive ownership" bit wouldn't be costly: the 'struct page'
cacheline is already getting dirtied.

Or what was your model you were implying you were thinking about in
your other email? You said

  "I might have had an idea yesterday on how to fix most of the issues
   without relying on the mapcount, doing it similar [..]"

but I didn't then reply to that email because I had just written this
other long email to Nadav.

Linus

On Sun, Dec 19, 2021 at 9:27 AM Linus Torvalds wrote:
>
> Adding another bit in the page tables - *purely* to say "this VM owns
> the page outright" - would be fairly powerful. And fairly simple.
>
> Then any COW event will set that bit - because when you actually COW,
> the page you install is *yours*. No questions asked.
>
> [ snip snip ]
>
> Btw, the extra bit doesn't really have to be in the page tables. It
> could be a bit in the page itself. We could add another page bit that
> we just clear when we do the "add ref to page as you make a virtual
> copy during fork() etc".
>
> And no, we can't use "pincount" either, because it's not exact. The
> fact that the page count is so elevated that we think it's pinned is a
> _heuristic_, and that's ok when you have the opposite problem, and ask
> "*might* this page be pinned". You want to never get a false negative,
> but it can get a false positive.
>
> Linus
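
Editor's sketch (not from the thread, and not kernel code): the
"exclusive" bit described above could be modeled in plain userspace C
roughly as follows. Every name here (toy_page, toy_cow_install,
toy_virtual_copy, toy_maybe_pinned, TOY_PIN_BIAS) is hypothetical and
only illustrates the idea: a COW fault sets the bit, any virtual copy
takes a reference and clears it, and a refcount threshold can only
answer "might this page be pinned".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for 'struct page'; not the kernel's layout. */
struct toy_page {
    int refcount;   /* number of references held on the page */
    bool exclusive; /* hypothetical "exclusive to this VM" bit */
};

/* Assumed threshold for the pin heuristic; the value is illustrative. */
#define TOY_PIN_BIAS 1024

/* A COW fault installs a fresh copy, so the page is ours outright. */
static void toy_cow_install(struct toy_page *new_page)
{
    new_page->refcount = 1;
    new_page->exclusive = true;
}

/*
 * A virtual copy (fork() and friends) already touches the refcount,
 * so clearing the exclusive bit rides along in the same update.
 */
static void toy_virtual_copy(struct toy_page *page)
{
    assert(page->refcount > 0);
    page->refcount++;
    page->exclusive = false;
}

/*
 * Heuristic only: an elevated count *might* mean the page is pinned.
 * Many ordinary references produce a false positive, which is why
 * this cannot stand in for an exact ownership bit.
 */
static bool toy_maybe_pinned(const struct toy_page *page)
{
    return page->refcount >= TOY_PIN_BIAS;
}
```

In this toy model the exclusive bit is exact by construction, while
toy_maybe_pinned() reports true for any page whose count reaches the
threshold, whether or not anything actually pinned it.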