References:
 <20210110004435.26382-1-aarcange@redhat.com> <45806a5a-65c2-67ce-fc92-dc8c2144d766@nvidia.com>
From: Andy Lutomirski
Date: Tue, 12 Jan 2021 09:07:31 -0800
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse
To: Linus Torvalds
Cc: John Hubbard, Andrea Arcangeli, Andrew Morton, Linux-MM,
 Linux Kernel Mailing List, Yu Zhao, Andy Lutomirski, Peter Xu,
 Pavel Emelyanov, Mike Kravetz, Mike Rapoport, Minchan Kim, Will Deacon,
 Peter Zijlstra, Hugh Dickins, "Kirill A. Shutemov", Matthew Wilcox,
 Oleg Nesterov, Jann Horn, Kees Cook, Leon Romanovsky, Jason Gunthorpe,
 Jan Kara, Kirill Tkhai, Nadav Amit, Jens Axboe
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 11, 2021 at 2:18 PM Linus Torvalds wrote:
>
> On Mon, Jan 11, 2021 at 11:19 AM Linus Torvalds wrote:
> Actually, what I think might be a better model is to actually
> strengthen the rules even more, and get rid of GUP_PIN_COUNTING_BIAS
> entirely.
>
> What we could do is just make a few clear rules explicit (most of
> which we already basically hold to). Starting from that basic
>
>  (a) Anonymous pages are made writable (ie COW) only when they have a
> page_count() of 1

Seems reasonable to me.

>
> That very simple rule then automatically results in the corollary
>
>  (b) a writable page in a COW mapping always starts out reachable
> _only_ from the page tables

Seems reasonable.  I guess that if the COW is triggered by GUP, then
it starts out reachable only from the page tables but then becomes
reachable through GUP very soon thereafter.

>
> and now we could have a couple of really simple new rules:
>
>  (c) we never ever make a writable page in a COW mapping read-only
> _unless_ it has a page_count() of 1

I don't love this.
Having mprotect() fail in a multithreaded process because
another thread happens to be doing a short-lived IO seems like it may
result in annoying intermittent bugs.  As I understand it, the issue
is that the way we determine that we need to COW a COWable page is
that we see that it's read-only.  It would be nice if we could
separately track "the VMA allows writes" and "this PTE points to a
page that is private to the owning VMA", but maybe there's no bit
available for the latter other than looking at RO vs RW directly.

>
>  (d) we never create a swap cache page out of a writable COW mapping page
>
> Now, if you combine these rules, the whole need for the
> GUP_PIN_COUNTING_BIAS basically goes away.
>
> Why? Because we know that the _only_ thing that can elevate the
> refcount of a writable COW page is GUP - we'll just make sure nothing
> else touches it.

How common is !FOLL_WRITE GUP?  We could potentially say that a
short-term !FOLL_WRITE GUP is permitted on an RO COW page and that a
subsequent COW on the page will wait for the GUP to go away.  This
might be too big a can of worms for the benefit it would provide,
though.

--Andy
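
For illustration only, here is a userspace toy model of rules (a) and
(c) as quoted above, showing why a short-lived GUP reference would make
a write-protect attempt fail.  This is a sketch, not kernel code: every
name in it (toy_page, toy_pte, toy_gup and so on) is hypothetical, the
refcount field merely stands in for page_count(), and having rule (c)
refuse the operation is an assumption about how it would be enforced.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an anonymous page; NOT the kernel's struct page. */
struct toy_page {
	int refcount;		/* plays the role of page_count() */
};

/* Toy stand-in for one PTE in a COW mapping (hypothetical layout). */
struct toy_pte {
	struct toy_page *page;
	bool writable;
};

/*
 * Rule (a): a write fault may make the page writable in place only
 * when the mapping holds the sole reference; otherwise it must copy.
 */
static bool toy_can_reuse_on_write_fault(const struct toy_pte *pte)
{
	assert(pte->page->refcount >= 1);
	return pte->page->refcount == 1;
}

/*
 * Rule (c): a writable PTE may be made read-only only when the page
 * has a refcount of 1.  Returning false models the refusal that the
 * reply worries about for mprotect() (assumed behaviour).
 */
static bool toy_try_write_protect(struct toy_pte *pte)
{
	if (pte->writable && pte->page->refcount != 1)
		return false;
	pte->writable = false;
	return true;
}

/* Toy GUP: another user takes a temporary extra reference. */
static void toy_gup(struct toy_page *page)
{
	page->refcount++;
}

/* Toy GUP release: drop the temporary reference again. */
static void toy_gup_release(struct toy_page *page)
{
	assert(page->refcount > 1);
	page->refcount--;
}
```

In this model, a toy_try_write_protect() call that lands between
toy_gup() and toy_gup_release() returns false, and the same call
succeeds once the extra reference is dropped.  That timing dependence
is the intermittent behaviour described above.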