From: David Hildenbrand
Organization: Red Hat
Date: Sun, 19 Dec 2021 18:59:51 +0100
To: Linus Torvalds, Nadav Amit
Cc: Jason Gunthorpe, Linux Kernel Mailing List, Andrew Morton, Hugh Dickins, David Rientjes, Shakeel Butt, John Hubbard, Mike Kravetz, Mike Rapoport, Yang Shi, "Kirill A. Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko, Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara, Linux-MM, "open list:KERNEL SELFTEST FRAMEWORK", "open list:DOCUMENTATION"
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
X-Mailing-List: linux-kernel@vger.kernel.org

On 19.12.21 18:44, Linus Torvalds wrote:
> David, you said that you were working on some alternative model. Is it
> perhaps along these same lines below?
>
> I was thinking that a bit in the page tables to say "this page is
> exclusive to this VM" would be a really simple thing to deal with for
> fork() and swapout and friends.
>
> But we don't have such a bit in general, since many architectures have
> very limited sets of SW bits, and even when they exist we've spent
> them on things like UFFD_WP.
>
> But the more I think about the "bit doesn't even have to be in the
> page tables", the more I think maybe that's the solution.
>
> A bit in the 'struct page' itself.
>

Exactly what I am prototyping right now.

> For hugepages, you'd have to distribute said bit when you split the hugepage.

Yes, that's one tricky part ...

>
> But other than that it looks quite simple: anybody who does a virtual
> copy will inevitably be messing with the page refcount, so clearing
> the "exclusive ownership" bit wouldn't be costly: the 'struct page'
> cacheline is already getting dirtied.
>
> Or what was your model you were implying you were thinking about in
> your other email? You said

I'm playing with the idea of not always setting the bit during COW, but
only when GUP requests setting it (either manually if possible or via
FOLL_UNSHARE). That's a bit more tricky but allows for decoupling that
approach completely from the page_pin() counter.

fork() is allowed to clear the bit if page_count() == 1 and share the
page. So no GUP -> no fork() performance changes (!). Otherwise the bit
can only vanish if we swapout/migrate the page: in which case there are
no additional GUP/references on the page that rely on it!

The bit can be set directly if we have to copy the page in the fault
handler (COW or unshare). Outside of COW/unshare code, the bit can only
be set if page_count() == 1 and we sync against fork(). (And that's the
problem for gup-fast-only that I'm investigating right now, because it
would then always have to fall back to the slow variant if the bit isn't
already set.)

So the bit can "vanish" whenever there is no additional reference on the
page. GUP syncs against fork() and can thereby set the bit/request to
set the bit.

I'm trying to decouple it completely from the page_pin() counter to also
be able to handle FOLL_GET (O_DIRECT reproducers, unfortunately)
correctly.

Not set in stone, just an idea I'm playing with right now ... and I have
to triple-check if

* page is PTE mapped in the page table I'm walking
* page_count() == 1

really means that "this is the only reference". I do strongly believe
so .. :)

-- 
Thanks,

David / dhildenb
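
To make the rules above easier to poke at, here is a rough toy model of the bit's life cycle. It is only a sketch, not kernel code: ModelPage, fork_page(), gup() and unshare_copy() are hypothetical names, refcount stands in for page_count(), and the "copy" outcome in fork_page() is my reading of "fork() may only clear the bit if page_count() == 1", not something spelled out above.

```python
from dataclasses import dataclass


@dataclass
class ModelPage:
    """Hypothetical stand-in for the parts of 'struct page' discussed above."""
    refcount: int = 1        # models page_count()
    exclusive: bool = False  # models the proposed "exclusive" bit


def fork_page(page: ModelPage) -> str:
    """Model fork() handling one anonymous page; returns 'share' or 'copy'."""
    if page.exclusive and page.refcount != 1:
        # Assumption: an extra reference (e.g. taken by GUP) relies on the
        # bit, so fork() may not clear it and the child gets its own copy.
        return "copy"
    page.exclusive = False   # no additional reference: the bit may vanish
    page.refcount += 1       # the child's mapping now references the page
    return "share"


def gup(page: ModelPage) -> bool:
    """Model GUP outside COW/unshare; False means 'fall back to unsharing'."""
    if not page.exclusive:
        if page.refcount != 1:
            return False     # cannot prove exclusivity without a copy
        page.exclusive = True  # only reference; assumed synced against fork()
    page.refcount += 1       # GUP takes its own reference
    return True


def unshare_copy() -> ModelPage:
    """Model the fault handler copying a page (COW or unshare)."""
    return ModelPage(refcount=1, exclusive=True)
```

In this model a page that was never touched by GUP is shared by fork_page() exactly as today, which is the "no GUP -> no fork() performance changes" property; only a page with the bit set and an extra reference forces a copy.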