Date: Wed, 26 Jan 2022 11:16:42 +0100
From: David Hildenbrand
Organization: Red Hat
To: Matthew Wilcox, "Kirill A. Shutemov"
Cc: Khalid Aziz, akpm@linux-foundation.org, longpeng2@huawei.com, arnd@arndb.de, dave.hansen@linux.intel.com, rppt@kernel.org, surenb@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Peter Xu
Subject: Re: [RFC PATCH 0/6] Add support for shared PTEs across processes
References: <20220125114212.ks2qtncaahi6foan@box.shutemov.name> <20220125135917.ezi6itozrchsdcxg@box.shutemov.name> <20220125185705.wf7p2l77vggipfry@box.shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On 26.01.22 05:04, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:59:50PM +0000, Matthew Wilcox wrote:
>> On Tue, Jan 25, 2022 at 09:57:05PM +0300, Kirill A. Shutemov wrote:
>>> On Tue, Jan 25, 2022 at 02:09:47PM +0000, Matthew Wilcox wrote:
>>>>> I think zero-API approach (plus madvise() hints to tweak it) is worth
>>>>> considering.
>>>>
>>>> I think the zero-API approach actually misses out on a lot of
>>>> possibilities that the mshare() approach offers. For example, mshare()
>>>> allows you to mmap() many small files in the shared region -- you
>>>> can't do that with zeroAPI.
>>>
>>> Do you consider a use-case for many small files to be common? I would
>>> think that the main consumer of the feature to be mmap of huge files.
>>> And in this case zero enabling burden on userspace side sounds like a
>>> sweet deal.
>>
>> mmap() of huge files is certainly the Oracle use-case. With occasional
>> funny business like mprotect() of a single page in the middle of a 1GB
>> hugepage.
>
> Bill and I were talking about this earlier and realised that this is
> the key point. There's a requirement that when one process mprotects
> a page that it gets protected in all processes. You can't do that
> without *some* API because that's different behaviour than any existing
> API would produce.
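
For reference, the existing behaviour the quoted paragraph contrasts against can be checked from user space. The following is only a minimal sketch, assuming Linux and a shared anonymous mapping inherited across fork(); the function name is made up for illustration. The child makes its mapping of the page read-only and stays alive, and the parent can still write to the same page because mprotect() only changes the caller's page tables.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Illustrative helper (name is hypothetical). Returns 1 if the parent could
 * write to a shared page while the child had mprotect()ed its own mapping
 * of that page read-only, and -1 on a setup error. If the protection had
 * propagated to the parent, the write below would raise SIGSEGV instead.
 */
int parent_can_write_after_child_mprotect(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int ready[2], done[2], status = 0;
    char token = 0;

    assert(page > 0);
    char *shared = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED || pipe(ready) != 0 || pipe(done) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        close(ready[0]);
        close(done[1]);
        /* Child: write-protect the page in its own address space only. */
        if (mprotect(shared, (size_t)page, PROT_READ) != 0)
            _exit(2);
        if (write(ready[1], "r", 1) != 1)
            _exit(3);
        /* Block until the parent closes its end (read returns 0 on EOF). */
        if (read(done[0], &token, 1) < 0)
            _exit(3);
        /* The data is shared: the parent's write is visible read-only. */
        _exit(shared[0] == 'x' ? 0 : 4);
    }

    close(ready[1]);
    close(done[0]);
    /* Parent: wait until the child has called mprotect(), then write. */
    int child_ready = (read(ready[0], &token, 1) == 1);
    if (child_ready)
        shared[0] = 'x';
    close(done[1]);
    close(ready[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    munmap(shared, (size_t)page);
    if (!child_ready)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling parent_can_write_after_child_mprotect() from a small main() and checking for a return value of 1 is enough to see that the page contents are shared while the protection is not.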
A while ago I talked with Peter about an extended uffd (here: WP)
mechanism that would work on fds instead of the process address space.

The rough idea would be to register the uffd (or however that would be
called) handler on an fd instead of a virtual address space of a single
process and write-protect pages in that fd. Once anybody would try
writing to such a protected range (write, mmap, ...), the uffd handler
would fire and user space could handle the event (-> unprotect). The
page cache would have to remember the uffd information ("wp using
uffd").

When (un)protecting pages using this mechanism, all page tables mapping
the page would have to be updated accordingly using the rmap. At that
point, we wouldn't care if it's a single page table (e.g., shared
similar to hugetlb) or simply multiple page tables.

It's a completely rough idea, I just wanted to mention it.

--
Thanks,

David / dhildenb
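
To make the fd-based write-protect idea a bit more tangible, here is a toy user-space model of the bookkeeping. It is purely illustrative: no such kernel interface exists, and every name in it (toy_file, toy_mapping, toy_writeprotect, and so on) is hypothetical. It models only the mmap path, not write(2). The per-page wp flag stands in for the "wp using uffd" state in the page cache, and the rmap array stands in for the reverse mapping used to reach every page table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES   4  /* pages in the toy file */
#define MAX_MAPS 4  /* page tables that may map the file */

/* Hypothetical stand-in for one page table covering the file. */
struct toy_mapping {
    bool pte_writable[NPAGES];
};

/* Hypothetical stand-in for the file: page cache state plus rmap. */
struct toy_file {
    bool wp[NPAGES];                     /* "wp using uffd", per page */
    struct toy_mapping *rmap[MAX_MAPS];  /* every mapping of the file */
    size_t nmaps;
    void (*handler)(struct toy_file *f, size_t page); /* fd-level handler */
    unsigned faults;                     /* write faults delivered so far */
};

/* Map the file: a PTE starts writable only if the page is not protected. */
bool toy_map(struct toy_file *f, struct toy_mapping *m)
{
    if (f->nmaps == MAX_MAPS)
        return false;
    for (size_t i = 0; i < NPAGES; i++)
        m->pte_writable[i] = !f->wp[i];
    f->rmap[f->nmaps++] = m;
    return true;
}

/* (Un)protect a page on the file and update all page tables via the rmap. */
void toy_writeprotect(struct toy_file *f, size_t page, bool protect)
{
    assert(page < NPAGES);
    f->wp[page] = protect;
    for (size_t i = 0; i < f->nmaps; i++)
        f->rmap[i]->pte_writable[page] = !protect;
}

/* Write through one mapping; a protected page fires the fd-level handler. */
bool toy_write(struct toy_file *f, struct toy_mapping *m, size_t page)
{
    assert(page < NPAGES);
    if (!m->pte_writable[page]) {
        f->faults++;
        if (f->handler)
            f->handler(f, page);
    }
    return m->pte_writable[page]; /* true once the handler unprotected it */
}

/* Example handler: resolve the event by unprotecting the page. */
void toy_unprotect_handler(struct toy_file *f, size_t page)
{
    toy_writeprotect(f, page, false);
}

/* Two mappings, one protected page; returns the fault count or < 0. */
int toy_demo(void)
{
    struct toy_file f = { .handler = toy_unprotect_handler };
    struct toy_mapping a, b;

    if (!toy_map(&f, &a) || !toy_map(&f, &b))
        return -1;
    toy_writeprotect(&f, 1, true);
    if (a.pte_writable[1] || b.pte_writable[1])
        return -2;  /* protection must reach both page tables */
    if (!toy_write(&f, &a, 1))
        return -3;  /* handler should have unprotected the page */
    if (!b.pte_writable[1])
        return -4;  /* unprotect must reach the other page table too */
    return (int)f.faults;
}
```

In this model it makes no difference whether the rmap array holds one shared page table or several private ones, which is the property the idea relies on.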