Date: Mon, 11 Apr 2022 18:37:02 +0100
From: Matthew Wilcox
To: Khalid Aziz
Cc: akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com, arnd@arndb.de,
	21cnbao@gmail.com, corbet@lwn.net, dave.hansen@linux.intel.com,
	david@redhat.com, ebiederm@xmission.com, hagen@jauu.net, jack@suse.cz,
	keescook@chromium.org, kirill@shutemov.name, kucharsk@gmail.com,
	linkinjeon@kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, longpeng2@huawei.com,
	luto@kernel.org, markhemm@googlemail.com, pcc@google.com,
	rppt@kernel.org, sieberf@amazon.com, sjpark@amazon.de,
	surenb@google.com, tst@schoebel-theuer.de, yzaikin@google.com
Subject: Re: [PATCH v1 00/14] Add support for shared PTEs across processes

On Mon, Apr 11, 2022 at 10:05:44AM -0600, Khalid Aziz wrote:
> Page tables in kernel consume some of the memory and as long as number
> of mappings being maintained is small enough, this space consumed by
> page tables is not objectionable. When very few memory pages are
> shared between processes, the number of page table entries (PTEs) to
> maintain is mostly constrained by the number of pages of memory on the
> system.
> As the number of shared pages and the number of times pages
> are shared goes up, amount of memory consumed by page tables starts to
> become significant.

All of this is true.  However, I've found a lot of people don't see this
as compelling.  I've had more success explaining this from a different
direction:

--- 8< ---

Linux supports processes which share all of their address space (threads)
and processes that share none of their address space (tasks).  We propose
a useful intermediate model where two or more cooperating processes
can choose to share portions of their address space with each other.
The shared portion is referred to by a file descriptor which processes
can choose to attach to their own address space.

Modifications to the shared region affect all processes sharing
that region, just as changes by one thread affect all threads in a
multithreaded program.  This implies a certain level of trust between the
different processes (ie malicious processes should not be allowed access
to the mshared region).

--- 8< ---

Another argument that MM developers find compelling is that we can reduce
some of the complexity in hugetlbfs where it has the ability to share
page tables between processes.

One objection that was raised is that the mechanism for starting the
shared region is a bit clunky.  Did you investigate the proposed approach
of creating an empty address space, attaching to it and using an fd-based
mmap to modify its contents?

> int mshare_unlink(char *name)
>
> A shared address range created by mshare() can be destroyed using
> mshare_unlink() which removes the shared named object. Once all
> processes have unmapped the shared object, the shared address range
> references are de-allocated and destroyed.
>
> mshare_unlink() returns 0 on success or -1 on error.
Can you explain why this is a syscall instead of being a library function
which does

	int dirfd = open("/sys/fs/mshare", O_RDONLY | O_DIRECTORY);
	err = unlinkat(dirfd, name, 0);
	close(dirfd);
	return err;

Does msharefs support creating directories, so that we can use file
permissions to limit who can see the sharable files?  Or is it strictly
a single-level-deep hierarchy?