From: Vishal Annapurve
Date: Thu, 3 Nov 2022 21:57:11 +0530
Subject: Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd
To: "Kirill A . Shutemov"
Cc: Sean Christopherson, Chao Peng, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
    Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton,
    "J . Bruce Fields", Andrew Morton, Shuah Khan, Mike Rapoport,
    Steven Price, "Maciej S . Szmigiero", Vlastimil Babka, Yu Zhang,
    luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
    ak@linux.intel.com, david@redhat.com, aarcange@redhat.com,
    ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret, Michael Roth,
    mhocko@suse.com, Muchun Song, wei.w.wang@intel.com
References: <20220915142913.2213336-1-chao.p.peng@linux.intel.com>
    <20220915142913.2213336-2-chao.p.peng@linux.intel.com>
    <20221021134711.GA3607894@chaop.bj.intel.com>
    <20221024145928.66uehsokp7bpa2st@box.shutemov.name>
In-Reply-To: <20221024145928.66uehsokp7bpa2st@box.shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 24, 2022 at 8:30 PM Kirill A . Shutemov wrote:
>
> On Fri, Oct 21, 2022 at 04:18:14PM +0000, Sean Christopherson wrote:
> > On Fri, Oct 21, 2022, Chao Peng wrote:
> > > >
> > > > In the context of userspace inaccessible memfd, what would be a
> > > > suggested way to enforce NUMA memory policy for physical memory
> > > > allocation? mbind[1] won't work here in absence of virtual address
> > > > range.
> > >
> > > How about set_mempolicy():
> > > https://www.man7.org/linux/man-pages/man2/set_mempolicy.2.html
> >
> > Andy Lutomirski brought this up in an off-list discussion way back when the whole
> > private-fd thing was first being proposed.
> >
> > : The current Linux NUMA APIs (mbind, move_pages) work on virtual addresses. If
> > : we want to support them for TDX private memory, we either need TDX private
> > : memory to have an HVA or we need file-based equivalents. Arguably we should add
> > : fmove_pages and fbind syscalls anyway, since the current API is quite awkward
> > : even for tools like numactl.
>
> Yeah, we definitely have gaps in API wrt NUMA, but I don't think it needs to be
> addressed in the initial submission.
>
> BTW, it is not a regression compared to old KVM slots, if the memory is
> backed by memfd or other file:
>
> MBIND(2)
>        The specified policy will be ignored for any MAP_SHARED mappings in the
>        specified memory range. Rather the pages will be allocated according to
>        the memory policy of the thread that caused the page to be allocated.
>        Again, this may not be the thread that called mbind().
>
> It is not clear how to define fbind(2) semantics, considering that multiple
> processes may compete for the same region of page cache.
>
> Should it be per-inode or per-fd? Or maybe per-range in inode/fd?
>

David's analysis of mempolicy with shmem seems to be right: set_policy
on a virtual address range does seem to change the shared policy for the
inode, irrespective of the mapping type. Maybe a way to set NUMA policy
per-range in the inode would be on par with what we can do today via
mbind on virtual address ranges.

> fmove_pages(2) should be relatively straightforward, since it is
> best-effort and does not guarantee that the page will not be moved
> somewhere else just after return from the syscall.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
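
For reference, here is a rough userspace sketch of the set_mempolicy()
route Chao suggested, written against an ordinary memfd rather than the
proposed inaccessible one. It sets a preferred node for the calling
thread, populates the file with fallocate() so no virtual address range
is involved, and then restores the default policy. The helper names
(set_task_policy, memfd_alloc_on_node) and the "numa-demo" name are made
up for illustration. That shmem falls back to the allocating thread's
policy when no shared policy has been set on the inode is my reading of
the MBIND(2) text quoted above, and whether the new memfd type would
behave the same way is an assumption.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values mirror MPOL_DEFAULT and MPOL_PREFERRED from <linux/mempolicy.h>. */
enum { POLICY_DEFAULT = 0, POLICY_PREFERRED = 1 };

/* Raw syscall wrapper; glibc itself has no set_mempolicy() (libnuma does). */
static long set_task_policy(int mode, const unsigned long *mask,
                            unsigned long maxnode)
{
    return syscall(SYS_set_mempolicy, mode, mask, maxnode);
}

/*
 * Hypothetical helper: create a memfd of `len` bytes and allocate its pages
 * while the calling thread prefers NUMA node `node`.
 * Returns the fd, or -1 on failure. *policy_applied is set to 1 if
 * set_mempolicy() succeeded and 0 otherwise (no NUMA support, seccomp
 * filter, node not present, ...).
 */
static int memfd_alloc_on_node(size_t len, unsigned int node,
                               int *policy_applied)
{
    unsigned long mask;
    int fd, rc;

    *policy_applied = 0;

    /* This sketch only handles small node numbers in a one-word mask. */
    if (node >= 32)
        return -1;
    mask = 1UL << node;

    fd = memfd_create("numa-demo", MFD_CLOEXEC);
    if (fd < 0)
        return -1;

    /* Thread-wide policy: no address range needed, unlike mbind(). */
    if (set_task_policy(POLICY_PREFERRED, &mask, 8 * sizeof(mask)) == 0)
        *policy_applied = 1;

    /* Allocate the backing pages now, without ever calling mmap(). */
    rc = fallocate(fd, 0, 0, (off_t)len);

    /* Put the thread back on the default policy. */
    if (*policy_applied)
        set_task_policy(POLICY_DEFAULT, NULL, 0);

    if (rc != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would do something like memfd_alloc_on_node(64 * 1024, 0, &applied)
and check `applied` before relying on placement. The obvious limitation,
and the reason a per-range, per-inode interface still looks attractive, is
that the policy is thread-wide and only covers pages allocated while it is
in effect.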