From: Abel Wu
Date: Wed, 12 Oct 2022 11:14:37 +0800
Subject: Re: [RFC] mm: add new syscall pidfd_set_mempolicy()
To: Michal Hocko, Frank van der Linden
Cc: Zhongkun He, corbet@lwn.net, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org
Message-ID: <85145c75-f2f6-a393-daa2-967251cc3443@bytedance.com>
References: <20221010094842.4123037-1-hezhongkun.hzk@bytedance.com>

On 10/12/22 3:29 AM, Michal Hocko wrote:
> On Tue 11-10-22 10:22:23, Frank van der Linden wrote:
>> On Tue, Oct 11, 2022 at 8:00 AM Michal Hocko wrote:
>>>
>>> On Mon 10-10-22 09:22:13, Frank van der Linden wrote:
>>>> For consistency with process_madvise(), I would suggest calling it
>>>> process_set_mempolicy.
>>>
>>> This operation has per-thread rather than per-process semantic so I do
>>> not think your proposed naming is better.
>>
>> True. I suppose you could argue that it should have been
>> pidfd_madvise() then for consistency, but that ship has sailed.
>
> madvise commands have per mm semantic. It is set_mempolicy which is
> kinda special and it allows to have per task_struct semantic even when
> the actual allocation is in the same address space. To be honest I am
> not really sure why that is this way because threads aim to share memory
> so why should they have different memory policies?
>
> I suspect that the original thinking was that some portions that are
> private to the process want to have different affinities (e.g. stacks
> and dedicated per cpu heap arenas). E.g. worker pools which want to be
> per-cpu local with their own allocations but operate on shared data that
> requires different policies.
>
>>>> Other than that, this makes sense. To complete
>>>> the set, perhaps a process_mbind() should be added as well. What do
>>>> you think?
>>>
>>> Is there any real usecase for this interface? How is the caller supposed
>>> to make per-range decisions without a very involved coordination with
>>> the target process?
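To make the per-thread semantic described above concrete, here is a minimal sketch (not part of the RFC) using the raw set_mempolicy(2) and get_mempolicy(2) syscalls: a worker thread sets MPOL_PREFERRED on node 0 for itself, while the main thread, which shares the same mm, keeps whatever policy it had. It assumes a kernel built with CONFIG_NUMA and that node 0 has memory; the helper names (worker, per_thread_policy_demo) are illustrative only, and the function returns an errno value rather than aborting when the syscalls are unavailable (for example ENOSYS without NUMA support, or EPERM under a container seccomp profile).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Mode value from include/uapi/linux/mempolicy.h. */
#ifndef MPOL_PREFERRED
#define MPOL_PREFERRED 1
#endif

/* Raw syscall wrappers so the sketch does not depend on libnuma. */
static long sys_set_mempolicy(int mode, const unsigned long *nodemask,
                              unsigned long maxnode)
{
    return syscall(SYS_set_mempolicy, mode, nodemask, maxnode);
}

/* addr == NULL and flags == 0: query the calling thread's own policy. */
static long sys_get_mempolicy(int *mode)
{
    return syscall(SYS_get_mempolicy, mode, NULL, 0UL, NULL, 0UL);
}

struct worker_result {
    int rc;   /* 0 on success, otherwise an errno value */
    int mode; /* policy mode the worker observes after setting it */
};

static void *worker(void *arg)
{
    struct worker_result *res = arg;
    unsigned long node0 = 1UL; /* nodemask containing only node 0 */

    /* Changes only this thread's policy, not its siblings' in the same mm. */
    if (sys_set_mempolicy(MPOL_PREFERRED, &node0, sizeof(node0) * 8) != 0 ||
        sys_get_mempolicy(&res->mode) != 0) {
        res->rc = errno;
        return NULL;
    }
    res->rc = 0;
    return NULL;
}

/* Returns 0 on success or an errno value (e.g. ENOSYS, EPERM, EINVAL). */
int per_thread_policy_demo(int *main_before, int *main_after, int *worker_mode)
{
    struct worker_result res = { .rc = 0, .mode = -1 };
    pthread_t tid;
    int err;

    assert(main_before && main_after && worker_mode);
    if (sys_get_mempolicy(main_before) != 0)
        return errno;
    err = pthread_create(&tid, NULL, worker, &res);
    if (err != 0)
        return err;
    pthread_join(tid, NULL);
    if (res.rc != 0)
        return res.rc;
    if (sys_get_mempolicy(main_after) != 0)
        return errno;
    *worker_mode = res.mode;
    return 0;
}
```

On a NUMA-capable kernel one would expect worker_mode to be MPOL_PREFERRED while main_before and main_after stay equal, since set_mempolicy() only updates the calling thread's task_struct policy. Older glibc versions may need linking with -pthread.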
>>
>> The use case for a potential pidfd_mbind() is basically a combination
>> of what is described for in the process_madvise proposal (
>> https://lore.kernel.org/lkml/20200901000633.1920247-1-minchan@kernel.org/
>> ), and what this proposal describes: system management software acting
>> as an orchestrator that has a better overview of the system as a whole
>> (NUMA nodes, memory tiering), and has knowledge of the layout of the
>> processes involved.

This is exactly why we are proposing pidfd/process_set_mempolicy().

>
> Well, per address range operation is a completely different beast I
> would say. External tool would need to a) understand what that range is
> used for (e.g. stack/heap ranges, mmaped shared files like libraries or
> private mappings) and b) by in sync with memory layout modifications
> done by applications (e.g. that an mmap has been issued to back malloc
> request). Quite a lot of understanding about the specific process. I
> would say that with that intimate knowledge it is quite better to be
> part of the process and do those changes from within of the process
> itself.

Agreed. An orchestrator such as system management software may not have
enough knowledge about individual address ranges. I also don't think it
is appropriate for orchestrators to overwrite the tasks' mempolicies,
since those are set for specific purposes by the applications themselves.
That is why I suggested a per-mm policy with a lower priority than the
per-task policies.

Thanks & BR,
Abel
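The priority ordering suggested above can be illustrated with a small, purely hypothetical model of policy lookup. Mainline already prefers a VMA policy installed by mbind() over the calling task's set_mempolicy() policy and falls back to a system default when neither is set; this sketch assumes the proposed per-mm policy would slot in between the task policy and that default. All types and names here (struct mm_model, struct task_model, resolve_policy, POL_*) are invented for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified policy modes; not the kernel's MPOL_* values. */
enum policy_mode { POL_DEFAULT, POL_PREFERRED, POL_BIND, POL_INTERLEAVE };

struct policy {
    enum policy_mode mode;
    unsigned long nodemask; /* bit n set => node n allowed */
};

/* Shared by all threads of a process; per_mm is the proposed new slot. */
struct mm_model {
    const struct policy *per_mm;
};

struct task_model {
    const struct policy *task_policy; /* set by the thread via set_mempolicy() */
    const struct mm_model *mm;
};

struct vma_model {
    const struct policy *vma_policy; /* set on a range via mbind() */
};

static const struct policy system_default = { POL_DEFAULT, 0UL };

/*
 * Most specific policy wins: VMA, then task, then the hypothetical
 * per-mm policy, then the system default.
 */
const struct policy *resolve_policy(const struct vma_model *vma,
                                    const struct task_model *task)
{
    assert(task != NULL && task->mm != NULL);
    if (vma != NULL && vma->vma_policy != NULL)
        return vma->vma_policy;
    if (task->task_policy != NULL)
        return task->task_policy;
    if (task->mm->per_mm != NULL)
        return task->mm->per_mm;
    return &system_default;
}
```

In this model an orchestrator would only ever write mm->per_mm, so a thread that called set_mempolicy() for itself, or a range the application bound with mbind(), keeps the application's own choice; threads without a policy of their own pick up the orchestrator's default.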