Received: by 2002:a05:6359:c8b:b0:c7:702f:21d4 with SMTP id go11csp1846435rwb; Tue, 27 Sep 2022 20:28:09 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5cux02y4cDVKij6Y6X2GmALSTWT3ww1fiz5Dy0yca1SOEYbnXcDLhhPoER/QX0RNFfFknS X-Received: by 2002:a17:90b:4a43:b0:202:8d29:c172 with SMTP id lb3-20020a17090b4a4300b002028d29c172mr7930037pjb.177.1664335689284; Tue, 27 Sep 2022 20:28:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664335689; cv=none; d=google.com; s=arc-20160816; b=QGHXw8szlz9RUuDRZr2voMcqI09apIzkdgK06dTQC1wgS1LJsb6lvBMtkQqXEh5lvV dC0CNIXWzw8UVUfdwsYXngcM1WEWknUUNeTruLf5xRSPZrIO+2GAeLPAmoUsDnnY6SYP XN3C8iGKYSQbGLx7kvM+AXM8otz21VqAXjyn6jgJdZND0Iz2SaUsTCj6szOZj+1QzJui g4qXID+/CHSTNnUAU/M/3JQxxEJbLK24vx6SafB0p8uLpxZLiGuWCREh7K+xx3VlTsoY KUMUfX1Nz+6GZZi418+WobGt+oyJhA8zw9xyN47XRR0lyu3ZjdtwYYJd5hZXHM7WCWo/ aFCw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=zE7FRbXVPXMEH0b3SCAktRXR5ucy9Cwb3p8sNyPms6k=; b=N32VJeMnve7swwPJb+R95iBCWufetr0Fw6qjJL9w1XSTDlQ7g09qSN9MMbzUsMtvEI CRiGYbYQxHAwpIkwv3XkV40UADtiDmTxVGtKXe9n3PCQ1YaEoxuskTDn0SDc7oQhmZmo wWS3J5ij5iMHf7h5bzxP7Inyln9HawvEOe6v5Q2QtER7Zx5Yeo2dAZsBXUupwbJHWcrm pMo1bLyuXFQ5Oign8X5/0YtTGmiS2ci3f83jzo7/YXjgThwBHmNGvjI9r3iKUsTBtZNV zucIBMZdbV++ALALoTlWHrjh09wrzbr7W9XMsoazr4aoMAfk41APEcTWffjd2/HYxibZ /lAQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=7Gk9byhc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h27-20020a63531b000000b0041d67aea419si3899194pgb.149.2022.09.27.20.27.56; Tue, 27 Sep 2022 20:28:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=7Gk9byhc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231854AbiI1DKC (ORCPT + 99 others); Tue, 27 Sep 2022 23:10:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57626 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229838AbiI1DJ6 (ORCPT ); Tue, 27 Sep 2022 23:09:58 -0400 Received: from mail-pj1-x1033.google.com (mail-pj1-x1033.google.com [IPv6:2607:f8b0:4864:20::1033]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F2FF7B1E1 for ; Tue, 27 Sep 2022 20:09:55 -0700 (PDT) Received: by mail-pj1-x1033.google.com with SMTP id q15-20020a17090a304f00b002002ac83485so552151pjl.0 for ; Tue, 27 Sep 2022 20:09:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date; bh=zE7FRbXVPXMEH0b3SCAktRXR5ucy9Cwb3p8sNyPms6k=; b=7Gk9byhcd7iaiCUxW9fPwqLZ1MFtkv5Clj4uvCPSsGM1y7EO3tQ93QukOy/hGkhfdm AgOrIMM2SjaloUB63geELSqt32Fn6CqeXF8Y3LKmzq4dRYidmX+Q/v5DyJkCykX8Kepd XQWR8UO3Cb/E1TUXfavDlTFRpCjjMXXqpBBBPDLAKHbXPWmweIaot6zqT0zTfA4UCgbH 72/lXNOoaDhFmTLJqvmv4wL1nsD0rEW8cJCvRXTEgksSXk+S88KUM/AkCEVGYnRI8ss0 T1AFP5cJ5pPcs3PKOu4Rv6JtvSgvyIq2UIiha6M3NczrbtCi42FiC+0X7Yzf5B5sXkJx yTfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date; bh=zE7FRbXVPXMEH0b3SCAktRXR5ucy9Cwb3p8sNyPms6k=; b=4LLf1zX7qy4kDs7SA8yF+3g4SeM/nJJ8Vf1vtXIbmYoi+UgJNAsbsbg4vmpF80B8G1 F9nnp8dJeVbOWo2xflR9B9jaHkP2XHOyhbn3TGc/v5VM65FnGMxgW5i2aEAQin5lVF9O +A1O4D1HmcNBlOT9dnLKKXDouiYL+CT/Io4GrTfh0AUcPfzjnztAgPI0nIHJMEq+gzY1 bdGHY5F5VrTdsZbBSU8HUz3u1yFFJ469GKyHrSZAxtuGrbeqssokIevR5+3BUJWhSTqI OUihVHyfwN4DE/lpIAUuGALvKod/8tJ+r2bUm+our5gHi/aWIhs4bV4iKnLnStZXqjfE 1MTg== X-Gm-Message-State: ACrzQf3uQTP1b7N9vCuMfKhRTibP9vPToN7lQFsa8oiBrL4iyUOUM51U S4QR+2xxtWWZlqp57zqR4FtUDw== X-Received: by 2002:a17:902:d50f:b0:178:6505:fae3 with SMTP id b15-20020a170902d50f00b001786505fae3mr30442831plg.54.1664334592382; Tue, 27 Sep 2022 20:09:52 -0700 (PDT) Received: from [10.255.19.83] ([139.177.225.240]) by smtp.gmail.com with ESMTPSA id lt4-20020a17090b354400b00205e940614dsm274076pjb.34.2022.09.27.20.09.49 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 27 Sep 2022 20:09:51 -0700 (PDT) Message-ID: <4e2aa5c2-3d8c-2a2f-691b-218e23e7271f@bytedance.com> Date: Wed, 28 Sep 2022 11:09:47 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.3.0 Subject: Re: [RFC] proc: Add a new isolated /proc/pid/mempolicy type. Content-Language: en-US To: Michal Hocko Cc: Zhongkun He , corbet@lwn.net, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org References: <20220926091033.340-1-hezhongkun.hzk@bytedance.com> <24b20953-eca9-eef7-8e60-301080a17d2d@bytedance.com> <7ac9abce-4458-982b-6c04-f9569a78c0da@bytedance.com> <9a0130ce-6528-6652-5a8e-3612c5de2d96@bytedance.com> From: Abel Wu In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, URIBL_SBL_A autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 9/27/22 9:58 PM, Michal Hocko wrote: > On Tue 27-09-22 21:07:02, Abel Wu wrote: >> On 9/27/22 6:49 PM, Michal Hocko wrote: >>> On Tue 27-09-22 11:20:54, Abel Wu wrote: >>> [...] >>>>>> Btw.in order to add per-thread-group mempolicy, is it possible to add >>>>>> mempolicy in mm_struct? >>>>> >>>>> I dunno. This would make the mempolicy interface even more confusing. >>>>> Per mm behavior makes a lot of sense but we already do have per-thread >>>>> semantic so I would stick to it rather than introducing a new semantic. >>>>> >>>>> Why is this really important? >>>> >>>> We want soft control on memory footprint of background jobs by applying >>>> NUMA preferences when necessary, so the impact on different NUMA nodes >>>> can be managed to some extent. These NUMA preferences are given by the >>>> control panel, and it might not be suitable to overwrite the tasks with >>>> specific memory policies already (or vice versa). >>> >>> Maybe the answer is somehow implicit but I do not really see any >>> argument for the per thread-group semantic here. In other words why a >>> new interface has to cover more than the local [sg]et_mempolicy? >>> I can see convenience as one potential argument. Also if there is a >>> requirement to change the policy in atomic way then this would require a >>> single syscall. >> >> Convenience is not our major concern. A well-tuned workload can have >> specific memory policies for different tasks/vmas in one process, and >> this can be achieved by set_mempolicy()/mbind() respectively. While >> other workloads are not, they don't care where the memory residents, >> so the impact they brought on the co-located workloads might vary in >> different NUMA nodes. >> >> The control panel, which has a full knowledge of workload profiling, >> may want to interfere the behavior of the non-mempolicied processes >> by giving them NUMA preferences, to better serve the co-located jobs. >> >> So in this scenario, a process's memory policy can be assigned by two >> objects dynamically: >> >> a) the process itself, through set_mempolicy()/mbind() >> b) the control panel, but API is not available right now >> >> Considering the two policies should not fight each other, it sounds >> reasonable to introduce a new syscall to assign memory policy to a >> process through struct mm_struct. > > So you want to allow restoring the original local policy if the external > one is disabled? Pretty much, but the internal policies are expected to have precedence over the external ones, since they are set for some reason to meet their specific requirements. The external ones are used only when there is no internal policy active. > > Anyway, pidfd_$FOO behavior should be semantically very similar to the > original $FOO. Moving from per-task to per-mm is a major shift in the > semantic. I can imagine to have a dedicated flag for the syscall to > enforce the policy to the full thread group. But having a different > semantic is both tricky and also constrained because per-thread binding > is then impossible. Agreed. What about a syscall only apply to per-mm? There are precedents like process_madvice(2).