Date: Tue, 27 Sep 2022 15:58:52 +0200
From: Michal Hocko
To: Abel Wu
Cc: Zhongkun He, corbet@lwn.net, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [External] Re: [RFC] proc: Add a new isolated /proc/pid/mempolicy type.
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 27-09-22 21:07:02, Abel Wu wrote:
> On 9/27/22 6:49 PM, Michal Hocko wrote:
> > On Tue 27-09-22 11:20:54, Abel Wu wrote:
> > [...]
> > > > > Btw.in order to add per-thread-group mempolicy, is it possible to add
> > > > > mempolicy in mm_struct?
> > > >
> > > > I dunno.
> > > > This would make the mempolicy interface even more confusing.
> > > > Per mm behavior makes a lot of sense but we already do have per-thread
> > > > semantic so I would stick to it rather than introducing a new semantic.
> > > >
> > > > Why is this really important?
> > >
> > > We want soft control on memory footprint of background jobs by applying
> > > NUMA preferences when necessary, so the impact on different NUMA nodes
> > > can be managed to some extent. These NUMA preferences are given by the
> > > control panel, and it might not be suitable to overwrite the tasks with
> > > specific memory policies already (or vice versa).
> >
> > Maybe the answer is somehow implicit but I do not really see any
> > argument for the per thread-group semantic here. In other words why a
> > new interface has to cover more than the local [sg]et_mempolicy?
> > I can see convenience as one potential argument. Also if there is a
> > requirement to change the policy in atomic way then this would require a
> > single syscall.
>
> Convenience is not our major concern. A well-tuned workload can have
> specific memory policies for different tasks/vmas in one process, and
> this can be achieved by set_mempolicy()/mbind() respectively. While
> other workloads are not, they don't care where the memory residents,
> so the impact they brought on the co-located workloads might vary in
> different NUMA nodes.
>
> The control panel, which has a full knowledge of workload profiling,
> may want to interfere the behavior of the non-mempolicied processes
> by giving them NUMA preferences, to better serve the co-located jobs.
>
> So in this scenario, a process's memory policy can be assigned by two
> objects dynamically:
>
>   a) the process itself, through set_mempolicy()/mbind()
>   b) the control panel, but API is not available right now
>
> Considering the two policies should not fight each other, it sounds
> reasonable to introduce a new syscall to assign memory policy to a
> process through struct mm_struct.

So you want to allow restoring the original local policy if the external
one is disabled?

Anyway, pidfd_$FOO behavior should be semantically very similar to the
original $FOO. Moving from per-task to per-mm is a major shift in the
semantic. I can imagine having a dedicated flag for the syscall to
enforce the policy on the full thread group. But having a different
semantic is both tricky and also constrained because per-thread binding
is then impossible.
-- 
Michal Hocko
SUSE Labs