Date: Wed, 12 Oct 2022 15:07:47 +0200
From: Michal Hocko
To: Vinicius Petrucci
Cc: Frank van der Linden, Zhongkun He, corbet@lwn.net,
    akpm@linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, wuyun.abel@bytedance.com
Subject: Re: [RFC] mm: add new syscall pidfd_set_mempolicy()

On Wed 12-10-22 07:34:06, Vinicius Petrucci wrote:
> > Well, per address range operation is a completely different beast, I
> > would say. An external tool would need to a) understand what that range
> > is used for (e.g. stack/heap ranges, mmapped shared files like libraries
> > or private mappings) and b) be in sync with memory layout modifications
> > done by applications (e.g. that an mmap has been issued to back a malloc
> > request). That takes quite a lot of understanding about the specific
> > process. I would say that with such intimate knowledge it is much better
> > to be part of the process and make those changes from within the process
> > itself.
>
> Sorry, this may be a digression, but I just wanted to mention a
> particular use case from a project I recently collaborated on (to
> appear next month at IISWC 2022:
> http://www.iiswc.org/iiswc2022/index.html).
>
> We carried out a performance analysis of the latest Linux AutoNUMA
> memory tiering on graph processing applications. We noticed that hot
> pages cannot be properly identified by the reactive approach used by
> AutoNUMA due to irregular/random memory access patterns.

Yes, I can see how a reactive approach might not be the best fit.
Automatic NUMA balancing can help quite a lot where memory regions are
accessed consistently. I can imagine situations where a user space agent
can tell much better which node is best for placing data when the access
pattern is not obvious or is hard to deduce from local metrics.

My main argument, though, is that those cases are rather specialized, and
it is much easier to implement the agent as part of the process, because
such agents are unlikely to be generic enough to serve many different
processes. I might be wrong about this, of course, and I am also not
saying that pidfd_mbind is a completely unreasonable idea. We just need
a strong use case before going that way.

> Thus, as a POC, we implemented and evaluated a simple idea of having an
> external user-level process/agent that, based on prior profiling results
> of memory regions, could more effectively make memory chunk/object-based
> mappings (instead of page-level allocation/migration) in advance on
> either DRAM or CXL/PMEM (via mbind calls). This kind of tiering solution
> could deliver up to 2x higher performance for graph analytics workloads.
> We plan to evaluate other workloads as well.
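For context, a minimal sketch of the kind of object-level placement described above, assuming it is done from inside the target process: bind an object's address range to one NUMA node with mbind(2) before its pages are first touched, so they are allocated on that node rather than migrated later. The helper names bind_range_to_node() and alloc_on_node(), the choice of node 0, and issuing mbind through syscall(2) instead of libnuma's <numaif.h> wrapper are all illustrative assumptions, not the POC's code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the kernel UAPI header <linux/mempolicy.h>, spelled out so
 * this sketch needs neither libnuma nor its <numaif.h> wrapper. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif
#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)
#endif

/* Hypothetical helper: bind the page-aligned range [addr, addr + len) to
 * one NUMA node. MPOL_MF_MOVE also asks the kernel to migrate pages that
 * are already populated; pages faulted in later come from `node`.
 * Returns 0 on success or -errno on failure. */
static long bind_range_to_node(void *addr, size_t len, int node)
{
    /* One-word mask. The kernel reads only maxnode - 1 bits, so the top
     * bit of the word is unusable. */
    assert(node >= 0 && node < (int)(8 * sizeof(unsigned long)) - 1);
    unsigned long nodemask = 1UL << node;
    long rc = syscall(SYS_mbind, addr, len, (unsigned long)MPOL_BIND,
                      &nodemask, (unsigned long)(8 * sizeof(nodemask)),
                      (unsigned int)MPOL_MF_MOVE);
    return rc == 0 ? 0 : -errno;
}

/* Hypothetical helper: reserve memory for an object and place it before
 * the first touch, i.e. decide placement in advance instead of relying on
 * reactive page migration. Returns NULL on failure. */
static void *alloc_on_node(size_t len, int node)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (bind_range_to_node(p, len, node) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

Note that mbind(2) only operates on the calling process's address space, so an external agent cannot issue this call on behalf of another process; that is the gap the LD_PRELOAD wrapper mentioned below works around.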
>
> Having a feature like "pidfd/process_mbind" would really simplify our
> user-level agent implementation moving forward, as right now we are
> adding an LD_PRELOAD wrapper (for a signal handler) to listen for and
> execute "mbind" requests from another process. If there's any other
> alternative solution to this already (via ptrace?), please let me
> know.

userfaultfd sounds like the closest match if page fault (#PF) handling
under the control of an external agent is viable.

-- 
Michal Hocko
SUSE Labs