Message-ID: <9f6e60cc8be3cbde4871458c612c5c31d2a9e056.camel@intel.com>
Subject: Re: [RFC PATCH v4 7/7] mm/demotion: Demote pages according to allocation fallback order
From: Ying Huang
To: Aneesh Kumar K V, linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Greg Thelen, Yang Shi, Davidlohr Bueso, Tim C Chen, Brice Goglin, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang, David Rientjes
Date: Mon, 06 Jun 2022 08:43:44 +0800
In-Reply-To: <046c373a-f30b-091d-47a1-e28bfb7e9394@linux.ibm.com>
References: <20220527122528.129445-1-aneesh.kumar@linux.ibm.com> <20220527122528.129445-8-aneesh.kumar@linux.ibm.com> <046c373a-f30b-091d-47a1-e28bfb7e9394@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2022-06-03 at 20:39 +0530, Aneesh Kumar K V wrote:
> On 6/2/22 1:05 PM, Ying Huang wrote:
> > On Fri, 2022-05-27 at 17:55 +0530, Aneesh Kumar K.V wrote:
> > > From: Jagdish Gediya
> > >
> > > currently, a higher tier node can only be demoted to selected
> > > nodes on the next lower tier as defined by the demotion path,
> > > not any other node from any lower tier. This strict, hard-coded
> > > demotion order does not work in all use cases (e.g. some use cases
> > > may want to allow cross-socket demotion to another node in the same
> > > demotion tier as a fallback when the preferred demotion node is out
> > > of space). This demotion order is also inconsistent with the page
> > > allocation fallback order when all the nodes in a higher tier are
> > > out of space: The page allocation can fall back to any node from any
> > > lower tier, whereas the demotion order doesn't allow that currently.
> > >
> > > This patch adds support to get all the allowed demotion targets mask
> > > for node, also demote_page_list() function is modified to utilize this
> > > allowed node mask by filling it in migration_target_control structure
> > > before passing it to migrate_pages().
> >
> > ...
> >
> > >  	 * Take pages on @demote_list and attempt to demote them to
> > >  	 * another node. Pages which are not demoted are left on
> > > @@ -1481,6 +1464,19 @@ static unsigned int demote_page_list(struct list_head *demote_pages,
> > >  {
> > >  	int target_nid = next_demotion_node(pgdat->node_id);
> > >  	unsigned int nr_succeeded;
> > > +	nodemask_t allowed_mask;
> > > +
> > > +	struct migration_target_control mtc = {
> > > +		/*
> > > +		 * Allocate from 'node', or fail quickly and quietly.
> > > +		 * When this happens, 'page' will likely just be discarded
> > > +		 * instead of migrated.
> > > +		 */
> > > +		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> > > +			__GFP_NOMEMALLOC | GFP_NOWAIT,
> > > +		.nid = target_nid,
> > > +		.nmask = &allowed_mask
> > > +	};
> >
> > IMHO, we should try to allocate from preferred node firstly (which will
> > kick kswapd of the preferred node if necessary). If failed, we will
> > fallback to all allowed node.
> >
> > As we discussed as follows,
> >
> > https://lore.kernel.org/lkml/69f2d063a15f8c4afb4688af7b7890f32af55391.camel@intel.com/
> >
> > That is, something like below,
> >
> > static struct page *alloc_demote_page(struct page *page, unsigned long node)
> > {
> > 	struct page *page;
> > 	nodemask_t allowed_mask;
> > 	struct migration_target_control mtc = {
> > 		/*
> > 		 * Allocate from 'node', or fail quickly and quietly.
> > 		 * When this happens, 'page' will likely just be discarded
> > 		 * instead of migrated.
> > 		 */
> > 		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
> > 			    __GFP_THISNODE | __GFP_NOWARN |
> > 			    __GFP_NOMEMALLOC | GFP_NOWAIT,
> > 		.nid = node
> > 	};
> >
> > 	page = alloc_migration_target(page, (unsigned long)&mtc);
> > 	if (page)
> > 		return page;
> >
> > 	mtc.gfp_mask &= ~__GFP_THISNODE;
> > 	mtc.nmask = &allowed_mask;
> >
> > 	return alloc_migration_target(page, (unsigned long)&mtc);
> > }
>
> I skipped doing this in v5 because I was not sure this is really what we
> want.

I think so. This is also the original behavior. We should keep the
original behavior as much as possible, and change it only where necessary.

> I guess we can do this as part of the change that is going to
> introduce the usage of memory policy for the allocation?

As with the memory allocation policy, the default here should be local
preferred. We shouldn't force users to set an explicit memory policy to
get that behavior. And the added code isn't complex.

Best Regards,
Huang, Ying
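The allocation order argued for above (try the preferred demotion node first, then fall back to any allowed lower-tier node) can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code. `struct node_model`, `model_nodemask`, `allowed_demotion_targets()` and `model_alloc_demote_page()` are invented names. A node mask is a plain bitmask. The fallback pass scans nodes in ascending ID order rather than the kernel's distance-ordered zonelist. Waking kswapd on the preferred node is not modeled. Separately, the quoted alloc_demote_page() sketch declares a local `page` that collides with the parameter of the same name, so a real version would need a distinct name for the result.

```c
/*
 * Hypothetical userspace model of demotion-target selection; not kernel code.
 * Node masks are plain bitmasks and tiers are small integers.
 */
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NODES 32 /* one bit per node in an unsigned int (assumption) */

typedef unsigned int model_nodemask; /* bit n set => node n allowed */

struct node_model {
	int tier;        /* larger value = slower memory tier (model assumption) */
	long free_pages; /* pages still available on this node */
};

/* Allowed demotion targets of @src: every node in any lower (slower) tier. */
static model_nodemask allowed_demotion_targets(const struct node_model *nodes,
					       int nr_nodes, int src)
{
	model_nodemask mask = 0;

	assert(nr_nodes <= MODEL_MAX_NODES && src >= 0 && src < nr_nodes);
	for (int nid = 0; nid < nr_nodes; nid++)
		if (nodes[nid].tier > nodes[src].tier)
			mask |= 1u << nid;
	return mask;
}

/* Take one page from @nid if it has any left. */
static bool try_alloc_on(struct node_model *nodes, int nid)
{
	if (nodes[nid].free_pages <= 0)
		return false;
	nodes[nid].free_pages--;
	return true;
}

/*
 * Pass 1 is the analogue of __GFP_THISNODE: only @preferred may satisfy the
 * request. Pass 2 drops that restriction and accepts any node in @allowed,
 * scanned in ascending node-ID order (a simplification of the kernel's
 * distance-ordered zonelist). Returns the node used, or -1 if demotion fails.
 */
static int model_alloc_demote_page(struct node_model *nodes, int nr_nodes,
				   int preferred, model_nodemask allowed)
{
	if (try_alloc_on(nodes, preferred))
		return preferred;
	for (int nid = 0; nid < nr_nodes; nid++)
		if ((allowed & (1u << nid)) && try_alloc_on(nodes, nid))
			return nid;
	return -1;
}
```

In this model, take node 0 (tier 0) as the source, with nodes 2 and 3 in tier 1. The allowed mask is then {2, 3}. Demotion goes to the preferred node 2 until it has no free pages left, and only then falls back to node 3. When both are full, the model returns -1, which corresponds to demotion failing for that page.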