Received: by 2002:a05:6a11:4021:0:0:0:0 with SMTP id ky33csp20007pxb; Tue, 14 Sep 2021 17:32:30 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz7SL9qpNjYhr3taNOCRMqbG8J1f6RxGEC+UFyc46FrRgLPKviWriXIIUW9Xrp/39v0UqcM X-Received: by 2002:ac2:4116:: with SMTP id b22mr15276533lfi.587.1631665950341; Tue, 14 Sep 2021 17:32:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1631665950; cv=none; d=google.com; s=arc-20160816; b=QoqKB1dKnzjD7RyKuQAl/WeJpq9yDcECW9AxeVkuqHKCdJ8X+ia5i4AwRRBZLfFPwA vYU+vPuO5OuNQZmik4lArkxbvDAUZOYkZnEN7kgldalXKLCr6ox22gZETutwXMujCa3B NU2Vr5tTHjPonvoUeKChhB8qmFXRYVTibTMsk1WIE1+MDWb5nEVDdZV7U4cYrP7uVMtd 02dpnusGD4hDxSZZk5+5W8r0ss3Jnv5Sj+VGptJ1+5/nQ2lbfABhJ5xFDy0Xg6UcKfqN +f3n3zfRQ5D/Cbf2+f333QUecCfjJRypl6ifQdfGG0+Qc/zg8IlqPR0QMqZUWLhHkyR7 f0dw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:message-id:in-reply-to :subject:cc:to:from:date:dkim-signature; bh=rKPPAqWinMrhwkk8dufmKi/m/ykeo+r7lvr15eegekI=; b=syoboZeoDZXW9rW9Pq53O1Coeh1ttnp2xvHto/FDJlJ/6o4Qr5Fq7nQtPpoXqulHqV D6xEUP+fYB+n3q4gbnrQ2QgcOw3R1861m/zOcyULz5Dl1xyJJjPVQ0hLEvaxXF2nI4EJ WcT66hV3eqb9oyIUnsrdBl3BNP/KCFYdJRdhYkB4eQWcJIe5RKU7shcSTn62P0ch8YZ8 fLAEblua7TpaxdNAdSfpFXJukC6ejjExk9CLZ8M4ugHIFu0thWv2EY1PtNKy78arrdES 6DXsWzLDn0eWzAJtzNVWix53DKDUtDJQ1Teg5sx8NyjBDN2J6HIS8+1IF5Zf9lya6gXS JbBw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=W2174p+P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h25si5431201lfg.1.2021.09.14.17.32.03; Tue, 14 Sep 2021 17:32:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=W2174p+P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229991AbhIOAbY (ORCPT + 99 others); Tue, 14 Sep 2021 20:31:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38006 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236080AbhIOAbX (ORCPT ); Tue, 14 Sep 2021 20:31:23 -0400 Received: from mail-pg1-x532.google.com (mail-pg1-x532.google.com [IPv6:2607:f8b0:4864:20::532]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9BA8AC061574 for ; Tue, 14 Sep 2021 17:30:05 -0700 (PDT) Received: by mail-pg1-x532.google.com with SMTP id q68so927615pga.9 for ; Tue, 14 Sep 2021 17:30:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:in-reply-to:message-id:references :mime-version; bh=rKPPAqWinMrhwkk8dufmKi/m/ykeo+r7lvr15eegekI=; b=W2174p+PqjMxAD9mRnHIDPyb0Hd47zmv7xSvtB0BplA6tW99nXaMcj6zZMFxVYKqr/ aiWbzUiEnk7Yv0hrWrGe6Iv+pWtGsAhgHWbQUJznuTtE/8voNYohxNO8yD8ldhvedz5+ X7S7j0KC/4hn3xr7HmGB5e8QTvtfLe4BXudToAyDuruq6R+cFEkf1jZ8jGyKsXEdaPT9 mo3QPYhGJVCYsYVyh/dFAHL6eb5zq19RzKbCWkcUZKlt2o4UZ+vx2zsfwGGh+NFSIWQF CmCwHYqugGqzlhbWTeEwAaZfQ0rNWF4ZoEs9xk2J5qFerGA79vTYmcIW1/Z49zvr+fDH CjyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:mime-version; bh=rKPPAqWinMrhwkk8dufmKi/m/ykeo+r7lvr15eegekI=; b=GYpwWU1ahHDnToyNqIIBgoMQEQd5P8H36IJXf93lSY5pk4rXEPzgR9wVlHHYO2Bwx0 nDfZEwlJn1mahPGV/1BwiecTW8/F87bLcTjV37jrOXh31cpQ8qaOJGH5ejGHglMvvton xg+l+ywOt6NOq6eB286yr1xChL0MSB3JQb5feR70jY62ag3LJw1qDiLDQi+3W89+K5BW XYShJy8Nx/RoeGGESqLCHU2msMEhjlJZtOnmwX3YkmmmUg8JXPA2C6mjnFljSnw2odpX QY2YQg1KpA4ymxLGq9g1asSY3LXOB3iGs/OMHVhaxZamEKVbQ14CmbEFQSrMQlQyC+o/ FiUQ== X-Gm-Message-State: AOAM53112mxBeS9DTjJyyoAO+7SKu696hVTLAXBdRm4xSTaFipWpuAIc lX5TLZukcarQGUpuzY5RHAHT5A== X-Received: by 2002:a05:6a00:2283:b0:43d:ef23:48e5 with SMTP id f3-20020a056a00228300b0043def2348e5mr7522479pfe.18.1631665804664; Tue, 14 Sep 2021 17:30:04 -0700 (PDT) Received: from [2620:15c:17:3:22e5:69e1:debc:7973] ([2620:15c:17:3:22e5:69e1:debc:7973]) by smtp.gmail.com with ESMTPSA id j6sm12193573pgq.0.2021.09.14.17.30.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Sep 2021 17:30:04 -0700 (PDT) Date: Tue, 14 Sep 2021 17:30:03 -0700 (PDT) From: David Rientjes To: Feng Tang cc: Andrew Morton , Michal Hocko , Tejun Heo , Zefan Li , Johannes Weiner , Mel Gorman , Vlastimil Babka , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v3] mm/page_alloc: detect allocation forbidden by cpuset and bail out early In-Reply-To: <1631590828-25565-1-git-send-email-feng.tang@intel.com> Message-ID: <3bd87d8a-d09e-ac7-1d1d-25ad1b9d5ed9@google.com> References: <1631590828-25565-1-git-send-email-feng.tang@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 14 Sep 2021, Feng Tang wrote: > diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h > index d2b9c41..d58e047 100644 > --- a/include/linux/cpuset.h > +++ b/include/linux/cpuset.h > @@ -34,6 +34,8 @@ > */ > extern struct static_key_false cpusets_pre_enable_key; > extern struct static_key_false cpusets_enabled_key; > +extern struct static_key_false cpusets_insane_config_key; > + > static inline bool cpusets_enabled(void) > { > return static_branch_unlikely(&cpusets_enabled_key); > @@ -51,6 +53,19 @@ static inline void cpuset_dec(void) > static_branch_dec_cpuslocked(&cpusets_pre_enable_key); > } > > +/* > + * This will get enabled whenever a cpuset configuration is considered > + * unsupportable in general. E.g. movable only node which cannot satisfy > + * any non movable allocations (see update_nodemask). Page allocator > + * needs to make additional checks for those configurations and this > + * check is meant to guard those checks without any overhead for sane > + * configurations. > + */ > +static inline bool cpusets_insane_config(void) > +{ > + return static_branch_unlikely(&cpusets_insane_config_key); > +} > + > extern int cpuset_init(void); > extern void cpuset_init_smp(void); > extern void cpuset_force_rebuild(void); > @@ -167,6 +182,8 @@ static inline void set_mems_allowed(nodemask_t nodemask) > > static inline bool cpusets_enabled(void) { return false; } > > +static inline bool cpusets_insane_config(void) { return false; } > + > static inline int cpuset_init(void) { return 0; } > static inline void cpuset_init_smp(void) {} > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 6a1d79d..a455333 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -1220,6 +1220,22 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, > #define for_each_zone_zonelist(zone, z, zlist, highidx) \ > for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, NULL) > > +/* Whether the 'nodes' are all movable nodes */ > +static inline bool movable_only_nodes(nodemask_t *nodes) > +{ > + struct zonelist *zonelist; > + struct zoneref *z; > + > + if (nodes_empty(*nodes)) > + return false; > + > + zonelist = > + &NODE_DATA(first_node(*nodes))->node_zonelists[ZONELIST_FALLBACK]; > + z = first_zones_zonelist(zonelist, ZONE_NORMAL, nodes); > + return (!z->zone) ? true : false; > +} > + > + > #ifdef CONFIG_SPARSEMEM > #include > #endif > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > index df1ccf4..7fa633e 100644 > --- a/kernel/cgroup/cpuset.c > +++ b/kernel/cgroup/cpuset.c > @@ -69,6 +69,13 @@ > DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key); > DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key); > > +/* > + * There could be abnormal cpuset configurations for cpu or memory > + * node binding, add this key to provide a quick low-cost judgement > + * of the situation. > + */ > +DEFINE_STATIC_KEY_FALSE(cpusets_insane_config_key); > + > /* See "Frequency meter" comments, below. */ > > struct fmeter { > @@ -1868,6 +1875,14 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs, > if (retval < 0) > goto done; > > + if (!cpusets_insane_config() && > + movable_only_nodes(&trialcs->mems_allowed)) { > + static_branch_enable(&cpusets_insane_config_key); > + pr_info("Unsupported (movable nodes only) cpuset configuration detected (nmask=%*pbl)! " > + "Cpuset allocations might fail even with a lot of memory available.\n", > + nodemask_pr_args(&trialcs->mems_allowed)); > + } > + > spin_lock_irq(&callback_lock); > cs->mems_allowed = trialcs->mems_allowed; > spin_unlock_irq(&callback_lock); Is this the only time that the state of the nodemask may change? I'm wondering about a single node nodemask, for example, where all ZONE_NORMAL memory is hot-removed. > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index b37435c..a7e0854 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4914,6 +4914,19 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > if (!ac->preferred_zoneref->zone) > goto nopage; > > + /* > + * Check for insane configurations where the cpuset doesn't contain > + * any suitable zone to satisfy the request - e.g. non-movable > + * GFP_HIGHUSER allocations from MOVABLE nodes only. > + */ > + if (cpusets_insane_config() && (gfp_mask & __GFP_HARDWALL)) { > + struct zoneref *z = first_zones_zonelist(ac->zonelist, > + ac->highest_zoneidx, > + &cpuset_current_mems_allowed); > + if (!z->zone) > + goto nopage; > + } > + > if (alloc_flags & ALLOC_KSWAPD) > wake_all_kswapds(order, gfp_mask, ac); >