Date: Tue, 11 Apr 2023 16:36:26 +0200
From: Michal Hocko
To: Gang Li
Cc: Waiman Long, cgroups@vger.kernel.org, linux-mm@kvack.org, rientjes@google.com, Zefan Li, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] mm: oom: introduce cpuset oom
References: <20230411065816.9798-1-ligang.bdlg@bytedance.com>
In-Reply-To: <20230411065816.9798-1-ligang.bdlg@bytedance.com>

On Tue 11-04-23 14:58:15, Gang Li wrote:
> Cpusets constrain the CPU and Memory placement of tasks.
> `CONSTRAINT_CPUSET` type in oom has existed for a long time, but
> has never been utilized.
>
> When a process in cpuset which constrain memory placement triggers
> oom, it may kill a completely irrelevant process on other numa nodes,
> which will not release any memory for this cpuset.
>
> We can easily achieve node aware oom by using `CONSTRAINT_CPUSET` and
> selecting victim from cpusets with the same mems_allowed as the
> current one.

I believe it still wouldn't hurt to be more specific here.
CONSTRAINT_CPUSET is rather obscure. Looking at this just makes my head
spin.

	/* Check this allocation failure is caused by cpuset's wall function */
	for_each_zone_zonelist_nodemask(zone, z, oc->zonelist,
			highest_zoneidx, oc->nodemask)
		if (!cpuset_zone_allowed(zone, oc->gfp_mask))
			cpuset_limited = true;

Does this even work properly and why? prepare_alloc_pages sets
oc->nodemask to current->mems_allowed but the above gives us
cpuset_limited only if there is at least one zone/node that is not
oc->nodemask compatible. So it seems like this wouldn't ever get set
unless oc->nodemask got reset somewhere. This is a maze indeed. Is there
any reason why we cannot rely on __GFP_HARDWALL here? Or should we
instead rely on the fact the nodemask should be the same as
current->mems_allowed?

I do realize that this is not directly related to your patch but
considering this has been mostly doing nothing maybe we want to document
it better or even rework it at this occasion.

> Example:
>
> Create two processes named mem_on_node0 and mem_on_node1 constrained
> by cpusets respectively. These two processes alloc memory on their
> own node. Now node0 has run out of memory, OOM will be invoked by
> mem_on_node0.

Don't you have an actual real life example with a properly partitioned
system which clearly misbehaves and this patch addresses that?
-- 
Michal Hocko
SUSE Labs
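
To illustrate the cpuset_limited question raised in the review above, here is a
minimal userspace sketch. It is not kernel code: nodemask_model_t,
MODEL_MAX_NODES and model_cpuset_limited() are hypothetical names, each NUMA
node is reduced to one bit, and the real cpuset_zone_allowed() (which also
depends on gfp flags and task context) is approximated by a plain membership
test against mems_allowed. Under those assumptions, the flag can only be set
when the iteration nodemask contains a node outside mems_allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: one bit per NUMA node, not the kernel's nodemask_t. */
typedef uint64_t nodemask_model_t;

#define MODEL_MAX_NODES 64

static_assert(sizeof(nodemask_model_t) * 8 == MODEL_MAX_NODES,
	      "one bit per modeled node");

/*
 * Models the quoted loop: only nodes present in alloc_nodemask are visited
 * (standing in for the nodemask filter of for_each_zone_zonelist_nodemask()),
 * and the flag is set if any visited node lies outside mems_allowed
 * (standing in for !cpuset_zone_allowed()).
 */
static bool model_cpuset_limited(nodemask_model_t alloc_nodemask,
				 nodemask_model_t mems_allowed)
{
	bool cpuset_limited = false;

	for (int node = 0; node < MODEL_MAX_NODES; node++) {
		if (!(alloc_nodemask & (UINT64_C(1) << node)))
			continue;	/* node filtered out by the nodemask */
		if (!(mems_allowed & (UINT64_C(1) << node)))
			cpuset_limited = true;
	}

	return cpuset_limited;
}
```

In this model, model_cpuset_limited(mask, mask) is false for every mask, which
matches the observation that the flag would never be set if the nodemask
always equals current->mems_allowed. Whether the kernel can reach the loop
with a different nodemask is exactly the open question in the review.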