Date: Thu, 8 Dec 2022 09:09:39 +0100
From: Michal Hocko
To: Mina Almasry
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Shakeel Butt,
    Muchun Song, Huang Ying, Yang Shi, Yosry Ahmed, weixugc@google.com,
    fvdl@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] [mm-unstable] mm: Fix memcg reclaim on memory tiered systems
References: <20221206023406.3182800-1-almasrymina@google.com>

On Wed 07-12-22 13:43:55, Mina Almasry wrote:
> On Wed, Dec 7, 2022 at 3:12 AM Michal Hocko wrote:
[...]
> > Anyway a proper nr_reclaimed tracking should be rather straightforward
> > but I do not expect to make a big difference in practice
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 026199c047e0..1b7f2d8cb128 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1633,7 +1633,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >  	LIST_HEAD(ret_folios);
> >  	LIST_HEAD(free_folios);
> >  	LIST_HEAD(demote_folios);
> > -	unsigned int nr_reclaimed = 0;
> > +	unsigned int nr_reclaimed = 0, nr_demoted = 0;
> >  	unsigned int pgactivate = 0;
> >  	bool do_demote_pass;
> >  	struct swap_iocb *plug = NULL;
> > @@ -2065,8 +2065,17 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >  	}
> >  	/* 'folio_list' is always empty here */
> >
> > -	/* Migrate folios selected for demotion */
> > -	nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
> > +	/*
> > +	 * Migrate folios selected for demotion.
> > +	 * Do not consider demoted pages to be reclaimed for the memcg reclaim
> > +	 * because no charges are really freed during the migration. Global
> > +	 * reclaim aims at releasing memory from nodes/zones so consider
> > +	 * demotion to reclaim memory.
> > +	 */
> > +	nr_demoted += demote_folio_list(&demote_folios, pgdat);
> > +	if (!cgroup_reclaim(sc))
> > +		nr_reclaimed += nr_demoted;
> > +
> >  	/* Folios that could not be demoted are still in @demote_folios */
> >  	if (!list_empty(&demote_folios)) {
> >  		/* Folios which weren't demoted go back on @folio_list for retry: */
> >
> > [...]
>
> Thank you again, but this patch breaks the memory.reclaim nodes arg
> for me. This is my test case. I run it on a machine with 2 memory
> tiers.
>
> Memory tier 1= nodes 0-2
> Memory tier 2= node 3
>
> mkdir -p /sys/fs/cgroup/unified/test
> cd /sys/fs/cgroup/unified/test
> echo $$ > cgroup.procs
> head -c 500m /dev/random > /tmp/testfile
> echo $$ > /sys/fs/cgroup/unified/cgroup.procs
> echo "1m nodes=0-2" > memory.reclaim
>
> In my opinion the expected behavior is for the kernel to demote 1mb of
> memory from nodes 0-2 to node 3.
>
> Actual behavior on the tip of mm-unstable is as expected.
>
> Actual behavior with your patch cherry-picked to mm-unstable is that
> the kernel demotes all 500mb of memory from nodes 0-2 to node 3, and
> returns -EAGAIN to the user. This may be the correct behavior you're
> intending, but it completely breaks the use case I implemented the
> nodes= arg for and listed on the commit message of that change.

Yes, strictly speaking the behavior is correct, albeit unexpected. You
have told the kernel to _reclaim_ that much memory, but demotion is
simply aging handling rather than reclaim if the demotion target has a
lot of free memory. This would be the case without any nodemask as well,
btw.

I am worried this will keep popping up again and again. I thought your
nodes subset approach could deal with this, but I have overlooked one
important thing in your patch. The user-provided nodemask controls where
to reclaim from, but it doesn't constrain demotion targets. Is this
intentional? Would it actually make more sense to control demotion by
adding demotion nodes into the nodemask?

-- 
Michal Hocko
SUSE Labs