Received: by 10.213.65.68 with SMTP id h4csp709275imn; Fri, 6 Apr 2018 07:38:30 -0700 (PDT) X-Google-Smtp-Source: AIpwx4+c58tJQwDqHel3tV7kmzJ9KmUjc22e2psHrM2+CxnraPzOE2zJ74VfB5HrduNseI3XIVc0 X-Received: by 2002:a17:902:464:: with SMTP id 91-v6mr27356524ple.126.1523025510314; Fri, 06 Apr 2018 07:38:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1523025510; cv=none; d=google.com; s=arc-20160816; b=GNcCO/nogNT9upNysjR/mhFU+thHtRMnUV6naYx+ZqALu8sW7C0ThdQIvT6QYq9HZV BH90M/kN1xGXGerg/9BM8RkXH8jQex+KOdVT1sE0/X7DDfH/GlpWVTZRHwqxiBzfggwx Lum5taOCfXNwzKidLKf2AwiIwFwVOoFH73bSiiCJl4Vvt1I18qMjH1vjCSuk9WOAlFLc PVa2H+0c9CR3c8gycMfIIP41oIW0StLVnNiWtKx5V8VnMZpOtBsEH8mGriHoFUz5pEPj RmCMah5tXL5Nf2/oS335uIPkoXU6BerqOfee34ttUQ5QneK0cazBOcAnpLQ75Hai6iDr fcwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=7JxZvQx6456nUJZy4MVzB9VWXTy6wvWD7YBOifs2514=; b=Fub1K1yD9hbH+aahQiRtKQtEYwnBs2tp2iaCfkIeorV375hoFu2x6Phv38dNBnvnzn +M/oYim4vcVsPipRVz/LSmjwY0j0a52AFfksrOArPEDQ+nOkGq753hRALkuFv87fWNdH be2edga9tL+we2sshyjYReyPtVitaOVldQRIkBL4hr2WTlFbqpWitZya6lkD85Naq49l Wn9RrEQhtPz9igmm98jKoF02Ld7UxRXtDrb/jT0q4NUIQXKtH3YQedCpd3uwAunNLjTA ucACXVnOJ6SqeFSihlNT1sP2s54V4qAPCDIDXk3xMnqzok4GT5wTFviprNCJ0miNrC+Y t16A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Ob4P3P28; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y13si8168114pfd.47.2018.04.06.07.38.16; Fri, 06 Apr 2018 07:38:30 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Ob4P3P28; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932073AbeDFOhP (ORCPT + 99 others); Fri, 6 Apr 2018 10:37:15 -0400 Received: from mail-wm0-f66.google.com ([74.125.82.66]:37054 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755997AbeDFOhN (ORCPT ); Fri, 6 Apr 2018 10:37:13 -0400 Received: by mail-wm0-f66.google.com with SMTP id r131so3770088wmb.2 for ; Fri, 06 Apr 2018 07:37:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=7JxZvQx6456nUJZy4MVzB9VWXTy6wvWD7YBOifs2514=; b=Ob4P3P28a1vEZMEGFsOXrnN0iBfN+GaHd2iRCCcfIjnbyZx8g2/a28A5XnEA2PPvCJ J2HuR2ltvU0NWkX4U7uPuxNWUg4p+X3VvlC+6s1axqR18XWO8Trw7BZFYu6lGzmHNJeb Mf8Dwgp7nzRyAIjpBjoHghVNeSPoYr+VOs+mYKqhVsrthMzm0icTnJWD1bYuMuqb2OJr 5mLm3zCeNaGGFoG6cRUIIWX0tvZIXkOWoPEJdS02h+rCmVzNkKEo0NgjqaK4oBKQlbDT dE6F8dM2NtiC49CeHn0YnAwkVkGN/G8UVw9PVw+hNLYa8sNULuXP5hI0Y9gdvnLFiHIB Agnw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=7JxZvQx6456nUJZy4MVzB9VWXTy6wvWD7YBOifs2514=; b=D3zkAAKGMmWNPToCohZKdGdirI+MVAKVCo4n+Yj1XoYoG+BdVWhC8AnFKfrjIl5xTs jJs0cEJcYNW/Fpj0b7LVRrVuO+DOpjqsobPSL7Pw/UW8FVgeG+C5EdWehD6A6wbnXjpi dXMAOqYAuUtl+OFl9Y98iI/pgjQG2ylsr+7HROlZQGKGz1exN0Im8lBqqJCeHjdw5OHW 1Yi5WJhe/4et3zkJf7y1h1KiP0WKfHTnlV99+DC8XA0xZLnU6t+2fiDDBXBvpEmE4kSg DdLgN7MPqlASPvWxijyClXrdJj8W8KiYFLeHoeuuoDqkTwApFNI6CwPhDFw40OzNU3Yp vccw== X-Gm-Message-State: AElRT7FuTiOVDk2yZiNdHPaJxhd6Eg8LSL8RBRzMTj5uM75kihFL5cGQ pkyhopDAKkba+pIlquSkWd1dRqvNdKfqulO1EMsWwA== X-Received: by 10.28.91.65 with SMTP id p62mr15768625wmb.140.1523025431558; Fri, 06 Apr 2018 07:37:11 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.156.139 with HTTP; Fri, 6 Apr 2018 07:37:10 -0700 (PDT) In-Reply-To: <20180406135215.10057-1-aryabinin@virtuozzo.com> References: <20180406135215.10057-1-aryabinin@virtuozzo.com> From: Shakeel Butt Date: Fri, 6 Apr 2018 07:37:10 -0700 Message-ID: Subject: Re: [PATCH] mm-vmscan-dont-mess-with-pgdat-flags-in-memcg-reclaim-v2-fix To: Andrey Ryabinin Cc: Andrew Morton , Mel Gorman , Tejun Heo , Johannes Weiner , Michal Hocko , Linux MM , LKML , Cgroups Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 6, 2018 at 6:52 AM, Andrey Ryabinin wrote: > On 04/06/2018 05:13 AM, Shakeel Butt wrote: >> Question: Should this 'flags' be per-node? Is it ok for a congested >> memcg to call wait_iff_congested for all nodes? > > Indeed, congestion state should be pre-node. If memcg on node A is > congested, there is no point is stalling memcg reclaim from node B. > > Make congestion state per-cgroup-per-node and record it in > 'struct mem_cgroup_per_node'. > > Signed-off-by: Andrey Ryabinin > --- > include/linux/memcontrol.h | 5 +++-- > mm/vmscan.c | 39 +++++++++++++++++++++++++-------------- > 2 files changed, 28 insertions(+), 16 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 8b394bbf1c86..af9eed2e3e04 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -120,6 +120,9 @@ struct mem_cgroup_per_node { > unsigned long usage_in_excess;/* Set to the value by which */ > /* the soft limit is exceeded*/ > bool on_tree; > + bool congested; /* memcg has many dirty pages */ > + /* backed by a congested BDI */ > + > struct mem_cgroup *memcg; /* Back pointer, we cannot */ > /* use container_of */ > }; > @@ -189,8 +192,6 @@ struct mem_cgroup { > /* vmpressure notifications */ > struct vmpressure vmpressure; > > - unsigned long flags; > - > /* > * Should the accounting and control be hierarchical, per subtree? > */ > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 99688299eba8..78214c899710 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -200,16 +200,27 @@ static bool sane_reclaim(struct scan_control *sc) > return false; > } > > -static void set_memcg_bit(enum pgdat_flags flag, > - struct mem_cgroup *memcg) > +static void set_memcg_congestion(pg_data_t *pgdat, > + struct mem_cgroup *memcg, > + bool congested) > { > - set_bit(flag, &memcg->flags); > + struct mem_cgroup_per_node *mz; > + > + if (!memcg) > + return; > + > + mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id); > + WRITE_ONCE(mz->congested, congested); > } > > -static int test_memcg_bit(enum pgdat_flags flag, > +static bool memcg_congested(pg_data_t *pgdat, > struct mem_cgroup *memcg) > { > - return test_bit(flag, &memcg->flags); > + struct mem_cgroup_per_node *mz; > + > + mz = mem_cgroup_nodeinfo(memcg, pgdat->node_id); > + return READ_ONCE(mz->congested); > + > } > #else > static bool global_reclaim(struct scan_control *sc) > @@ -222,15 +233,16 @@ static bool sane_reclaim(struct scan_control *sc) > return true; > } > > -static inline void set_memcg_bit(enum pgdat_flags flag, > - struct mem_cgroup *memcg) > +static inline void set_memcg_congestion(struct pglist_data *pgdat, > + struct mem_cgroup *memcg, bool congested) > { > } > > -static inline int test_memcg_bit(enum pgdat_flags flag, > - struct mem_cgroup *memcg) > +static inline bool memcg_congested(struct pglist_data *pgdat, > + struct mem_cgroup *memcg) > { > - return 0; > + return false; > + > } > #endif > > @@ -2482,7 +2494,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat, > static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg) > { > return test_bit(PGDAT_CONGESTED, &pgdat->flags) || > - (memcg && test_memcg_bit(PGDAT_CONGESTED, memcg)); > + (memcg && memcg_congested(pgdat, memcg)); I am wondering if we should check all ancestors for congestion as well. Maybe a parallel memcg reclaimer might have set some ancestor of this memcg to congested. > } > > static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc) > @@ -2617,7 +2629,7 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc) > */ > if (!global_reclaim(sc) && sane_reclaim(sc) && > sc->nr.dirty && sc->nr.dirty == sc->nr.congested) > - set_memcg_bit(PGDAT_CONGESTED, root); > + set_memcg_congestion(pgdat, root, true); > > /* > * Stall direct reclaim for IO completions if underlying BDIs > @@ -2844,6 +2856,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, > continue; > last_pgdat = zone->zone_pgdat; > snapshot_refaults(sc->target_mem_cgroup, zone->zone_pgdat); > + set_memcg_congestion(last_pgdat, sc->target_mem_cgroup, false); > } > > delayacct_freepages_end(); > @@ -3067,7 +3080,6 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg, > * the priority and make it zero. > */ > shrink_node_memcg(pgdat, memcg, &sc, &lru_pages); > - clear_bit(PGDAT_CONGESTED, &memcg->flags); > > trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed); > > @@ -3113,7 +3125,6 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, > noreclaim_flag = memalloc_noreclaim_save(); > nr_reclaimed = do_try_to_free_pages(zonelist, &sc); > memalloc_noreclaim_restore(noreclaim_flag); > - clear_bit(PGDAT_CONGESTED, &memcg->flags); > > trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed); > > -- > 2.16.1 >