From: Muchun Song
Date: Wed, 12 Jan 2022 21:22:36 +0800
Subject: Re: [PATCH v5 10/16] mm: list_lru: allocate list_lru_one only when needed
To: Michal Koutný
Cc: Matthew Wilcox, Andrew Morton, Johannes Weiner, Michal Hocko,
    Vladimir Davydov, Shakeel Butt, Roman Gushchin, Yang Shi, Alex Shi,
    Wei Yang, Dave Chinner, trond.myklebust@hammerspace.com,
    anna.schumaker@netapp.com, jaegeuk@kernel.org, chao@kernel.org,
    Kari Argillander, linux-fsdevel, LKML, Linux Memory Management List,
    linux-nfs@vger.kernel.org, Qi Zheng, Xiongchun Duan, Fam Zheng,
    Muchun Song
In-Reply-To: <20220106110051.GA470@blackbody.suse.cz>
References: <20211220085649.8196-1-songmuchun@bytedance.com>
    <20211220085649.8196-11-songmuchun@bytedance.com>
    <20220106110051.GA470@blackbody.suse.cz>

On Thu, Jan 6, 2022 at 7:00 PM Michal Koutný wrote:
>
> On Mon, Dec 20, 2021 at 04:56:43PM +0800, Muchun Song wrote:
> (Thanks for pointing me here.)
>
> > -void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg)
> > +void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst)
> >  {
> > +        struct cgroup_subsys_state *css;
> >          struct list_lru *lru;
> > +        int src_idx = src->kmemcg_id;
> > +
> > +        /*
> > +         * Change kmemcg_id of this cgroup and all its descendants to the
> > +         * parent's id, and then move all entries from this cgroup's list_lrus
> > +         * to ones of the parent.
> > +         *
> > +         * After we have finished, all list_lrus corresponding to this cgroup
> > +         * are guaranteed to remain empty. So we can safely free this cgroup's
> > +         * list lrus in memcg_list_lru_free().
> > +         *
> > +         * Changing ->kmemcg_id to the parent can prevent memcg_list_lru_alloc()
> > +         * from allocating list lrus for this cgroup after memcg_list_lru_free()
> > +         * call.
> > +         */
> > +        rcu_read_lock();
> > +        css_for_each_descendant_pre(css, &src->css) {
> > +                struct mem_cgroup *memcg;
> > +
> > +                memcg = mem_cgroup_from_css(css);
> > +                memcg->kmemcg_id = dst->kmemcg_id;
> > +        }
> > +        rcu_read_unlock();
>
> Do you envision using this function anywhere else besides offlining?
> If not, you shouldn't need to traverse the whole subtree because normally
> parents are offlined only after children (see cgroup_subsys_state.online_cnt).
>
> >
> >          mutex_lock(&list_lrus_mutex);
> >          list_for_each_entry(lru, &memcg_list_lrus, list)
> > -                memcg_drain_list_lru(lru, src_idx, dst_memcg);
> > +                memcg_drain_list_lru(lru, src_idx, dst);
> >          mutex_unlock(&list_lrus_mutex);
>
> If you do, then here you only drain the list_lru of the subtree root but not
> the descendants anymore.
>
> So I do get that mem_cgroup.kmemcg_id references the "effective"
> kmemcg_id after offlining, so that proper list_lrus are used afterwards.
>
> I wonder -- is this necessary when objcgs are reparented too? IOW would
> anyone query the offlined child's kmemcg_id?
> (Maybe it's worth explaining better in the commit message, I think even
> the current approach is OK (better safe than sorry).)
>

Sorry for the late reply.

Imagine a bunch of memcgs as follows. C's parent is B, B's parent is A,
and A's parent is root. The numbers in parentheses are their kmemcg_ids
respectively.

  root(-1) -> A(0) -> B(1) -> C(2)

CPU0:                                   CPU1:
memcg_list_lru_alloc(C)
                                        memcg_drain_all_list_lrus(C)
                                        memcg_drain_all_list_lrus(B)
                                        // Now C and B are offline. The
                                        // kmemcg_ids become the following if
                                        // we do not change the kmemcg_id of
                                        // its descendants in
                                        // memcg_drain_all_list_lrus().
                                        //
                                        // root(-1) -> A(0) -> B(0) -> C(1)
for (i = 0; memcg; memcg = parent_mem_cgroup(memcg), i++) {
        // allocate struct list_lru_per_memcg for memcg C
        table[i].mlru = memcg_init_list_lru_one(gfp);
}

spin_lock_irqsave(&lru->lock, flags);
while (i--) {
        // here index = 1
        int index = table[i].memcg->kmemcg_id;
        struct list_lru_per_memcg *mlru = table[i].mlru;

        if (index < 0 || rcu_dereference_protected(mlrus->mlru[index], true))
                kfree(mlru);
        else
                // mlrus->mlru[index] will be assigned a new value even
                // though memcg C is already offline.
                rcu_assign_pointer(mlrus->mlru[index], mlru);
}
spin_unlock_irqrestore(&lru->lock, flags);

So changing ->kmemcg_id of all its descendants can prevent
memcg_list_lru_alloc() from allocating list lrus for the offlined
cgroup after memcg_list_lru_free() has been called. (A small standalone
sketch of this scenario is appended below.)

Thanks.
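
To make the ordering easier to follow, here is a minimal userspace C sketch
of the same scenario. It is only an illustration, not the kernel code: the
names toy_memcg, toy_drain_all_list_lrus() and toy_list_lru_alloc() are made
up, allocation and installation are merged into one loop, and the real paths
additionally take lru->lock and use RCU. The sketch only shows why rewriting
->kmemcg_id for the whole subtree makes the racing allocation fall into the
kfree() branch (reusing the parent's slot) instead of installing an mlru for
an offlined cgroup.

/* toy_list_lru.c - hypothetical model of the race described above. */
#include <stdio.h>
#include <stdlib.h>

#define NR_IDS 8

struct toy_memcg {
        const char *name;
        int kmemcg_id;
        struct toy_memcg *parent;
        struct toy_memcg *child;  /* a single-child chain is enough here */
};

/* stands in for mlrus->mlru[index]: one per-memcg lru slot per kmemcg_id */
static void *mlru_slot[NR_IDS];

/*
 * Offline @memcg: rewrite the kmemcg_id of @memcg *and all descendants*
 * to the parent's id, like the css_for_each_descendant_pre() walk in
 * memcg_drain_all_list_lrus().
 */
static void toy_drain_all_list_lrus(struct toy_memcg *memcg)
{
        int dst_id = memcg->parent->kmemcg_id;

        for (struct toy_memcg *iter = memcg; iter; iter = iter->child)
                iter->kmemcg_id = dst_id;
}

/*
 * Racing allocation for @memcg and its ancestors, modelled on the
 * table[]/while(i--) loops quoted above: allocate, then install only
 * if the slot for the *current* kmemcg_id is still empty.
 */
static void toy_list_lru_alloc(struct toy_memcg *memcg)
{
        for (struct toy_memcg *m = memcg; m->kmemcg_id >= 0; m = m->parent) {
                void *mlru = malloc(1);    /* memcg_init_list_lru_one() */
                int index = m->kmemcg_id;  /* re-read after the drain ran */

                if (mlru_slot[index]) {
                        free(mlru);        /* parent's slot already exists */
                        printf("%s: reuse parent slot %d\n", m->name, index);
                } else {
                        mlru_slot[index] = mlru;
                        printf("%s: install new slot %d\n", m->name, index);
                }
        }
}

int main(void)
{
        struct toy_memcg root = { "root", -1, NULL, NULL };
        struct toy_memcg a = { "A", 0, &root, NULL };
        struct toy_memcg b = { "B", 1, &a, NULL };
        struct toy_memcg c = { "C", 2, &b, NULL };
        root.child = &a; a.child = &b; b.child = &c;

        mlru_slot[0] = malloc(1);  /* A already has its per-memcg lru */

        /* CPU1 offlines C and then B before CPU0 installs its allocations */
        toy_drain_all_list_lrus(&c);
        toy_drain_all_list_lrus(&b);

        /*
         * CPU0's install loop now sees C's and B's effective id == A's id,
         * so it frees its temporary allocations instead of populating a
         * slot that belongs to an offlined cgroup.
         */
        toy_list_lru_alloc(&c);
        return 0;
}

Compiled and run as-is (cc toy_list_lru.c && ./a.out), it prints that C, B
and A all resolve to slot 0, i.e. the parent's existing per-memcg lru, once
the drain has rewritten the whole chain; dropping the descendant walk from
toy_drain_all_list_lrus() leaves C at id 1 and lets the racing allocation
install a slot for the offlined cgroup.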