Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp2146689pxj; Sat, 5 Jun 2021 14:42:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxgvIUiza02Yclui7vYmHgH2IYxuzXCm4Fjai8VqCliDyvcckjw2/cYf8FTTnmzyDqZwo1L X-Received: by 2002:a17:906:13cb:: with SMTP id g11mr10574862ejc.169.1622929343383; Sat, 05 Jun 2021 14:42:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1622929343; cv=none; d=google.com; s=arc-20160816; b=Os9foQuVBgKF63SSAzUE4PM+fVoqtsl0T13nD2pkutk+1aoDP9OwlKEadDPTYCrvXA QVLfOSQfMxBElINvjPVTWZe5qTkrQ82s3NDsWvawKH0uoJe9RReweiTwrQMRgAXc1uyU QauYxIYaEft/vafYyQGj7YFBMwBq2O/gsIA6eeXVq7suXGKELS/aY94qeebvKWAiNp20 5FVHYOthRRgJ/dmfbuMi5N273U1ep7WiU1pQ+BTbnHv9p7ZJL2BHFliOJzIpWdzHl+/i xxyWs++3g5uikNeFeb3/vxk9GWwtV1LzwsHTwiK/6X88XKE8Hx4cnJd4sZ2vV9hcoeX1 O0Sg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=f0JYFWLbU1K2WU+3KcY+euQ65dBHJZ+TcnqUBRn+Ixw=; b=M5re3IaakilxL8eUCCWEeN0X4DkejhV8yF8Ne+uzu5oQMR9Ou+zUAKxl3TMl18hJaF Km2cdmC7D/drWU6k9U6/239oot/TpsRAVUhPflMdQofVYrNBzW3dXMJSFtjwNOuEoGx2 1w9qK47W1BT4ElYYQhFF4Lmdbmkco/Z8+Qq5Fi5BZrDkl7708jQAVd2GcD41mJdthQax o5OlF5odoAb2tSlZse4CMDh62m118rlhMSKj4akXJsaRuGYYJ+GYEomI/4cEdIW/EBNc vF1QPadAlow9UScQCBZ14zFQYJA4Qo82Gzpoyo8J9ZTNlWXIj6yVJsdhUrKV08dJFnOl chjQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ka6si8162506ejc.204.2021.06.05.14.42.00; Sat, 05 Jun 2021 14:42:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230147AbhFEVjp (ORCPT + 99 others); Sat, 5 Jun 2021 17:39:45 -0400 Received: from mail-il1-f180.google.com ([209.85.166.180]:42516 "EHLO mail-il1-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230010AbhFEVjn (ORCPT ); Sat, 5 Jun 2021 17:39:43 -0400 Received: by mail-il1-f180.google.com with SMTP id a8so11300949ilv.9; Sat, 05 Jun 2021 14:37:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=f0JYFWLbU1K2WU+3KcY+euQ65dBHJZ+TcnqUBRn+Ixw=; b=rYrH/iCtcdSFI9Wp5XTB6rFjWGy65dCezviNROeFN5cdB0KuU3jaMkNAp0pDyCoq54 KgM2DUOGWR77jZBGQD9hinqSyumCsRmLJjPL7R6tUyVmPJLLI0eEmWbq/Pm9181ZafLV A2BG6OKnk6daBcrFcYP3eYwc9hUx43esDCNL7F+71MaauBn7g//3eA5hs0UfG9wColS3 DnBKqTZ3hXekmtn2C2/O2V9evrvH5YogK92Vao0LPpfCqiyQotPy9RH35sbV6dNEZ0CO nK1un+b/bx/r3Hgu5mUnilxldWv9NsZEhFVNcD+cXYCMQY3n/+hLPqUmxDtnrUgpb0V0 svBA== X-Gm-Message-State: AOAM5326Inj7PmUlImMzAgbReo2Wbl2tupkCgO9yfref3yByoW3Bq+0Z 4c8xY2luC6mrZJCpDsPETzU= X-Received: by 2002:a92:c611:: with SMTP id p17mr8862502ilm.166.1622929059076; Sat, 05 Jun 2021 14:37:39 -0700 (PDT) Received: from google.com (243.199.238.35.bc.googleusercontent.com. [35.238.199.243]) by smtp.gmail.com with ESMTPSA id 15sm3666647ilt.66.2021.06.05.14.37.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 05 Jun 2021 14:37:38 -0700 (PDT) Date: Sat, 5 Jun 2021 21:37:37 +0000 From: Dennis Zhou To: Roman Gushchin Cc: Jan Kara , Tejun Heo , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro , Dave Chinner , cgroups@vger.kernel.org Subject: Re: [PATCH v7 0/6] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups Message-ID: References: <20210604013159.3126180-1-guro@fb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210604013159.3126180-1-guro@fb.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On Thu, Jun 03, 2021 at 06:31:53PM -0700, Roman Gushchin wrote: > When an inode is getting dirty for the first time it's associated > with a wb structure (see __inode_attach_wb()). It can later be > switched to another wb (if e.g. some other cgroup is writing a lot of > data to the same inode), but otherwise stays attached to the original > wb until being reclaimed. > > The problem is that the wb structure holds a reference to the original > memory and blkcg cgroups. So if an inode has been dirty once and later > is actively used in read-only mode, it has a good chance to pin down > the original memory and blkcg cgroups forewer. This is often the case with > services bringing data for other services, e.g. updating some rpm > packages. > > In the real life it becomes a problem due to a large size of the memcg > structure, which can easily be 1000x larger than an inode. Also a > really large number of dying cgroups can raise different scalability > issues, e.g. making the memory reclaim costly and less effective. > > To solve the problem inodes should be eventually detached from the > corresponding writeback structure. It's inefficient to do it after > every writeback completion. Instead it can be done whenever the > original memory cgroup is offlined and writeback structure is getting > killed. Scanning over a (potentially long) list of inodes and detach > them from the writeback structure can take quite some time. To avoid > scanning all inodes, attached inodes are kept on a new list (b_attached). > To make it less noticeable to a user, the scanning and switching is performed > from a work context. > > Big thanks to Jan Kara, Dennis Zhou and Hillf Danton for their ideas and > contribution to this patchset. > > v7: > - shared locking for multiple inode switching > - introduced inode_prepare_wbs_switch() helper > - extended the pre-switch inode check for I_WILL_FREE > - added comments here and there > > v6: > - extended and reused wbs switching functionality to switch inodes > on cgwb cleanup > - fixed offline_list handling > - switched to the unbound_wq > - other minor fixes > > v5: > - switch inodes to bdi->wb instead of zeroing inode->i_wb > - split the single patch into two > - only cgwbs maintain lists of attached inodes > - added cond_resched() > - fixed !CONFIG_CGROUP_WRITEBACK handling > - extended list of prohibited inodes flag > - other small fixes > > > Roman Gushchin (6): > writeback, cgroup: do not switch inodes with I_WILL_FREE flag > writeback, cgroup: switch to rcu_work API in inode_switch_wbs() > writeback, cgroup: keep list of inodes attached to bdi_writeback > writeback, cgroup: split out the functional part of > inode_switch_wbs_work_fn() > writeback, cgroup: support switching multiple inodes at once > writeback, cgroup: release dying cgwbs by switching attached inodes > > fs/fs-writeback.c | 302 +++++++++++++++++++++---------- > include/linux/backing-dev-defs.h | 20 +- > include/linux/writeback.h | 1 + > mm/backing-dev.c | 69 ++++++- > 4 files changed, 293 insertions(+), 99 deletions(-) > > -- > 2.31.1 > I too am a bit late to the party. Feel free to add mine as well to the series. Acked-by: Dennis Zhou I left my one comment on the last patch regarding a possible future extension. Thanks, Dennis