Received: by 2002:ab2:1149:0:b0:1f3:1f8c:d0c6 with SMTP id z9csp2765575lqz; Wed, 3 Apr 2024 08:03:57 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVQCRaBYMekmxiEUehMoiMoWdmLQpkjsADeCpUN7bBpkrUMISNceQZ+YvBiAehZIZ3LPmZ3uZRpQvUCLyOPMRVoEqI5YarHnLgRkuM48w== X-Google-Smtp-Source: AGHT+IGey75PAGx0Oa47Z1HRfZlBdVwTv4KYss7v71mIB54rq0nAnQry3YAky/7jI7BCbgbLaAU6 X-Received: by 2002:a17:907:1119:b0:a4a:3861:d63b with SMTP id qu25-20020a170907111900b00a4a3861d63bmr11759742ejb.71.1712156637486; Wed, 03 Apr 2024 08:03:57 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1712156637; cv=pass; d=google.com; s=arc-20160816; b=frgc6hGhjbDyGRH9lBuNC92kBpNmY+2KwCvvYxJ9ziiPij4PZpWXM65x5bEn7rLNKu bET6S4zo3eQ09D13JMxVAZgzurCy3EzrF6LwFF5zJV00UV6zXDDbIWOB6bBtmB105DCT 161aMkuUAaZQq+v8GJxzw0vFFpxHjxjvfI/oKZbIWj0PXTfjtCTJ+ShXRdL5/9UYbDzP uwC/euOAzbYXLf8xHPoAVCKK7MMGLWaVCr/oK/VoJFI/EnTg7jWUalmJbWg3hKFqSEEK 65Kx6M9/6lNq0in2Ka/VoIymqm6dChpbgIgfOGXyoaiarR1gRIipyb0kTb1UM3ZLEOQa f1RQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=1t9rNljNB7WZCGoFa1Ob1ROcqUyp21vTL5nU3nsygOo=; fh=fDRQHOpa3R/Va0RHox1zqWfvrHjL+uYRxNcNu2RIMXk=; b=M8VwhRXQm9g3a8eTHfq1q0nEwsCp2tHGNGj0T7qnj3YYlEqvYPSi6r4vl4yDRABbnI VBfRp56wQyfHYbIJ9d4iE36q1/kMZmJB14gEgV62nnC2xLhY1wLDAYM+Uc7B6QsexzDn LhslecTAEGzE2nDq3pQGBofP/i1w5+zTKQa874o4zfRRRQUnj3ZVhde58Xpp8KUOyl2+ w3i/KiS0KqVyVB/84B0D4bk3vGuaglsG360xVsTi3HFLYSZSlhSEl6Y8Lbx+yHpK7UPV RIaWpBf0x1cyPv+1QHK0nJHhxb3ZhvDvZGLz2FEJhOJu4yzzCT8nZvPqITEUMkz2mIHI CHrA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PtVD7uWM; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-130024-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-130024-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id hd40-20020a17090796a800b00a4e8cfa98absi1822420ejc.29.2024.04.03.08.03.57 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Apr 2024 08:03:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-130024-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PtVD7uWM; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-130024-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-130024-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 08AD51F21429 for ; Wed, 3 Apr 2024 15:03:57 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 22E9D14A4D2; Wed, 3 Apr 2024 15:03:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PtVD7uWM" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18D4314A4C0 for ; Wed, 3 Apr 2024 15:03:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712156595; cv=none; b=rhGL1aavO9dIw2DUKNZGuiy/TiTqCMaEdMRyWMG6uR0J6sR22O27bq/fn6K5Ir6dlQJTnmkST0xSN/wXvY1K3ok1r6UPBKBAnPABZGayanAeaq5dM+L3GdsQuIjDx+2/l7dTRztPmqU33mrqfCbstKoRAZvAflNz/H+vihg3OhU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712156595; c=relaxed/simple; bh=xdm/sS39KuLJea9bB+omDTBqRO1nfiovy2OcPy7/pt0=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=FW9L1jOdDBaWdcE40hmGIOx9zAH14puD4X+NARNizqiBmigq4UqAkFaDInYS3XrjjU5YnUh3jmLkpbtJIseneL7QQJEbdQ0NLBklyyW9PW218MkjWYYf6Mx5LZO6laBdGSelNyKibnGAphJe3BKjpLO9RY0xbvLNTm92YkjW1Gs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=PtVD7uWM; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1712156593; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=1t9rNljNB7WZCGoFa1Ob1ROcqUyp21vTL5nU3nsygOo=; b=PtVD7uWMfI2J8HKthVU9OoChUR29uTScdtR9hSTw4NcJOnZjLtzGnA+p4LTrQIrnz0hYd0 p6N4q5taWbPbwwRwrz61dRQjYVqEYbKAeLvC+HjfvLehs3sMIASl0LamYY/4ugu4v0PzQy GobC3nY7MUkWkbBTw23yc85WsPh9VbM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-625-Rwfc9kBWMhWP0sXigXb-eA-1; Wed, 03 Apr 2024 11:03:02 -0400 X-MC-Unique: Rwfc9kBWMhWP0sXigXb-eA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 976748D1385; Wed, 3 Apr 2024 15:03:01 +0000 (UTC) Received: from bfoster (unknown [10.22.16.57]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 0FA7C10E47; Wed, 3 Apr 2024 15:03:00 +0000 (UTC) Date: Wed, 3 Apr 2024 11:04:58 -0400 From: Brian Foster To: Kemeng Shi Cc: akpm@linux-foundation.org, willy@infradead.org, jack@suse.cz, tj@kernel.org, dsterba@suse.com, mjguzik@gmail.com, dhowells@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH v2 3/6] writeback: support retrieving per group debug writeback stats of bdi Message-ID: References: <20240327155751.3536-1-shikemeng@huaweicloud.com> <20240327155751.3536-4-shikemeng@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.5 On Wed, Apr 03, 2024 at 04:49:42PM +0800, Kemeng Shi wrote: > > > on 3/29/2024 9:10 PM, Brian Foster wrote: > > On Wed, Mar 27, 2024 at 11:57:48PM +0800, Kemeng Shi wrote: > >> Add /sys/kernel/debug/bdi/xxx/wb_stats to show per group writeback stats > >> of bdi. > >> > > > > Hi Kemeng, > Hello Brian, > > > > Just a few random thoughts/comments.. > > > >> Following domain hierarchy is tested: > >> global domain (320G) > >> / \ > >> cgroup domain1(10G) cgroup domain2(10G) > >> | | > >> bdi wb1 wb2 > >> > >> /* per wb writeback info of bdi is collected */ > >> cat /sys/kernel/debug/bdi/252:16/wb_stats > >> WbCgIno: 1 > >> WbWriteback: 0 kB > >> WbReclaimable: 0 kB > >> WbDirtyThresh: 0 kB > >> WbDirtied: 0 kB > >> WbWritten: 0 kB > >> WbWriteBandwidth: 102400 kBps > >> b_dirty: 0 > >> b_io: 0 > >> b_more_io: 0 > >> b_dirty_time: 0 > >> state: 1 > > > > Maybe some whitespace or something between entries would improve > > readability? > Sure, I will add a whitespace in next version. > > > >> WbCgIno: 4094 > >> WbWriteback: 54432 kB > >> WbReclaimable: 766080 kB > >> WbDirtyThresh: 3094760 kB > >> WbDirtied: 1656480 kB > >> WbWritten: 837088 kB > >> WbWriteBandwidth: 132772 kBps > >> b_dirty: 1 > >> b_io: 1 > >> b_more_io: 0 > >> b_dirty_time: 0 > >> state: 7 > >> WbCgIno: 4135 > >> WbWriteback: 15232 kB > >> WbReclaimable: 786688 kB > >> WbDirtyThresh: 2909984 kB > >> WbDirtied: 1482656 kB > >> WbWritten: 681408 kB > >> WbWriteBandwidth: 124848 kBps > >> b_dirty: 0 > >> b_io: 1 > >> b_more_io: 0 > >> b_dirty_time: 0 > >> state: 7 > >> > >> Signed-off-by: Kemeng Shi > >> --- > >> include/linux/writeback.h | 1 + > >> mm/backing-dev.c | 88 +++++++++++++++++++++++++++++++++++++++ > >> mm/page-writeback.c | 19 +++++++++ > >> 3 files changed, 108 insertions(+) > >> > > ... > >> diff --git a/mm/backing-dev.c b/mm/backing-dev.c > >> index 8daf950e6855..e3953db7d88d 100644 > >> --- a/mm/backing-dev.c > >> +++ b/mm/backing-dev.c > >> @@ -103,6 +103,91 @@ static void collect_wb_stats(struct wb_stats *stats, > >> } > >> > >> #ifdef CONFIG_CGROUP_WRITEBACK > > ... > >> +static int cgwb_debug_stats_show(struct seq_file *m, void *v) > >> +{ > >> + struct backing_dev_info *bdi; > >> + unsigned long background_thresh; > >> + unsigned long dirty_thresh; > >> + struct bdi_writeback *wb; > >> + struct wb_stats stats; > >> + > >> + rcu_read_lock(); > >> + bdi = lookup_bdi(m); > >> + if (!bdi) { > >> + rcu_read_unlock(); > >> + return -EEXIST; > >> + } > >> + > >> + global_dirty_limits(&background_thresh, &dirty_thresh); > >> + > >> + list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) { > >> + memset(&stats, 0, sizeof(stats)); > >> + stats.dirty_thresh = dirty_thresh; > > > > If you did something like the following here, wouldn't that also zero > > the rest of the structure? > > > > struct wb_stats stats = { .dirty_thresh = dirty_thresh }; > > > Suer, will do it in next version. > >> + collect_wb_stats(&stats, wb); > >> + > > > > Also, similar question as before on whether you'd want to check > > WB_registered or something here.. > Still prefer to keep full debug info and user could filter out on > demand. Ok. I was more wondering if that was needed for correctness. If not, then that seems fair enough to me. > > > >> + if (mem_cgroup_wb_domain(wb) == NULL) { > >> + wb_stats_show(m, wb, &stats); > >> + continue; > >> + } > > > > Can you explain what this logic is about? Is the cgwb_calc_thresh() > > thing not needed in this case? A comment might help for those less > > familiar with the implementation details. > If mem_cgroup_wb_domain(wb) is NULL, then it's bdi->wb, otherwise, > it's wb in cgroup. For bdi->wb, there is no need to do wb_tryget > and cgwb_calc_thresh. Will add some comment in next version. > > > > BTW, I'm also wondering if something like the following is correct > > and/or roughly equivalent: > > > > list_for_each_*(wb, ...) { > > struct wb_stats stats = ...; > > > > if (!wb_tryget(wb)) > > continue; > > > > collect_wb_stats(&stats, wb); > > > > /* > > * Extra wb_thresh magic. Drop rcu lock because ... . We > > * can do so here because we have a ref. > > */ > > if (mem_cgroup_wb_domain(wb)) { > > rcu_read_unlock(); > > stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb)); > > rcu_read_lock(); > > } > > > > wb_stats_show(m, wb, &stats) > > wb_put(wb); > > } > It's correct as wb_tryget to bdi->wb has no harm. I have considered > to do it in this way, I change my mind to do it in new way for > two reason: > 1. Put code handling wb in cgroup more tight which could be easier > to maintain. > 2. Rmove extra wb_tryget/wb_put for wb in bdi. > Would this make sense to you? Ok, well assuming it is correct the above logic is a bit more simple and readable to me. I think you'd just need to fill in the comment around the wb_thresh thing rather than i.e. having to explain we don't need to ref bdi->wb even though it doesn't seem to matter. I kind of feel the same on the wb_stats file thing below just because it seems more consistent and available if wb_stats eventually grows more wb-specific data. That said, this is subjective and not hugely important so I don't insist on either point. Maybe wait a bit and see if Jan or Tejun or somebody has any thoughts..? If nobody else expresses explicit preference then I'm good with it either way. Brian > > > >> + > >> + /* > >> + * cgwb_release will destroy wb->memcg_completions which > >> + * will be ued in cgwb_calc_thresh. Use wb_tryget to prevent > >> + * memcg_completions destruction from cgwb_release. > >> + */ > >> + if (!wb_tryget(wb)) > >> + continue; > >> + > >> + rcu_read_unlock(); > >> + /* cgwb_calc_thresh may sleep in cgroup_rstat_flush */ > >> + stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb)); > >> + wb_stats_show(m, wb, &stats); > >> + rcu_read_lock(); > >> + wb_put(wb); > >> + } > >> + rcu_read_unlock(); > >> + > >> + return 0; > >> +} > >> +DEFINE_SHOW_ATTRIBUTE(cgwb_debug_stats); > >> + > >> +static void cgwb_debug_register(struct backing_dev_info *bdi) > >> +{ > >> + debugfs_create_file("wb_stats", 0444, bdi->debug_dir, bdi, > >> + &cgwb_debug_stats_fops); > >> +} > >> + > >> static void bdi_collect_stats(struct backing_dev_info *bdi, > >> struct wb_stats *stats) > >> { > >> @@ -117,6 +202,8 @@ static void bdi_collect_stats(struct backing_dev_info *bdi, > >> { > >> collect_wb_stats(stats, &bdi->wb); > >> } > >> + > >> +static inline void cgwb_debug_register(struct backing_dev_info *bdi) { } > > > > Could we just create the wb_stats file regardless of whether cgwb is > > enabled? Obviously theres only one wb in the !CGWB case and it's > > somewhat duplicative with the bdi stats file, but that seems harmless if > > the same code can be reused..? Maybe there's also a small argument for > > dropping the state info from the bdi stats file and moving it to > > wb_stats.In backing-dev.c, there are a lot "#ifdef CGWB .. #else .. #endif" to > avoid unneed extra cost when CGWB is not enabled. > I think it's better to avoid extra cost from wb_stats when CGWB is not > enabled. For now, we only save cpu cost to create and destroy wb_stats > and save memory cost to record debugfs file, we could save more in > future when wb_stats records more debug info. > Move state info from bdi stats to wb_stats make senses to me. The only > concern would be compatibility problem. I will add a new patch to this > to make this more noticeable and easier to revert. > Thanks a lot for review! > > Kemeng > > > > Brian > > > >> #endif > >> > >> static int bdi_debug_stats_show(struct seq_file *m, void *v) > >> @@ -182,6 +269,7 @@ static void bdi_debug_register(struct backing_dev_info *bdi, const char *name) > >> > >> debugfs_create_file("stats", 0444, bdi->debug_dir, bdi, > >> &bdi_debug_stats_fops); > >> + cgwb_debug_register(bdi); > >> } > >> > >> static void bdi_debug_unregister(struct backing_dev_info *bdi) > >> diff --git a/mm/page-writeback.c b/mm/page-writeback.c > >> index 0e20467367fe..3724c7525316 100644 > >> --- a/mm/page-writeback.c > >> +++ b/mm/page-writeback.c > >> @@ -893,6 +893,25 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh) > >> return __wb_calc_thresh(&gdtc, thresh); > >> } > >> > >> +unsigned long cgwb_calc_thresh(struct bdi_writeback *wb) > >> +{ > >> + struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB }; > >> + struct dirty_throttle_control mdtc = { MDTC_INIT(wb, &gdtc) }; > >> + unsigned long filepages, headroom, writeback; > >> + > >> + gdtc.avail = global_dirtyable_memory(); > >> + gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) + > >> + global_node_page_state(NR_WRITEBACK); > >> + > >> + mem_cgroup_wb_stats(wb, &filepages, &headroom, > >> + &mdtc.dirty, &writeback); > >> + mdtc.dirty += writeback; > >> + mdtc_calc_avail(&mdtc, filepages, headroom); > >> + domain_dirty_limits(&mdtc); > >> + > >> + return __wb_calc_thresh(&mdtc, mdtc.thresh); > >> +} > >> + > >> /* > >> * setpoint - dirty 3 > >> * f(dirty) := 1.0 + (----------------) > >> -- > >> 2.30.0 > >> > > > > >