Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753052AbbGAIPp (ORCPT ); Wed, 1 Jul 2015 04:15:45 -0400
Received: from cantor2.suse.de ([195.135.220.15]:50369 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751382AbbGAIPf (ORCPT ); Wed, 1 Jul 2015 04:15:35 -0400
Date: Wed, 1 Jul 2015 10:15:28 +0200
From: Jan Kara
To: Tejun Heo
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org, jack@suse.cz,
	hch@infradead.org, hannes@cmpxchg.org, linux-fsdevel@vger.kernel.org,
	vgoyal@redhat.com, lizefan@huawei.com, cgroups@vger.kernel.org,
	linux-mm@kvack.org, mhocko@suse.cz, clm@fb.com,
	fengguang.wu@intel.com, david@fromorbit.com, gthelen@google.com,
	khlebnikov@yandex-team.ru
Subject: Re: [PATCH 41/51] writeback: make wakeup_flusher_threads() handle multiple bdi_writeback's
Message-ID: <20150701081528.GB7252@quack.suse.cz>
References: <1432329245-5844-1-git-send-email-tj@kernel.org>
	<1432329245-5844-42-git-send-email-tj@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1432329245-5844-42-git-send-email-tj@kernel.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3528
Lines: 105

On Fri 22-05-15 17:13:55, Tejun Heo wrote:
> wakeup_flusher_threads() currently only starts writeback on the root
> wb (bdi_writeback). For cgroup writeback support, update the function
> to wake up all wbs and distribute the number of pages to write
> according to the proportion of each wb's write bandwidth, which is
> implemented in wb_split_bdi_pages().
>
> Signed-off-by: Tejun Heo
> Cc: Jens Axboe
> Cc: Jan Kara

I was looking at who uses wakeup_flusher_threads(). There are two use cases:

1) sync() - we want to write back everything
2) We want to relieve memory pressure by cleaning and subsequently
   reclaiming pages.

Neither of these cares much about the exact number of pages, as long as
enough gets written. So just as we don't split the passed nr_pages argument
among bdis, I wouldn't split nr_pages among wbs. Just pass nr_pages to each
wb and be done with it...

								Honza

> ---
>  fs/fs-writeback.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 92aaf64..508e10c 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -198,6 +198,41 @@ int inode_congested(struct inode *inode, int cong_bits)
>  }
>  EXPORT_SYMBOL_GPL(inode_congested);
>  
> +/**
> + * wb_split_bdi_pages - split nr_pages to write according to bandwidth
> + * @wb: target bdi_writeback to split @nr_pages to
> + * @nr_pages: number of pages to write for the whole bdi
> + *
> + * Split @wb's portion of @nr_pages according to @wb's write bandwidth in
> + * relation to the total write bandwidth of all wb's w/ dirty inodes on
> + * @wb->bdi.
> + */
> +static long wb_split_bdi_pages(struct bdi_writeback *wb, long nr_pages)
> +{
> +	unsigned long this_bw = wb->avg_write_bandwidth;
> +	unsigned long tot_bw = atomic_long_read(&wb->bdi->tot_write_bandwidth);
> +
> +	if (nr_pages == LONG_MAX)
> +		return LONG_MAX;
> +
> +	/*
> +	 * This may be called on clean wb's and proportional distribution
> +	 * may not make sense, just use the original @nr_pages in those
> +	 * cases.  In general, we wanna err on the side of writing more.
> +	 */
> +	if (!tot_bw || this_bw >= tot_bw)
> +		return nr_pages;
> +	else
> +		return DIV_ROUND_UP_ULL((u64)nr_pages * this_bw, tot_bw);
> +}
> +
> +#else /* CONFIG_CGROUP_WRITEBACK */
> +
> +static long wb_split_bdi_pages(struct bdi_writeback *wb, long nr_pages)
> +{
> +	return nr_pages;
> +}
> +
>  #endif /* CONFIG_CGROUP_WRITEBACK */
>  
>  void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
> @@ -1187,8 +1222,17 @@ void wakeup_flusher_threads(long nr_pages, enum wb_reason reason)
>  		nr_pages = get_nr_dirty_pages();
>  
>  	rcu_read_lock();
> -	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
> -		wb_start_writeback(&bdi->wb, nr_pages, false, reason);
> +	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
> +		struct bdi_writeback *wb;
> +		struct wb_iter iter;
> +
> +		if (!bdi_has_dirty_io(bdi))
> +			continue;
> +
> +		bdi_for_each_wb(wb, bdi, &iter, 0)
> +			wb_start_writeback(wb, wb_split_bdi_pages(wb, nr_pages),
> +					   false, reason);
> +	}
>  	rcu_read_unlock();
>  }
>  
> -- 
> 2.4.0
> 
-- 
Jan Kara
SUSE Labs, CR
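
For comparison, the two policies discussed in this thread can be sketched in userspace C. This is only an illustration under assumptions, not kernel code: split_pages() mirrors the arithmetic of the quoted wb_split_bdi_pages(), but takes the bandwidth figures as plain arguments (the hypothetical this_bw and tot_bw parameters stand in for wb->avg_write_bandwidth and bdi->tot_write_bandwidth), and full_pages() is a hypothetical name for the "pass nr_pages to each wb" alternative suggested in the reply.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Userspace sketch of the proportional split from the quoted patch.
 * this_bw and tot_bw are plain arguments here (an assumption made to
 * keep the example standalone); the patch reads them from the wb and
 * its bdi instead.
 */
static long split_pages(long nr_pages, unsigned long this_bw,
			unsigned long tot_bw)
{
	assert(nr_pages >= 0);

	/* LONG_MAX means "write everything"; never scale it down. */
	if (nr_pages == LONG_MAX)
		return LONG_MAX;

	/* No usable bandwidth ratio: err on the side of writing more. */
	if (!tot_bw || this_bw >= tot_bw)
		return nr_pages;

	/* Round up, as DIV_ROUND_UP_ULL() does in the patch. */
	return (long)(((uint64_t)nr_pages * this_bw + tot_bw - 1) / tot_bw);
}

/*
 * Sketch of the alternative from the reply: every wb is handed the
 * full nr_pages, so the bandwidth figures are ignored.
 */
static long full_pages(long nr_pages, unsigned long this_bw,
		       unsigned long tot_bw)
{
	(void)this_bw;
	(void)tot_bw;
	return nr_pages;
}
```

With nr_pages = 1000 and a wb that accounts for 25 out of 100 bandwidth units, split_pages() yields 250 while full_pages() yields 1000. Both callers listed in the reply only need "enough" writeback, which is the argument for the simpler variant.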