Subject: Re: [PATCH] bcache: consider the fragmentation when update the writeback rate
To: Dongdong Tao
Cc: Dongdong Tao, Coly Li, Gavin Guo, Gerald Yang, Trent Lloyd,
    Kent Overstreet, "open list:BCACHE (BLOCK LAYER CACHE)", open list
References: <20201103124235.14440-1-tdd21151186@gmail.com>
 <89b83c00-1117-d114-2c23-7b03fc22966e@easystack.cn>
From: Dongsheng Yang
Date: Wed, 9 Dec 2020 15:49:26 +0800
On 2020/12/9 Wednesday 12:48 PM, Dongdong Tao wrote:
> Hi Dongsheng,
>
> I'm working on it; as the next step I'm gathering some testing data to
> upload (very sorry for the delay...).
> Thanks for the comment.
> One of the main concerns with alleviating this issue through the
> writeback process is that we need to minimize the impact on client IO
> performance.
> writeback_percent is 10 by default; starting writeback when dirty
> buckets reach 10 percent might be a bit too aggressive, as
> writeback_cutoff_sync is 70 percent.
> So I chose to start the writeback when dirty buckets reach 50 percent,
> so that this patch only takes effect once the dirty bucket percentage
> is above that.

Agreed, reusing writeback_percent for this would be too aggressive, and
less flexible. Okay, let's wait for your testing results.

Thanx

>
> Thanks,
> Dongdong
>
>
> On Wed, Dec 9, 2020 at 10:27 AM Dongsheng Yang wrote:
>>
>> On 2020/11/3 Tuesday 8:42 PM, Dongdong Tao wrote:
>>> From: dongdong tao
>>>
>>> The current way of calculating the writeback rate only considers
>>> dirty sectors. That usually works fine when fragmentation is not
>>> high, but it gives us an unreasonably small rate when very few
>>> dirty sectors consume a lot of dirty buckets. In some cases the
>>> dirty buckets can reach CUTOFF_WRITEBACK_SYNC while the dirty data
>>> (sectors) has not even reached writeback_percent; the writeback
>>> rate will still be the minimum value (4k), so all writes get stuck
>>> in a non-writeback mode because of the slow writeback.
>>>
>>> This patch tries to accelerate the writeback rate when
>>> fragmentation is high. It calculates the proportional_scaled value
>>> as:
>>>
>>>     (dirty_sectors / writeback_rate_p_term_inverse) * fragment
>>>
>>> As we can see, higher fragmentation results in a larger
>>> proportional_scaled value and thus a larger writeback rate. The
>>> fragment value is calculated as:
>>>
>>>     (dirty_buckets * bucket_size) / dirty_sectors
>>>
>>> If you think about it, the value of fragment will always be inside
>>> [1, bucket_size].
>>>
>>> This patch only considers fragmentation once the number of dirty
>>> buckets has reached a dirty threshold (configurable via
>>> writeback_fragment_percent, default 50), so bcache keeps its
>>> original behaviour before the dirty buckets reach that threshold.
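
To make the two formulas above concrete, here is a small standalone
sketch with assumed values (bucket_size = 1024 sectors and
writeback_rate_p_term_inverse = 40 are illustrative picks, not taken
from the patch):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		int64_t dirty_sectors = 1 << 20; /* dirty data, in sectors */
		int64_t dirty_buckets = 1 << 15; /* few dirty sectors per bucket */
		int64_t bucket_size   = 1024;    /* sectors per bucket (assumed) */
		int64_t p_inverse     = 40;      /* writeback_rate_p_term_inverse */

		/*
		 * fragment stays in [1, bucket_size]: 1 when the dirty buckets
		 * are completely full of dirty data, bucket_size when each
		 * bucket holds a single dirty sector.
		 */
		int64_t fragment = (dirty_buckets * bucket_size) / dirty_sectors;

		int64_t proportional_scaled =
			(dirty_sectors / p_inverse) * fragment;

		printf("fragment=%lld proportional_scaled=%lld\n",
		       (long long)fragment, (long long)proportional_scaled);
		return 0;
	}

With these numbers, fragment = 32, so the proportional term comes out
32 times larger than the dirty sectors alone would give, which is
exactly the boost the patch description is after.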
>>> Signed-off-by: dongdong tao
>>> ---
>>>  drivers/md/bcache/bcache.h    |  1 +
>>>  drivers/md/bcache/sysfs.c     |  6 ++++++
>>>  drivers/md/bcache/writeback.c | 21 +++++++++++++++++++++
>>>  3 files changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
>>> index 1d57f48307e6..87632f7032b6 100644
>>> --- a/drivers/md/bcache/bcache.h
>>> +++ b/drivers/md/bcache/bcache.h
>>> @@ -374,6 +374,7 @@ struct cached_dev {
>>>  	unsigned int		writeback_metadata:1;
>>>  	unsigned int		writeback_running:1;
>>>  	unsigned char		writeback_percent;
>>> +	unsigned char		writeback_fragment_percent;
>>>  	unsigned int		writeback_delay;
>>>
>>>  	uint64_t		writeback_rate_target;
>>> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
>>> index 554e3afc9b68..69499113aef8 100644
>>> --- a/drivers/md/bcache/sysfs.c
>>> +++ b/drivers/md/bcache/sysfs.c
>>> @@ -115,6 +115,7 @@ rw_attribute(stop_when_cache_set_failed);
>>>  rw_attribute(writeback_metadata);
>>>  rw_attribute(writeback_running);
>>>  rw_attribute(writeback_percent);
>>> +rw_attribute(writeback_fragment_percent);
>>
>> Hi Dongdong and Coly,
>>
>> What is the status of this patch? In my opinion, it is a problem we
>> need to solve, but can we just reuse the writeback_percent parameter
>> rather than introduce a new writeback_fragment_percent?
>>
>> That means the semantics of writeback_percent would act on both the
>> dirty data percent and the dirty bucket percent: when we find there
>> are more dirty buckets than (c->nbuckets * writeback_percent), start
>> the writeback. (A rough sketch of this idea is at the end of this
>> mail.)
>>
>> Thanx
>>
>> Yang
>>
>>>  rw_attribute(writeback_delay);
>>>  rw_attribute(writeback_rate);
>>>
>>> @@ -197,6 +198,7 @@ SHOW(__bch_cached_dev)
>>>  	var_printf(writeback_running,	"%i");
>>>  	var_print(writeback_delay);
>>>  	var_print(writeback_percent);
>>> +	var_print(writeback_fragment_percent);
>>>  	sysfs_hprint(writeback_rate,
>>>  		     wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
>>>
>>>  	sysfs_printf(io_errors,		"%i", atomic_read(&dc->io_errors));
>>> @@ -308,6 +310,9 @@ STORE(__cached_dev)
>>>  	sysfs_strtoul_clamp(writeback_percent, dc->writeback_percent,
>>>  			    0, bch_cutoff_writeback);
>>>
>>> +	sysfs_strtoul_clamp(writeback_fragment_percent, dc->writeback_fragment_percent,
>>> +			    0, bch_cutoff_writeback_sync);
>>> +
>>>  	if (attr == &sysfs_writeback_rate) {
>>>  		ssize_t ret;
>>>  		long int v = atomic_long_read(&dc->writeback_rate.rate);
>>> @@ -498,6 +503,7 @@ static struct attribute *bch_cached_dev_files[] = {
>>>  	&sysfs_writeback_running,
>>>  	&sysfs_writeback_delay,
>>>  	&sysfs_writeback_percent,
>>> +	&sysfs_writeback_fragment_percent,
>>>  	&sysfs_writeback_rate,
>>>  	&sysfs_writeback_rate_update_seconds,
>>>  	&sysfs_writeback_rate_i_term_inverse,
>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index 3c74996978da..34babc89fdf3 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -88,6 +88,26 @@ static void __update_writeback_rate(struct cached_dev *dc)
>>>  	int64_t integral_scaled;
>>>  	uint32_t new_rate;
>>>
>>> +	/*
>>> +	 * We need to consider the number of dirty buckets as well
>>> +	 * when calculating proportional_scaled; otherwise we might
>>> +	 * get an unreasonably small writeback rate in a highly
>>> +	 * fragmented situation where very few dirty sectors consume
>>> +	 * a lot of dirty buckets. The worst case is when dirty data
>>> +	 * has reached writeback_percent and dirty buckets have
>>> +	 * reached cutoff_writeback_sync, but the rate is still at
>>> +	 * the minimum value, which causes writes to get stuck in a
>>> +	 * non-writeback mode.
>>> +	 */
>>> +	struct cache_set *c = dc->disk.c;
>>> +
>>> +	if (c->gc_stats.in_use > dc->writeback_fragment_percent && dirty > 0) {
>>> +		int64_t dirty_buckets = (c->gc_stats.in_use * c->nbuckets) / 100;
>>> +		int64_t fragment = (dirty_buckets * c->cache->sb.bucket_size) / dirty;
>>> +
>>> +		proportional_scaled =
>>> +			div_s64(dirty, dc->writeback_rate_p_term_inverse) * fragment;
>>> +	}
>>> +
>>>  	if ((error < 0 && dc->writeback_rate_integral > 0) ||
>>>  	    (error > 0 && time_before64(local_clock(),
>>>  					dc->writeback_rate.next + NSEC_PER_MSEC))) {
>>> @@ -969,6 +989,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
>>>  	dc->writeback_metadata = true;
>>>  	dc->writeback_running = false;
>>>  	dc->writeback_percent = 10;
>>> +	dc->writeback_fragment_percent = 50;
>>>  	dc->writeback_delay = 30;
>>>  	atomic_long_set(&dc->writeback_rate.rate, 1024);
>>>  	dc->writeback_rate_minimum = 8;
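
As a comparison point for the writeback_percent question raised above,
here is a minimal sketch (illustrative only, not code from this thread)
of how the same hunk would look if it reused writeback_percent as the
dirty-bucket threshold instead of adding writeback_fragment_percent:

	/*
	 * Sketch of the suggested alternative, inside
	 * __update_writeback_rate(): gate the fragmentation boost on the
	 * existing writeback_percent knob rather than on a new
	 * writeback_fragment_percent. Illustrative only.
	 */
	if (c->gc_stats.in_use > dc->writeback_percent && dirty > 0) {
		int64_t dirty_buckets = (c->gc_stats.in_use * c->nbuckets) / 100;
		int64_t fragment =
			(dirty_buckets * c->cache->sb.bucket_size) / dirty;

		proportional_scaled =
			div_s64(dirty, dc->writeback_rate_p_term_inverse) * fragment;
	}

As Dongdong points out above, with writeback_percent defaulting to 10
this boost would kick in much earlier than the 50-percent default in
the patch, which is why the thread leans toward a separate threshold.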