To: Dongdong Tao
Cc: Gavin Guo, Gerald Yang, Trent Lloyd, Kent Overstreet,
    "open list:BCACHE (BLOCK LAYER CACHE)", open list,
    Dominique Poulain, Dongsheng Yang
References: <20201103124235.14440-1-tdd21151186@gmail.com>
    <89b83c00-1117-d114-2c23-7b03fc22966e@easystack.cn>
    <35a038d8-fe6b-7954-f2d9-be74eb32dcdd@suse.de>
From: Coly Li
Subject: Re: [PATCH] bcache: consider the fragmentation when update the writeback rate
Date: Mon, 21 Dec 2020 16:08:51 +0800

On 12/21/20 12:06 PM, Dongdong Tao wrote:
> Hi Coly,
>
> Thank you so much for your prompt reply!
>
> So, I've performed the same fio testing based on a 1TB NVME and a 10TB
> HDD disk as the backing device.
> I've run them both for about 4 hours; since it's a 1TB nvme device, it
> will roughly take about 10 days to consume 50 percent dirty buckets.
> I did increase the iops to 500,100, but the dirty buckets only
> increased to about 30 after a 2-day run, and because the reads are
> much limited by the backing hdd device, the actual maximum read iops I
> can constantly get is only about 200.

Hi Dongdong,

There are two methods to make the buckets fill up faster:
1) use a larger non-spinning backing device (e.g. a md raid0 with
multiple SSDs);
2) specify a larger read block size and a smaller write block size in
fio.

Or you may combine them together to fill the cache device faster.
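Something like the fio invocation sketched below would do that. It is
only an illustration, not a command I have verified on your setup: the
64k read / 8k write block sizes and the job name are arbitrary, and
--bs takes per-direction values just like --rate_iops:

  sudo fio --name=fill-buckets --filename=/dev/bcache0 \
      --ioengine=libaio --iodepth=1 --rw=randrw --direct=1 \
      --bs=64k,8k --rate_iops=500,100 --numjobs=1 --runtime=14400

The large reads should fill cache buckets quickly, while the small
random writes keep scattering small dirty extents across buckets, so
the fragmented condition should be reached sooner.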
>
> Though the 4-hour run on the 1TB nvme bcache didn't make us hit the 50
> percent dirty bucket threshold, I think it's still valuable to prove
> that bcache with my patch behaves the same way as expected in terms of
> the latency when the dirty buckets are under 50 percent. I guess this
> might be what you wanted to confirm with this run.
>

The previous testing on the tiny SSD looked much better than this one; I
cannot provide my opinion before a non-optimized-configuration test is
finished.

If the latency distribution shows no (recognizable) difference, it will
mean your patch does not improve I/O latency. I believe this is only
because the testing is unfinished yet.

The idea to heuristically estimate the bucket fragmentation condition is
cool IMHO, but we need solid performance numbers to prove this
optimization. Please continue to finish the benchmark with a real
hardware configuration, and I do hope we can see a recognizable positive
result for your goal (improved I/O latency and throughput under high
dirty bucket fragmentation).

Thanks.

Coly Li

> Here is the result:
>
> Master:
>
> fio-master.png
>
> Master + my patch:
>
> fio-patch.png
>
> As we can see, the latency distributions for the outliers and the
> majorities are the same between these two runs.
> Let's combine those two together, and they are clearer:
>
> fio-full.png
>
> The test steps are exactly the same for those two runs:
>
> 1. make-bcache -B -C --writeback
>
> 2. sudo fio --name=random-writers --filename=/dev/bcache0
>    --ioengine=libaio --iodepth=1 --rw=randrw --bs=16k --direct=1
>    --rate_iops=90,10 --numjobs=1 --write_lat_log=16k --runtime=14000
>
> Thank you so much!
> Regards,
> Dongdong
>
> On Tue, Dec 15, 2020 at 1:07 AM Coly Li wrote:
>
> On 12/14/20 11:30 PM, Dongdong Tao wrote:
> > Hi Coly and Dongsheng,
> >
> > I've got the testing result and confirmed that this result is
> > reproducible by repeating it many times.
> > I ran fio to get the write latency log, parsed the log, and then
> > generated the latency graphs below with a visualization tool.
>
> Hi Dongdong,
>
> Thank you so much for the performance number!
>
> [snipped]
>
> > So, my code will accelerate the writeback process when the dirty
> > buckets exceed 50% (can be tuned); as we can see, the
> > cache_available_percent does increase once it hits 50, so we won't
> > hit the writeback cutoff issue.
> >
> > Below are the steps that I used to do the experiment:
> > 1. make-bcache -B -C --writeback -- I prepared the nvme size to 1G,
> >    so it can be reproduced faster
> >
> > 2. sudo fio --name=random-writers --filename=/dev/bcache0
> >    --ioengine=libaio --iodepth=1 --rw=randrw --bs=16k --direct=1
> >    --rate_iops=90,10 --numjobs=1 --write_lat_log=16k
> >
> > 3. For 1G nvme, running for about 20 minutes is enough to get the
> >    data.
>
> 1GB cache and 20 minutes is quite limited for the performance
> evaluation. Could you please do similar testing with a 1TB SSD and 1
> hour for each run of the benchmark?
>
> > Using randrw with rate_iops=90,10 is just one way to reproduce this
> > easily; this can be reproduced as long as we can create a fragmented
> > situation where quite a small amount of dirty data consumes a lot of
> > dirty buckets, thus killing the write performance.
>
> Yes, this is a good method to generate dirty data segments.
>
> > This bug is becoming very critical nowadays, as Ceph is hitting it,
> > and Ceph mostly submits random small IO.
> > Please let me know what you need in order to move forward in this
> > direction; I'm sure this patch can be improved as well.
>
> The performance number is quite convincing and the idea in your patch
> is promising.
>
> I will provide my comments on your patch after we see the performance
> numbers for a larger cache device and a longer run time.
>
> Thanks again for the detailed performance numbers, which are really
> desired for performance optimization changes :-)
>
> Coly Li
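For reference, the dirty data, dirty bucket and cache_available_percent
numbers discussed in this thread can be watched from sysfs while such a
benchmark runs. A minimal sketch, assuming a single registered cache set
and the bcache0 backing device (the wildcard stands for the cache set
UUID):

  # dirty data cached for the backing device, plus the current
  # writeback rate and its controller state
  cat /sys/block/bcache0/bcache/dirty_data
  cat /sys/block/bcache0/bcache/writeback_rate_debug

  # bucket breakdown (Unused/Clean/Dirty/Metadata) and the percentage
  # of the cache that holds no dirty data; writeback caching is cut off
  # when cache_available_percent drops too low
  cat /sys/fs/bcache/*/cache0/priority_stats
  cat /sys/fs/bcache/*/cache_available_percent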