Subject: Re: [PATCHSET v2][RFC] Make background writeback not suck
From: Jens Axboe <axboe@fb.com>
Date: Thu, 24 Mar 2016 11:42:19 -0600
Message-ID: <56F426FB.2010002@fb.com>
In-Reply-To: <56F2B8B5.10106@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/23/2016 09:39 AM, Jens Axboe wrote:
> Hi,
>
> Apparently I dropped the subject on this one, it's of course v2 of the
> writeback not sucking patchset...

Some test results. I've run a lot of them, on various types of storage,
and performance seems good with the default settings.

The test case reads in a file and writes it to stdout. It targets a
certain latency for the reads - by default it's 10ms. If a read isn't
done by 10ms, it'll queue the next read. This avoids the coordinated
omission problem, where one long latency is in fact many of them - you
just don't know it, since you don't issue more reads while one is
stuck.

The test case reads a compressed file and writes it over a pipe to gzip
to decompress it. The input file is around 9G, and uncompresses to 20G.
At the end of the run, latency percentiles are shown, and every time
the target latency is exceeded during the run, the offending latency is
output.

To keep the system busy, 75% (24G) of the memory is taken up by CPU
hogs. This is intended to make the case worse for the throttled depth,
as Dave pointed out.
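read-to-pipe-async itself isn't included in this mail, so here's a
rough sketch of the pacing idea, purely for illustration - every name
in it (TARGET_USEC, do_read, pace_reads, ...) is made up, and the real
tool is structured differently. The point is only the accounting: stamp
each read with the time it was scheduled to start, keep issuing even
when a read is stuck, and measure latency against the schedule, so one
stuck read shows up as several delayed samples instead of one:

/*
 * Rough sketch only - not the actual read-to-pipe-async code.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define TARGET_USEC	10000		/* 10ms latency target */
#define BUF_SIZE	(64 * 1024)	/* arbitrary read size */

static uint64_t now_usec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000ULL + ts.tv_nsec / 1000;
}

struct read_req {
	int fd;
	off_t off;
	uint64_t scheduled;	/* when this read should have started */
};

static void *do_read(void *arg)
{
	struct read_req *req = arg;
	char buf[BUF_SIZE];
	uint64_t lat;

	pread(req->fd, buf, sizeof(buf), req->off);

	/*
	 * Latency is measured from the scheduled issue time, so a read
	 * that queued up behind a stuck one still shows the full delay.
	 */
	lat = now_usec() - req->scheduled;
	if (lat > TARGET_USEC)
		fprintf(stderr, "read latency=%llu usec\n",
			(unsigned long long) lat);

	free(req);
	return NULL;
}

static void pace_reads(int fd, off_t file_size)
{
	uint64_t next = now_usec();
	off_t off;

	for (off = 0; off < file_size; off += BUF_SIZE) {
		struct read_req *req = malloc(sizeof(*req));
		pthread_t thread;

		req->fd = fd;
		req->off = off;
		req->scheduled = next;

		/* issue regardless of whether earlier reads finished */
		pthread_create(&thread, NULL, do_read, req);
		pthread_detach(thread);

		next += TARGET_USEC;
		if (now_usec() < next)
			usleep(next - now_usec());
	}
}

Per the description above, the actual tool queues the next read when
the current one overruns the target; the fixed schedule here is just a
simplification that gets the same effect for the accounting.

Out-of-the-box results:

# time (./read-to-pipe-async -f randfile.gz | gzip -dc > outfile; sync)
read latency=11790 usec
read latency=82697 usec
[...]
Latency percentiles (usec) (READERS)
	50.0000th: 4
	75.0000th: 5
	90.0000th: 6
	95.0000th: 7
	99.0000th: 54
	99.5000th: 64
	99.9000th: 334
	99.9900th: 17952
	99.9990th: 101504
	99.9999th: 203520
	Over=333, min=0, max=215367
Latency percentiles (usec) (WRITERS)
	50.0000th: 3
	75.0000th: 5
	90.0000th: 454
	95.0000th: 473
	99.0000th: 615
	99.5000th: 625
	99.9000th: 815
	99.9900th: 1142
	99.9990th: 2244
	99.9999th: 10032
	Over=3, min=0, max=10811
Read rate (KB/sec) : 88988
Write rate (KB/sec): 60019

real	2m38.701s
user	2m33.030s
sys	1m31.540s

215ms worst case read latency, and 333 cases of exceeding the 10ms
target.

And with the patchset applied:

# time (./read-to-pipe-async -f randfile.gz | gzip -dc > outfile; sync)
write latency=15394 usec
[...]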
Latency percentiles (usec) (READERS)
	50.0000th: 4
	75.0000th: 5
	90.0000th: 6
	95.0000th: 8
	99.0000th: 55
	99.5000th: 64
	99.9000th: 338
	99.9900th: 2652
	99.9990th: 3964
	99.9999th: 7464
	Over=1, min=0, max=10221
Latency percentiles (usec) (WRITERS)
	50.0000th: 4
	75.0000th: 5
	90.0000th: 450
	95.0000th: 471
	99.0000th: 611
	99.5000th: 623
	99.9000th: 703
	99.9900th: 1106
	99.9990th: 2010
	99.9999th: 10448
	Over=6, min=1, max=15394
Read rate (KB/sec) : 95506
Write rate (KB/sec): 59970

real	2m39.014s
user	2m33.800s
sys	1m35.210s

Worst case read latency drops from 215ms to about 10ms, with only a
single read exceeding the 10ms target, and the read rate improves as
well.

I won't bore you with the vmstat output, it's pretty messy for the
default case.
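As an aside on the percentile tables: here's a sketch of how output in
that shape can be produced from raw latency samples. Again illustrative
only, not the tool's actual code - show_percentiles() and friends are
made up, and the round-looking tail values above (17952, 101504, ...)
suggest the real tool uses a coarse histogram rather than exact
samples:

#include <stdio.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
	unsigned long long x = *(const unsigned long long *) a;
	unsigned long long y = *(const unsigned long long *) b;

	return (x > y) - (x < y);
}

/*
 * 'lat' holds one latency sample per IO, in usec; 'target' is the
 * latency target (10000 usec above). Assumes n > 0.
 */
static void show_percentiles(unsigned long long *lat, size_t n,
			     unsigned long long target)
{
	static const double pcts[] = {
		50.0, 75.0, 90.0, 95.0, 99.0, 99.5, 99.9,
		99.99, 99.999, 99.9999,
	};
	size_t i, over = 0;

	qsort(lat, n, sizeof(*lat), cmp_u64);

	for (i = 0; i < sizeof(pcts) / sizeof(pcts[0]); i++) {
		size_t idx = (size_t) (pcts[i] / 100.0 * (n - 1));

		printf("\t%.4fth: %llu\n", pcts[i], lat[idx]);
	}

	for (i = 0; i < n; i++)
		over += lat[i] > target;

	printf("\tOver=%zu, min=%llu, max=%llu\n",
	       over, lat[0], lat[n - 1]);
}

--
Jens Axboe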