From: Sergey Meirovich
Date: Fri, 10 Jan 2014 12:36:22 +0200
Subject: Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.
To: Jan Kara
Cc: Christoph Hellwig, linux-scsi, Linux Kernel Mailing List, Gluk
In-Reply-To: <20140110093623.GD26378@quack.suse.cz>
References: <20140106201032.GA13491@quack.suse.cz> <20140107155830.GA28395@infradead.org> <20140108140307.GA588@infradead.org> <20140108152610.GA5863@infradead.org> <20140108205524.GA15313@quack.suse.cz> <20140110093623.GD26378@quack.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Jan,

On 10 January 2014 11:36, Jan Kara wrote:
> On Thu 09-01-14 12:11:16, Sergey Meirovich wrote:
...
>> I've done preallocation on fnic/XtremIO as Christoph suggested.
>>
>> [root@dca-poc-gtsxdb3 mnt]# sysbench --max-requests=0
>> --file-extra-flags=direct --test=fileio --num-threads=4
>> --file-total-size=10G --file-io-mode=async --file-async-backlog=1024
>> --file-rw-ratio=1 --file-fsync-freq=0 --max-requests=0
>> --file-test-mode=seqwr --max-time=100 --file-block-size=4K prepare
>> sysbench 0.4.12: multi-threaded system evaluation benchmark
>>
>> 128 files, 81920Kb each, 10240Mb total
>> Creating files for the test...
>> [root@dca-poc-gtsxdb3 mnt]# du -k test_file.* | awk '{print $1}' |sort |uniq
>> 81920
>> [root@dca-poc-gtsxdb3 mnt]# fallocate -l 81920k test_file.*
>>
>> Results: 13.042Mb/sec 3338.73 Requests/sec
>>
>> Probably sysbench is still triggering append DIO scenario. Will say
>> simple wrapper over io_submit() against already preallocated (and even
>> filled with data) file provide much better throughput if your theory
>> is valid?
> So I was experimenting a bit. "sysbench prepare" seems to always do
> synchronous IO from a single thread in the 'prepare' phase regardless of
> the arguments. So there the reported throughput isn't really relevant.
>
> In the 'run' phase it obeys the arguments and indeed when I run fallocate
> to preallocate files during 'run' phase, it significantly helps the
> throughput (from 20 MB/s to 55 MB/s on my SATA drive).

Sorry, Jan. It seems I presented my findings in the previous mail
ambiguously. I know that the prepare phase of sysbench does synchronous,
probably buffered IO (I saw 512k chunks sent down to the HBA). I played
with blktrace and saw that myself during prepare:

[root@dca-poc-gtsxdb3 mnt]# sysbench --max-requests=0 \
    --file-extra-flags=direct --test=fileio --num-threads=4 \
    --file-total-size=10G --file-io-mode=async --file-async-backlog=1024 \
    --file-rw-ratio=1 --file-fsync-freq=0 --max-requests=0 \
    --file-test-mode=seqwr --max-time=100 --file-block-size=4K prepare
...

Leads to:

[root@dca-poc-gtsxdb3 mnt]# blktrace -d /dev/sdg -o - | blkparse -i - | grep 'D W'
8,96 14 604 53.129805520 28114 D WS 1116160 + 1024 [sysbench]
8,96 14 607 53.129843345 28114 D WS 1120256 + 1024 [sysbench]
8,96 14 610 53.129873782 28114 D WS 1124352 + 1024 [sysbench]
8,96 14 613 53.129903703 28114 D WS 1128448 + 1024 [sysbench]
8,96 14 616 53.130957213 28114 D WS 1132544 + 1024 [sysbench]
8,96 14 619 53.130988835 28114 D WS 1136640 + 1024 [sysbench]
8,96 14 622 53.131018854 28114 D WS 1140736 + 1024 [sysbench]
...
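
For reference, the "512k chunks" reading of the prepare-phase trace can be checked mechanically. The following is only a minimal sketch: it assumes the default blkparse line layout shown above and 512-byte sectors, and the helper name dispatch_sizes is made up for illustration.

```python
import re

SECTOR_BYTES = 512  # assumption: blkparse counts in 512-byte sectors

# Matches the tail of a default-format blkparse dispatch line, e.g.
# "... D WS 1116160 + 1024 [sysbench]"
DISPATCH_RE = re.compile(r"\sD\s+(\S+)\s+(\d+)\s+\+\s+(\d+)\s+\[(.+?)\]")


def dispatch_sizes(lines):
    """Yield (start_sector, size_in_KiB) for every dispatched write."""
    for line in lines:
        m = DISPATCH_RE.search(line)
        if m and "W" in m.group(1):  # RWBS field contains W for writes
            start, nsectors = int(m.group(2)), int(m.group(3))
            yield start, nsectors * SECTOR_BYTES // 1024


# Two lines copied from the prepare-phase trace
prepare = [
    "8,96 14 604 53.129805520 28114 D WS 1116160 + 1024 [sysbench]",
    "8,96 14 607 53.129843345 28114 D WS 1120256 + 1024 [sysbench]",
]

prepare_sizes = [kib for _, kib in dispatch_sizes(prepare)]
print(prepare_sizes)  # [512, 512]
```

Each "+ 1024" dispatch is therefore 1024 x 512 bytes = 512 KiB, which is what was seen going down to the HBA during prepare.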

That result, "13.042Mb/sec 3338.73 Requests/sec", was from the run phase,
and fallocate had been done before it. blktrace output from the run phase
looks very different: 4k requests, as expected.

[root@dca-poc-gtsxdb3 ~]# blktrace -d /dev/sdg -o - | blkparse -i - | grep 'D W'
8,96 5 3 0.000001874 28212 D WS 1847296 + 8 [sysbench]
8,96 5 7 0.001213728 28212 D WS 1847304 + 8 [sysbench]
8,96 5 11 0.002779304 28212 D WS 1847312 + 8 [sysbench]
8,96 5 15 0.004486445 28212 D WS 1847320 + 8 [sysbench]
8,96 5 19 0.006012133 28212 D WS 22691864 + 8 [sysbench]
8,96 5 23 0.007781553 28212 D WS 22691896 + 8 [sysbench]
8,96 5 27 0.009043404 28212 D WS 22691928 + 8 [sysbench]
8,96 5 31 0.010546829 28212 D WS 22691960 + 8 [sysbench]
8,96 5 35 0.012214468 28212 D WS 22691992 + 8 [sysbench]
8,96 5 39 0.013792616 28212 D WS 22692024 + 8 [sysbench]
...
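
As a sanity check on the run-phase numbers quoted in this mail, here is a small sketch. It assumes that sysbench's "Mb/sec" means 2^20 bytes per second and that the timestamps are the ten dispatch times from the run-phase excerpt; the variable names are only illustrative.

```python
BLOCK_BYTES = 4096          # --file-block-size=4K
requests_per_sec = 3338.73  # run-phase figure reported by sysbench

# Throughput implied by the request rate, in MiB/s (assumed unit)
mib_per_sec = requests_per_sec * BLOCK_BYTES / (1024 * 1024)
print(f"{mib_per_sec:.3f} MiB/s")  # 13.042 MiB/s

# Dispatch timestamps (seconds) copied from the run-phase blkparse excerpt
stamps = [
    0.000001874, 0.001213728, 0.002779304, 0.004486445, 0.006012133,
    0.007781553, 0.009043404, 0.010546829, 0.012214468, 0.013792616,
]

# Gap between consecutive dispatches, in milliseconds
gaps_ms = [(b - a) * 1000 for a, b in zip(stamps, stamps[1:])]
mean_gap_ms = sum(gaps_ms) / len(gaps_ms)
print(f"mean gap between dispatches: {mean_gap_ms:.2f} ms")  # 1.53 ms
```

So the reported 13.042Mb/sec is exactly what 3338.73 requests/sec of 4k writes works out to, and within this short excerpt the 4k writes are dispatched roughly 1.5 ms apart. The excerpt covers a single process on one CPU, so the gap figure should not be read as a device-wide rate.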