Date: Wed, 10 Jun 2015 17:46:11 -0400
From: Mike Snitzer
To: Ming Lin
Cc: Ming Lei, dm-devel@redhat.com, Christoph Hellwig, Alasdair G Kergon,
	Lars Ellenberg, Philip Kelleher, Joshua Morris, Kent Overstreet,
	Nitin Gupta, Oleg Drokin, Al Viro, Jens Axboe, Andreas Dilger,
	Geoff Levand, Jiri Kosina, lkml, Jim Paris, Minchan Kim,
	Dongsu Park, drbd-user@lists.linbit.com
Subject: Re: [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios
Message-ID: <20150610214611.GA744@redhat.com>
References: <20150526143626.GA4315@redhat.com>
	<20150526160400.GB4715@redhat.com>
	<20150528003627.GD32216@agk-dp.fab.redhat.com>
	<1433138551.11778.4.camel@hasee>
	<20150604210617.GA23710@redhat.com>
	<1433830169.1197.6.camel@hasee>

On Wed, Jun 10 2015 at 5:20pm -0400,
Ming Lin wrote:

> On Mon, Jun 8, 2015 at 11:09 PM, Ming Lin wrote:
> > On Thu, 2015-06-04 at 17:06 -0400, Mike Snitzer wrote:
> >> We need to test on large HW RAID setups like a NetApp filer (or even
> >> local SAS drives connected via some SAS controller), e.g. an 8+2 drive
> >> RAID6 or 8+1 RAID5 setup.  Testing with MD RAID on JBOD setups with 8
> >> devices is also useful.  It is the larger RAID setups that will be more
> >> sensitive to IO sizes being properly aligned on RAID stripe and/or
> >> chunk size boundaries.
> >
> > Here are the test results of xfs/ext4/btrfs read/write on HW RAID6,
> > MD RAID6 and a DM stripe target.  Each case ran for 0.5 hours, so it
> > took 36 hours to finish all the tests on the 4.1-rc4 and
> > 4.1-rc4-patched kernels.
> >
> > No performance regressions were introduced.
> >
> > Test server: Dell R730xd (2 sockets / 48 logical CPUs / 264G memory)
> > HW RAID6, MD RAID6 and the DM stripe target were each configured with
> > 10 HDDs, each 280G.
> > Stripe sizes of 64k and 128k were tested.
> >
> > devs="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk"
> > spare_devs="/dev/sdl /dev/sdm"
> > stripe_size=64 (or 128)
> >
> > MD RAID6 was created by:
> > mdadm --create --verbose /dev/md0 --level=6 --raid-devices=10 $devs --spare-devices=2 $spare_devs -c $stripe_size
> >
> > The DM stripe target was created by:
> > pvcreate $devs
> > vgcreate striped_vol_group $devs
> > lvcreate -i10 -I${stripe_size} -L2T -nstriped_logical_volume striped_vol_group

DM had a regression relative to merge_bvec that wasn't fixed until recently
(it wasn't in 4.1-rc4); see commit 1c220c69ce0 ("dm: fix casting bug in
dm_merge_bvec()").  The regression was introduced in 4.1, so your 4.1-rc4 DM
stripe testing may have effectively been run with merge_bvec disabled.
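
Whether or not merge_bvec was in play, it is worth confirming the stripe
geometry the block layer actually exports for these devices.  A quick sketch
using the device names from the setup quoted above (the sysfs values are in
bytes; the dmsetup chunk size is in 512-byte sectors):

  # chunk size and full-stripe width advertised for the MD RAID6 array
  cat /sys/block/md0/queue/minimum_io_size
  cat /sys/block/md0/queue/optimal_io_size

  # the same limits for the striped LV, via its /dev/<vg>/<lv> symlink
  blockdev --getiomin --getioopt /dev/striped_vol_group/striped_logical_volume

  # the raw dm-stripe table line (stripe count and chunk size in sectors)
  dmsetup table striped_vol_group-striped_logical_volume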
> > Here is an example of the fio script for stripe size 128k:
> >
> > [global]
> > ioengine=libaio
> > iodepth=64
> > direct=1
> > runtime=1800
> > time_based
> > group_reporting
> > numjobs=48
> > gtod_reduce=0
> > norandommap
> > write_iops_log=fs
> >
> > [job1]
> > bs=1280K
> > directory=/mnt
> > size=5G
> > rw=read
> >
> > All results here: http://minggr.net/pub/20150608/fio_results/
> >
> > Results summary:
> >
> > 1. HW RAID6: stripe size 64k
> >                  4.1-rc4    4.1-rc4-patched
> >                  -------    ---------------
> >                  (MB/s)     (MB/s)
> > xfs read:         821.23     812.20   -1.09%
> > xfs write:        753.16     754.42   +0.16%
> > ext4 read:        827.80     834.82   +0.84%
> > ext4 write:       783.08     777.58   -0.70%
> > btrfs read:       859.26     871.68   +1.44%
> > btrfs write:      815.63     844.40   +3.52%
> >
> > 2. HW RAID6: stripe size 128k
> >                  4.1-rc4    4.1-rc4-patched
> >                  -------    ---------------
> >                  (MB/s)     (MB/s)
> > xfs read:         948.27     979.11   +3.25%
> > xfs write:        820.78     819.94   -0.10%
> > ext4 read:        978.35     997.92   +2.00%
> > ext4 write:       853.51     847.97   -0.64%
> > btrfs read:      1013.1     1015.6    +0.24%
> > btrfs write:      854.43     850.42   -0.46%
> >
> > 3. MD RAID6: stripe size 64k
> >                  4.1-rc4    4.1-rc4-patched
> >                  -------    ---------------
> >                  (MB/s)     (MB/s)
> > xfs read:         847.34     869.43   +2.60%
> > xfs write:        198.67     199.03   +0.18%
> > ext4 read:        763.89     767.79   +0.51%
> > ext4 write:       281.44     282.83   +0.49%
> > btrfs read:       756.02     743.69   -1.63%
> > btrfs write:      268.37     265.93   -0.90%
> >
> > 4. MD RAID6: stripe size 128k
> >                  4.1-rc4    4.1-rc4-patched
> >                  -------    ---------------
> >                  (MB/s)     (MB/s)
> > xfs read:         993.04    1014.1    +2.12%
> > xfs write:        293.06     298.95   +2.00%
> > ext4 read:       1019.6     1020.9    +0.12%
> > ext4 write:       371.51     371.47   -0.01%
> > btrfs read:      1000.4     1020.8    +2.03%
> > btrfs write:      241.08     246.77   +2.36%
> >
> > 5. DM: stripe size 64k
> >                  4.1-rc4    4.1-rc4-patched
> >                  -------    ---------------
> >                  (MB/s)     (MB/s)
> > xfs read:        1084.4     1080.1    -0.39%
> > xfs write:       1071.1     1063.4    -0.71%
> > ext4 read:        991.54    1003.7    +1.22%
> > ext4 write:      1069.7     1052.2    -1.63%
> > btrfs read:      1076.1     1082.1    +0.55%
> > btrfs write:      968.98     965.07   -0.40%
> >
> > 6. DM: stripe size 128k
> >                  4.1-rc4    4.1-rc4-patched
> >                  -------    ---------------
> >                  (MB/s)     (MB/s)
> > xfs read:        1020.4     1066.1    +4.47%
> > xfs write:       1058.2     1066.6    +0.79%
> > ext4 read:        990.72     988.19   -0.25%
> > ext4 write:      1050.4     1070.2    +1.88%
> > btrfs read:      1080.9     1074.7    -0.57%
> > btrfs write:      975.10     972.76   -0.23%
>
> Hi Mike,
>
> How about these numbers?

Looks fairly good.  I'm just not sure the workload is going to exercise the
code paths in question the way we'd hope.  I'll have to set aside some time
to think through scenarios to test.

My concern still remains that at some point in the future we'll regret not
having merge_bvec, but by then it will be too late.  That is just my own FUD
at this point...

> I'm also happy to run other fio jobs your team used.

I've been busy getting DM changes for the 4.2 merge window finalized.  As
such I haven't connected with others on the team to discuss this issue.  I'll
see if we can make time in the next 2 days.  But I also have RHEL-specific
kernel deadlines I'm coming up against.

It seems late to be staging such an extensive change for 4.2... are you
pushing for this code to land in the 4.2 merge window?  Or do we have time to
work this further and target the 4.3 merge window?

Mike
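
One kind of job that would more directly stress the new splitting path is IO
that is larger than, and misaligned with, the full-stripe width, issued
against the raw striped device rather than through a filesystem.  A sketch
along those lines (the block size, alignment and job name are illustrative
assumptions, not taken from the thread; the LV path and the 640k full stripe
follow from the 10 x 64k setup above):

  [global]
  ioengine=libaio
  iodepth=64
  direct=1
  runtime=1800
  time_based
  group_reporting
  norandommap

  [stripe-straddle]
  ; raw striped LV, so bio sizes track bs directly instead of the fs allocator
  filename=/dev/striped_vol_group/striped_logical_volume
  rw=randread
  ; larger than the 640k (10 x 64k) full stripe, and not a multiple of it
  bs=1664k
  ; 4k-aligned random offsets, so most IOs straddle chunk/stripe boundaries
  blockalign=4k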