Date: Tue, 20 Apr 2010 01:09:58 -0700
From: Tracy Reed
To: xen-devel@lists.xensource.com
Cc: Aoetools-discuss@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject: domU is causing misaligned disk writes
Message-ID: <20100420080958.GN5660@tracyreed.org>

Does anyone know why my Xen xvda devices would be doing (apparently) unaligned writes to my SAN, causing horrible performance, massive seeking, and lots of reading for page cache backfill? Yet writing to the same device in the dom0 is very fast and causes no extra reads.

I am running the 2.6.18-164.11.1.el5xen Xen kernel which came with CentOS 5.4.

After spending a lot of time banging my head on this I seem to have finally tracked it down to a difference between domU and dom0. I never would have thought it would be this, but it is extremely reproducible. We're talking a difference of 4-5x in write speed. Reads are equally fast everywhere.

I am using the AoE v72 kernel module (initiator) on Dell R610s to talk to vblade-19 (target) on Dell R710s, all running CentOS 5.4. I have striped two 7200 RPM SATA disks and exported the md with AoE (although I have done these tests with individual disks also). Read performance is excellent:

# dd of=/dev/null if=/dev/xvdg1 bs=4096 count=3000000
3000000+0 records in
3000000+0 records out
12288000000 bytes (12 GB) copied, 106.749 seconds, 115 MB/s

I dropped the cache with:

echo 1 > /proc/sys/vm/drop_caches

on both target and initiator before starting the test. This is great for just a single gig-e link, and it suggests that the network is fine.

However, write performance is odious: typically around 20MB/s. It should be more like 70MB/s per disk or better (7200 RPM SATA) and max out my gig-e, with write performance similar to the read performance above.

I called these unaligned writes because when running iostat on the target machine I can see lots of reads happening, which are surely causing seeks and killing performance. Typical is something like 8MB/s of reads while doing 16MB/s of writes.

HOWEVER, if I do the writes from the dom0 the performance is excellent:

# dd if=/dev/zero of=/dev/etherd/e6.2 bs=4096 count=3000000
3000000+0 records in
3000000+0 records out
12288000000 bytes (12 GB) copied, 104.679 seconds, 117 MB/s

And I see no reads happening in iostat on the disks being written to. Purely streaming writes at high speeds.

I have had AoE working very well with Xen previously, although not with this particular hardware/Xen/AoE version. It also occurs to me that in the past when I have done this I network booted the domUs and they got root over AoE using a complicated initrd that I cooked up. In the last year or so I decided that it was too complicated and went to booting my dom0s from compact flash with the AoE driver in the dom0 instead of the domU. I am now handing the domU its xvds from the AoE driver in dom0.

I strongly suspect that this is why things worked great before but stink now. Unfortunately I don't have a working network boot initrd setup like I used to, and although I still have all of the code etc., it would take a while to set up. I don't want to run that setup in production anymore anyway if I can help it.

I have tried manually aligning the disk by moving the beginning of data on the partition from sector 63 to sector 64 (although this is usually done for RAID alignment), and I have tried changing the disk geometry to account for the extra partition table, which causes a half-block page-cache misalignment as described by the ever insightful Kelsey Hudson in his writeup on the issue here:

http://copilotco.com/Virtualization/wiki/aoe-caching-alignment.pdf/at_download/file

All to no avail. What am I missing here? Why is domU apparently fudging my writes?

-- 
Tracy Reed http://tracyreed.org
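
For reference, the arithmetic behind the 63-versus-64 sector change can be sketched roughly as below. This is only an illustration, not a diagnosis of what domU actually does: it assumes 512-byte sectors and a 4096-byte page size, and the helper names `misalignment` and `partial_pages` are hypothetical. A write that only partially covers a page is the kind that would force the target to read the rest of that page first.

```python
SECTOR_SIZE = 512   # assumed bytes per sector
PAGE_SIZE = 4096    # assumed page-cache page size in bytes


def misalignment(start_sector, sector_size=SECTOR_SIZE, page_size=PAGE_SIZE):
    """Bytes by which a partition start lies past the previous page boundary."""
    return (start_sector * sector_size) % page_size


def partial_pages(offset, length, page_size=PAGE_SIZE):
    """Count pages that a write of `length` bytes at byte `offset` covers only partly."""
    end = offset + length
    first = offset // page_size
    last = (end - 1) // page_size
    count = 0
    for page in range(first, last + 1):
        page_start = page * page_size
        page_end = page_start + page_size
        # A page is partial if the write starts after its start or ends before its end.
        if offset > page_start or end < page_end:
            count += 1
    return count


if __name__ == "__main__":
    for start in (63, 64):
        off = start * SECTOR_SIZE
        print(start, misalignment(start), partial_pages(off, 4096))
```

Under these assumptions, a partition starting at sector 63 sits 3584 bytes past a page boundary, so every 4096-byte write straddles two pages and covers neither fully. Starting at sector 64 leaves no remainder and each such write covers exactly one page.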