Date: Wed, 8 Jan 2014 02:17:13 +0100
From: Jan Kara
To: Christoph Hellwig
Cc: Jan Kara, Sergey Meirovich, linux-scsi, Linux Kernel Mailing List, Gluk
Subject: Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.
Message-ID: <20140108011713.GA5212@quack.suse.cz>
References: <20140106201032.GA13491@quack.suse.cz> <20140107155830.GA28395@infradead.org>
In-Reply-To: <20140107155830.GA28395@infradead.org>

On Tue 07-01-14 07:58:30, Christoph Hellwig wrote:
> On Mon, Jan 06, 2014 at 09:10:32PM +0100, Jan Kara wrote:
> >   This is likely a problem of Linux direct IO implementation. The thing is
> > that in Linux when you are doing appending direct IO (i.e., direct IO which
> > changes file size), the IO is performed synchronously so that we have our
> > life simpler with inode size update etc. (and frankly our current locking
> > rules make inode size update on IO completion almost impossible). Since
> > appending direct IO isn't very common, we seem to get away with this
> > simplification just fine...
>
> Shouldn't be too much of a problem at least for XFS and maybe even ext4
> with the workqueue based I/O end handler.  For XFS we protect size
> updates by the ilock which we already taken in that handler, not sure
> what ext4 would do there.

  Well, I was specifically worried about i_mutex locking.
In particular: before we report completion of an appending IO, we need to
update i_size. To update i_size we need to grab i_mutex. Now this is
unpleasant because inode_dio_wait() happens under i_mutex, so the above
would create a lock inversion. And we cannot really do inode_dio_done()
before grabbing i_mutex, as that would open interesting races between
truncate decreasing i_size and DIO increasing it.

								Honza
-- 
Jan Kara
SUSE Labs, CR
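
For readers who want the ordering problem spelled out, here is a minimal
sketch in Python. It is only a toy model of the two orderings discussed
above, not kernel code: the task names, the find_cycle() helper, the
option_b_final_size() helper and the byte values are all made up for
illustration. Only i_mutex, i_size, inode_dio_wait() and inode_dio_done()
come from the mail. Option A takes i_mutex in the completion path before
inode_dio_done(); option B calls inode_dio_done() first and shows one
possible bad interleaving with truncate.

```python
def find_cycle(waits_for):
    """Return the tasks forming a cycle in a wait-for graph, or None.

    waits_for maps a task name to the single task it is blocked on
    (a hypothetical, simplified representation).
    """
    for start in waits_for:
        path = [start]
        nxt = waits_for.get(start)
        while nxt is not None and nxt not in path:
            path.append(nxt)
            nxt = waits_for.get(nxt)
        if nxt is not None:
            return path[path.index(nxt):]
    return None


# Option A: completion takes i_mutex first, then calls inode_dio_done().
option_a = {
    "truncate": "dio_completion",  # holds i_mutex, sleeps in inode_dio_wait()
    "dio_completion": "truncate",  # needs i_mutex before it can finish the DIO
}
deadlock = find_cycle(option_a)


def option_b_final_size(old_size, dio_end, truncate_to):
    """Option B: inode_dio_done() first, i_mutex afterwards.

    Models one assumed interleaving; the real race window may differ.
    """
    i_size = old_size
    # 1. completion calls inode_dio_done(); truncate's inode_dio_wait() returns
    # 2. truncate, still holding i_mutex, shrinks the file and unlocks
    i_size = truncate_to
    # 3. completion finally gets i_mutex and extends i_size for the append
    i_size = max(i_size, dio_end)
    return i_size


stale_size = option_b_final_size(old_size=8192, dio_end=12288, truncate_to=0)
print(deadlock, stale_size)
```

In this model option A leaves truncate and the completion handler waiting
on each other, while option B avoids the cycle but lets i_size end up
beyond the point the file was truncated to.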