Date: Thu, 19 Jun 2008 13:42:11 -0400
From: Theodore Tso
To: Eric Sandeen
Cc: Holger Kiehl, "Aneesh Kumar K.V", Jan Kara, Solofo.Ramangalahy@bull.net,
    Nick Dokos, linux-ext4@vger.kernel.org, linux-kernel
Subject: Re: Performance of ext4
Message-ID: <20080619174211.GB9119@mit.edu>
In-Reply-To: <485A8C2D.1090806@redhat.com>

On Thu, Jun 19, 2008 at 11:41:17AM -0500, Eric Sandeen wrote:
>
> It might be worth running a "simple" fsx under your kernel too; last time
> I tested fsx it was still happy and it exercises fs ops (including
> truncate) at random...
>

From what Holger described, it's doubtful that the bug is in the
truncate operation.
It sounds like i_size is actually dropping at some point long after
the file was written.  If I had to guess, the value in the inode cache
is correct, and perhaps so is the value in the journal.  But somehow,
the wrong value is getting written to disk (remember, the jbd layer
can keep up to three different versions of filesystem metadata in
memory, because most of the time we don't block modifications to the
filesystem while we are in the middle of writing a previous commit to
disk).

So depending on whether the inode gets redirtied or not, the
inconsistency could self-heal, and if the inode never gets pushed out
of memory due to memory pressure, the problem might not be noticed
until the system reboots or the filesystem is unmounted.

This is one of the reasons why I'm a bit suspicious that the problem
may lie in the delayed allocation code; changing i_size without first
starting a transaction could lead to this sort of problem, for
example, and the delayed allocation could represent a different code
path where file blocks get allocated and i_size gets changed.

						- Ted
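
Since the symptom described above may only become visible once the
cached inode is dropped (reboot or unmount), one way to catch it is to
record file sizes while the inodes are still cached and compare them
after a remount. The following is only a minimal userspace sketch of
that idea, not anything from this thread: the function names
(record_sizes, find_shrunken, save_manifest, load_manifest) are
hypothetical, and it assumes the files are not legitimately modified
between the two passes and that the manifest is stored on a different,
trusted filesystem.

```python
import json
import os


def record_sizes(root):
    """Walk root and return {path: st_size} for every regular file."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files matter here.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes[path] = os.stat(path).st_size
    return sizes


def find_shrunken(recorded, current):
    """Return [(path, old_size, new_size)] for files whose size dropped."""
    shrunk = []
    for path, old_size in sorted(recorded.items()):
        new_size = current.get(path)
        # Files missing from the second pass are ignored in this sketch.
        if new_size is not None and new_size < old_size:
            shrunk.append((path, old_size, new_size))
    return shrunk


def save_manifest(sizes, manifest_path):
    """Store the recorded sizes; keep this file off the suspect filesystem."""
    with open(manifest_path, "w") as f:
        json.dump(sizes, f)


def load_manifest(manifest_path):
    """Load sizes previously written by save_manifest."""
    with open(manifest_path) as f:
        return json.load(f)
```

Typical use would be to call save_manifest(record_sizes(dir), manifest)
before unmounting, then after remounting run
find_shrunken(load_manifest(manifest), record_sizes(dir)) and look at
any entries it returns. An empty result does not prove the filesystem
is healthy; it only means no size drop was observed for these files.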