Date: Wed, 30 Jan 2008 17:32:31 -0700
From: Andreas Dilger <adilger@Sun.COM>
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode
In-reply-to: <200801300929.21778.chris.mason@oracle.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: Al Boldi <a1426z@gawab.com>, Jan Kara <jack@suse.cz>,
       Chris Snook <csnook@redhat.com>, linux-fsdevel@vger.kernel.org,
       linux-kernel@vger.kernel.org
Message-id: <20080131003231.GK23836@webber.adilger.int>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Content-disposition: inline
References: <200801242336.00340.a1426z@gawab.com>
 <20080129172232.GA9770@atrey.karlin.mff.cuni.cz>
 <200801300904.48299.a1426z@gawab.com>
 <200801300929.21778.chris.mason@oracle.com>
User-Agent: Mutt/1.5.17 (2007-11-01)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1557
Lines: 34

On Wednesday 30 January 2008, Al Boldi wrote:
> And, a quick test of successive 1sec delayed syncs shows no hangs until
> about 1 minute (~180mb) of db-writeout activity, when the sync abruptly
> hangs for minutes on end, and io-wait shows almost 100%.

How large is the journal in this filesystem?  You can check via
"debugfs -R 'stat <8>' /dev/XXX" (inode 8 is the journal).  Does
increasing the journal size affect the stall?  You can set the journal
size (in MB) via "mke2fs -J size=400" at format time, or on an unmounted
filesystem by running "tune2fs -O ^has_journal /dev/XXX" then
"tune2fs -J size=400 /dev/XXX".

I suspect that the stall is caused by the journal filling up, and then
waiting while the entire journal is checkpointed back to the filesystem
before the next transaction can start.

It is possible to improve this behaviour in JBD by reducing the amount
of space that is cleared when the journal becomes "full", and also by
starting journal checkpointing before it becomes full.  While that may
reduce performance a small amount, it would help avoid such huge latency
problems.  I believe we have such a patch in one of the Lustre branches
already.  I'm not sure which kernel it is for, but the JBD code rarely
changes much.
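
To illustrate the latency argument, here is a toy model in Python.  It is
only a sketch and not JBD code: the function name, the "tick" time unit and
all of the numbers are made up for illustration.  It assumes a writer that
logs a fixed number of blocks per tick and stalls whenever the journal is
full, until some fraction of the journal has been checkpointed.  It does not
model background checkpointing or the throughput cost of less-batched I/O.

```python
def simulate(capacity, write_rate, checkpoint_rate, clear_fraction, total_blocks):
    """Toy model of journal-full stalls; returns (longest_stall, total_stall) in ticks.

    capacity        -- journal size in blocks (hypothetical)
    write_rate      -- blocks logged per tick while the writer is running
    checkpoint_rate -- blocks checkpointed per tick while the writer is stalled
    clear_fraction  -- share of the journal cleared each time it fills up
    total_blocks    -- total number of blocks the writer wants to log
    """
    to_clear = int(capacity * clear_fraction)
    if to_clear < 1:
        raise ValueError("clear_fraction must free at least one block")

    used = written = longest = total = 0
    while written < total_blocks:
        # Log as much as fits in this tick.
        step = min(write_rate, capacity - used, total_blocks - written)
        used += step
        written += step
        if used >= capacity and written < total_blocks:
            # Journal is full: the writer waits while to_clear blocks
            # are checkpointed back to the filesystem.
            stall = -(-to_clear // checkpoint_rate)  # ceiling division
            used -= to_clear
            longest = max(longest, stall)
            total += stall
    return longest, total


if __name__ == "__main__":
    for fraction in (1.0, 0.25):
        longest, total = simulate(1000, 10, 5, fraction, 10000)
        print(f"clear {fraction:.0%} when full: longest stall {longest}, total {total}")
```

With these made-up numbers, clearing the whole journal gives a longest stall
of 200 ticks, while clearing a quarter of it gives 50 ticks.  The total stall
time is 1800 ticks in both cases, so in this simplified model the gain is
purely in worst-case latency.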

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
