2009-06-26 02:18:14

by Nick Bowler

Subject: ext3: massive latencies for some write operations

I wasn't entirely sure which list to post this on, this one seemed to
fit best. Apologies if there was a better place.

Please CC any replies to me as I am not subscribed to the list.

Recently, I've been experiencing enormous (more than a minute)
latencies during some write operations to my /home filesystem. During
such a time, many programs tend to completely stop responding,
although programs doing continuous reads (e.g. music or video players)
appear to be unaffected.

The filesystem is a 3TB ext3 filesystem on a software raid-5 array (4x
1TB drives). Dmesg is completely silent.

I'm on an amd64 running 2.6.30. However, I had these issues on 2.6.26
and 2.6.29 as well. Latencytop output follows.

LatencyTOP version 0.4 (C) 2008 Intel Corporation

Cause                                                    Maximum  Percentage
start_this_handle journal_start ext3_journal_start  68746.9 msec      44.3 %
start_this_handle journal_start ext3_journal_start  59219.1 msec      38.1 %
sync_page sync_page_killable __lock_page_killable     448.1 msec       2.3 %
sync_buffer __wait_on_buffer ext4_bread ext4_find_    445.0 msec       1.6 %
sync_page __lock_page find_lock_page filemap_fault     84.5 msec       0.4 %
do_get_write_access journal_get_write_access __ext     78.2 msec       0.1 %
do_get_write_access journal_get_write_access __ext     70.0 msec       0.0 %
md_write_start make_request md_make_request generi     66.9 msec       0.6 %
sync_buffer __wait_on_buffer ext4_bread ext4_find_     61.1 msec       0.2 %


Process amuled (19428) Total: 69182.1 msec
start_this_handle journal_start ext3_journal_start  68746.9 msec      99.4 %
sync_page __lock_page find_lock_page filemap_fault     84.5 msec       0.3 %
sync_page sync_page_killable __lock_page_killable      20.9 msec       0.0 %
sync_buffer __wait_on_buffer __ext3_get_inode_loc      15.8 msec       0.0 %
sync_buffer __wait_on_buffer __bread ext3_get_bran     11.4 msec       0.0 %
hrtimer_nanosleep sys_nanosleep system_call_fastpa      4.9 msec       0.1 %


2009-06-26 04:54:08

by Andreas Dilger

Subject: Re: ext3: massive latencies for some write operations

On Jun 25, 2009 22:18 -0400, Nick Bowler wrote:
> I wasn't entirely sure which list to post this on, this one seemed to
> fit best. Apologies if there was a better place.
>
> Please CC any replies to me as I am not subscribed to the list.

This is a relatively well-known problem with ext3.

> Recently, I've been experiencing enormous (more than a minute)
> latencies during some write operations to my /home filesystem. During
> such a time, many programs tend to completely stop responding,
> although programs doing continuous reads (e.g. music or video players)
> appear to be unaffected.
>
> The filesystem is a 3TB ext3 filesystem on a software raid-5 array (4x
> 1TB drives). Dmesg is completely silent.
>
> I'm on an amd64 running 2.6.30. However, I had these issues on 2.6.26
> and 2.6.29 as well. Latencytop output follows.
>
> LatencyTOP version 0.4 (C) 2008 Intel Corporation
>
> Process amuled (19428) Total: 69182.1 msec
> start_this_handle journal_start ext3_journal_start  68746.9 msec      99.4 %
> sync_page __lock_page find_lock_page filemap_fault     84.5 msec       0.3 %
> sync_page sync_page_killable __lock_page_killable      20.9 msec       0.0 %

You have some application that is doing a lot of "fsync" operations,
but because there is likely also a lot of other IO going on, the
fsync operations are taking a long time to sync the journal.
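
One rough way to check whether slow fsync calls are what is being
observed is to time them directly on the affected filesystem. The
following is only a minimal sketch in Python, not something taken from
this thread: the function name time_fsyncs, the write size, and the
number of iterations are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def time_fsyncs(directory, count=5, payload=b"x" * 4096):
    """Write `payload` and fsync `count` times; return each fsync latency in seconds.

    `time_fsyncs` is a hypothetical helper for illustration only.
    """
    latencies = []
    # Create the scratch file inside `directory` so the test hits that filesystem.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(count):
            f.write(payload)
            f.flush()                # push Python's userspace buffer to the kernel
            start = time.monotonic()
            os.fsync(f.fileno())     # block until the kernel reports the data as synced
            latencies.append(time.monotonic() - start)
    return latencies
```

Calling time_fsyncs() with a directory on the slow filesystem while the
other IO load is running, and again while the system is idle, should
show whether the individual fsync calls are the ones stalling. The
scratch file is deleted automatically when the function returns.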

If you use ext4 this problem should go away. I would recommend the
FC11 2.6.29 kernel, since it has all of the latest ext4 fixes in it.

If you wanted to throw some hardware at the problem, adding an SSD device
for the journal should also solve the problem for ext3.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-06-27 21:07:54

by Nick Bowler

Subject: Re: ext3: massive latencies for some write operations

On 6/26/09, Andreas Dilger <[email protected]> wrote:
> This is a relatively well-known problem with ext3.

<snip>

> If you use ext4 this problem should go away. I would recommend the
> FC11 2.6.29 kernel, since it has all of the latest ext4 fixes in it.

Okay, I have switched to ext4 (although with 2.6.30). Hopefully I
won't have any major issues, thanks!