2010-07-28 20:01:47

by Kay Diederichs

Subject: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

Dear all,

we reproducibly find significantly worse ext4 performance when our
fileservers run 2.6.32 or later kernels, when compared to the
2.6.27-stable series.

The hardware is a RAID5 array of five 1TB WD10EACS disks (giving almost 4TB)
in an external eSATA enclosure (STARDOM ST6600); the disks are not partitioned,
rather the complete disks are used:
md5 : active raid5 sde[0] sdg[5] sdd[3] sdc[2] sdf[1]
3907045376 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5]
[UUUUU]

The enclosure is connected using a Silicon Image (supported by
sata_sil24) PCIe-X1 adapter to one of our fileservers (either the backup
fileserver, 32bit desktop hardware with Intel(R) Pentium(R) D CPU
3.40GHz, or a production-fileserver 64bit Precision WorkStation 670 w/ 2
Xeon 3.2GHz).

The ext4 filesystem was created using
mke2fs -j -T largefile -E stride=128,stripe_width=512 -O extent,uninit_bg
It is mounted with noatime,data=writeback
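
For reference, mounting with those options would look roughly like this (a
sketch; the device and mountpoint names are the ones that appear later in
this thread):

  mount -t ext4 -o noatime,data=writeback /dev/md5 /mnt/md5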

As the operating system we usually use RHEL5.5, but to exclude problems with
self-compiled kernels, we also booted USB sticks with the latest Fedora12
and FC13.

Our benchmarks consist of copying 100 6MB files from and to the RAID5,
over NFS (NFSv3, GB ethernet, TCP, async export), and tar-ing and
rsync-ing kernel trees back and forth. Before and after each individual
benchmark part, we "sync" and "echo 3 > /proc/sys/vm/drop_caches" on
both the client and the server.
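
The cache-dropping between benchmark parts amounts to the following,
run as root on both machines (a minimal sketch of the step described above):

  sync
  echo 3 > /proc/sys/vm/drop_caches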

The problem:
with 2.6.27.48 we typically get:
44 seconds for preparations
23 seconds to rsync 100 frames with 597M from nfs directory
33 seconds to rsync 100 frames with 595M to nfs directory
50 seconds to untar 24353 kernel files with 323M to nfs directory
56 seconds to rsync 24353 kernel files with 323M from nfs directory
67 seconds to run xds_par in nfs directory (reads and writes 600M)
301 seconds to run the script

with 2.6.32.16 we find:
49 seconds for preparations
23 seconds to rsync 100 frames with 597M from nfs directory
261 seconds to rsync 100 frames with 595M to nfs directory
74 seconds to untar 24353 kernel files with 323M to nfs directory
67 seconds to rsync 24353 kernel files with 323M from nfs directory
290 seconds to run xds_par in nfs directory (reads and writes 600M)
797 seconds to run the script

This is quite reproducible (times varying about 1-2% or so). All times
include reading and writing on the client side (stock CentOS5.5 Nehalem
machines with fast single SATA disks). The 2.6.32.16 times are the same
with FC12 and FC13 (booted from USB stick).

The 2.6.27-versus-2.6.32+ regression cannot be due to barriers because
md RAID5 does not support barriers ("JBD: barrier-based sync failed on
md5 - disabling barriers").

What we tried: noop and deadline schedulers instead of cfq;
modifications of /sys/block/sd[c-g]/queue/max_sectors_kb; switching
on/off NCQ; blockdev --setra 8192 /dev/md5; increasing
/sys/block/md5/md/stripe_cache_size
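
For the record, these knobs were exercised roughly as follows (a sketch; the
numeric values are illustrative except where given above, and sdc stands for
any of the member disks):

  echo deadline > /sys/block/sdc/queue/scheduler    # or noop, instead of cfq
  echo 512 > /sys/block/sdc/queue/max_sectors_kb    # illustrative value
  echo 1 > /sys/block/sdc/device/queue_depth        # one way to switch off NCQ
  blockdev --setra 8192 /dev/md5
  echo 8192 > /sys/block/md5/md/stripe_cache_size   # illustrative value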

When looking at the I/O statistics while the benchmark is running, we
see very choppy patterns for 2.6.32, but quite smooth stats for
2.6.27-stable.

It is not an NFS problem; we see the same effect when transferring the
data using an rsync daemon. We believe, but are not sure, that the
problem does not exist with ext3 - it's not so quick to re-format a 4 TB
volume.

Any ideas? We cannot believe that a general ext4 regression would have
gone unnoticed. So is it due to the interaction of ext4 with md-RAID5?

thanks,

Kay
--
Kay Diederichs http://strucbio.biologie.uni-konstanz.de
email: [email protected] Tel +49 7531 88 4049 Fax 3183
Fachbereich Biologie, Universität Konstanz, Box M647, D-78457 Konstanz.


2010-07-28 21:00:21

by Greg Freemyer

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On Wed, Jul 28, 2010 at 3:51 PM, Kay Diederichs
<[email protected]> wrote:
> Dear all,
>
> we reproducibly find significantly worse ext4 performance when our
> fileservers run 2.6.32 or later kernels, when compared to the
> 2.6.27-stable series.
>
> The hardware is RAID5 of 5 1TB WD10EACS disks (giving almost 4TB) in an
> external eSATA enclosure (STARDOM ST6600); disks are not partitioned but
> rather the complete disks are used:
> md5 : active raid5 sde[0] sdg[5] sdd[3] sdc[2] sdf[1]
> 3907045376 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5]
> [UUUUU]
>
> The enclosure is connected using a Silicon Image (supported by
> sata_sil24) PCIe-X1 adapter to one of our fileservers (either the backup
> fileserver, 32bit desktop hardware with Intel(R) Pentium(R) D CPU
> 3.40GHz, or a production-fileserver 64bit Precision WorkStation 670 w/ 2
> Xeon 3.2GHz).
>
> The ext4 filesystem was created using
> mke2fs -j -T largefile -E stride=128,stripe_width=512 -O extent,uninit_bg
> It is mounted with noatime,data=writeback
>
> As operating system we usually use RHEL5.5, but to exclude problems with
> self-compiled kernels, we also booted USB sticks with latest Fedora12
> and FC13 .
>
> Our benchmarks consist of copying 100 6MB files from and to the RAID5,
> over NFS (NFSv3, GB ethernet, TCP, async export), and tar-ing and
> rsync-ing kernel trees back and forth. Before and after each individual
> benchmark part, we "sync" and "echo 3 > /proc/sys/vm/drop_caches" on
> both the client and the server.
>
> The problem:
> with 2.6.27.48 we typically get:
> 44 seconds for preparations
> 23 seconds to rsync 100 frames with 597M from nfs directory
> 33 seconds to rsync 100 frames with 595M to nfs directory
> 50 seconds to untar 24353 kernel files with 323M to nfs directory
> 56 seconds to rsync 24353 kernel files with 323M from nfs directory
> 67 seconds to run xds_par in nfs directory (reads and writes 600M)
> 301 seconds to run the script
>
> with 2.6.32.16 we find:
> 49 seconds for preparations
> 23 seconds to rsync 100 frames with 597M from nfs directory
> 261 seconds to rsync 100 frames with 595M to nfs directory
> 74 seconds to untar 24353 kernel files with 323M to nfs directory
> 67 seconds to rsync 24353 kernel files with 323M from nfs directory
> 290 seconds to run xds_par in nfs directory (reads and writes 600M)
> 797 seconds to run the script
>
> This is quite reproducible (times varying about 1-2% or so). All times
> include reading and writing on the client side (stock CentOS5.5 Nehalem
> machines with fast single SATA disks). The 2.6.32.16 times are the same
> with FC12 and FC13 (booted from USB stick).
>
> The 2.6.27-versus-2.6.32+ regression cannot be due to barriers because
> md RAID5 does not support barriers ("JBD: barrier-based sync failed on
> md5 - disabling barriers").
>
> What we tried: noop and deadline schedulers instead of cfq;
> modifications of /sys/block/sd[c-g]/queue/max_sectors_kb; switching
> on/off NCQ; blockdev --setra 8192 /dev/md5; increasing
> /sys/block/md5/md/stripe_cache_size
>
> When looking at the I/O statistics while the benchmark is running, we
> see very choppy patterns for 2.6.32, but quite smooth stats for
> 2.6.27-stable.
>
> It is not an NFS problem; we see the same effect when transferring the
> data using an rsync daemon. We believe, but are not sure, that the
> problem does not exist with ext3 - it's not so quick to re-format a 4 TB
> volume.
>
> Any ideas? We cannot believe that a general ext4 regression should have
> gone unnoticed. So is it due to the interaction of ext4 with md-RAID5 ?
>
> thanks,
>
> Kay

Kay,

I didn't read your whole e-mail, but 2.6.27 has known issues with
barriers not working in many raid configs. Thus it is more likely to
experience data loss in the event of a power failure.

With newer kernels, if you prefer performance over robustness, you can
mount with the "nobarrier" option.
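
For example (a sketch, not an exact command for your setup):

  mount -o remount,nobarrier /mnt/md5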

So now you have a choice, whereas with 2.6.27 and RAID5 you effectively
had nobarrier as your only option.

Greg

2010-07-29 23:28:56

by Dave Chinner

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On Wed, Jul 28, 2010 at 09:51:48PM +0200, Kay Diederichs wrote:
> Dear all,
>
> we reproducibly find significantly worse ext4 performance when our
> fileservers run 2.6.32 or later kernels, when compared to the
> 2.6.27-stable series.
>
> The hardware is RAID5 of 5 1TB WD10EACS disks (giving almost 4TB) in an
> external eSATA enclosure (STARDOM ST6600); disks are not partitioned but
> rather the complete disks are used:
> md5 : active raid5 sde[0] sdg[5] sdd[3] sdc[2] sdf[1]
> 3907045376 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5]
> [UUUUU]
>
> The enclosure is connected using a Silicon Image (supported by
> sata_sil24) PCIe-X1 adapter to one of our fileservers (either the backup
> fileserver, 32bit desktop hardware with Intel(R) Pentium(R) D CPU
> 3.40GHz, or a production-fileserver 64bit Precision WorkStation 670 w/ 2
> Xeon 3.2GHz).
>
> The ext4 filesystem was created using
> mke2fs -j -T largefile -E stride=128,stripe_width=512 -O extent,uninit_bg
> It is mounted with noatime,data=writeback
>
> As operating system we usually use RHEL5.5, but to exclude problems with
> self-compiled kernels, we also booted USB sticks with latest Fedora12
> and FC13 .
>
> Our benchmarks consist of copying 100 6MB files from and to the RAID5,
> over NFS (NFSv3, GB ethernet, TCP, async export), and tar-ing and
> rsync-ing kernel trees back and forth. Before and after each individual
> benchmark part, we "sync" and "echo 3 > /proc/sys/vm/drop_caches" on
> both the client and the server.
>
> The problem:
> with 2.6.27.48 we typically get:
> 44 seconds for preparations
> 23 seconds to rsync 100 frames with 597M from nfs directory
> 33 seconds to rsync 100 frames with 595M to nfs directory
> 50 seconds to untar 24353 kernel files with 323M to nfs directory
> 56 seconds to rsync 24353 kernel files with 323M from nfs directory
> 67 seconds to run xds_par in nfs directory (reads and writes 600M)
> 301 seconds to run the script
>
> with 2.6.32.16 we find:
> 49 seconds for preparations
> 23 seconds to rsync 100 frames with 597M from nfs directory
> 261 seconds to rsync 100 frames with 595M to nfs directory
> 74 seconds to untar 24353 kernel files with 323M to nfs directory
> 67 seconds to rsync 24353 kernel files with 323M from nfs directory
> 290 seconds to run xds_par in nfs directory (reads and writes 600M)
> 797 seconds to run the script
>
> This is quite reproducible (times varying about 1-2% or so). All times
> include reading and writing on the client side (stock CentOS5.5 Nehalem
> machines with fast single SATA disks). The 2.6.32.16 times are the same
> with FC12 and FC13 (booted from USB stick).
>
> The 2.6.27-versus-2.6.32+ regression cannot be due to barriers because
> md RAID5 does not support barriers ("JBD: barrier-based sync failed on
> md5 - disabling barriers").
>
> What we tried: noop and deadline schedulers instead of cfq;
> modifications of /sys/block/sd[c-g]/queue/max_sectors_kb; switching
> on/off NCQ; blockdev --setra 8192 /dev/md5; increasing
> /sys/block/md5/md/stripe_cache_size
>
> When looking at the I/O statistics while the benchmark is running, we
> see very choppy patterns for 2.6.32, but quite smooth stats for
> 2.6.27-stable.
>
> It is not an NFS problem; we see the same effect when transferring the
> data using an rsync daemon. We believe, but are not sure, that the
> problem does not exist with ext3 - it's not so quick to re-format a 4 TB
> volume.
>
> Any ideas? We cannot believe that a general ext4 regression should have
> gone unnoticed. So is it due to the interaction of ext4 with md-RAID5 ?

Try reverting 50797481a7bdee548589506d7d7b48b08bc14dcd (ext4: Avoid
group preallocation for closed files). IIRC it caused the same sort
of severe performance regressions for postmark....
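
A sketch of applying that revert to a 2.6.32.y source tree (assuming a git
checkout of the stable branch; the directory name is hypothetical):

  cd linux-2.6.32.y
  git revert 50797481a7bdee548589506d7d7b48b08bc14dcd
  # then rebuild and boot the patched kernel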

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-07-30 02:20:58

by Theodore Ts'o

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On Wed, Jul 28, 2010 at 09:51:48PM +0200, Kay Diederichs wrote:
>
> When looking at the I/O statistics while the benchmark is running, we
> see very choppy patterns for 2.6.32, but quite smooth stats for
> 2.6.27-stable.

Could you try to do two things for me? Using a recent e2fsprogs
(preferably 1.41.11 or 12), run filefrag -v on the files created from
your 2.6.27 run and your 2.6.32 run.

Secondly, can you capture blktrace results from 2.6.27 and 2.6.32? That
would be very helpful to understand what might be going on.
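
A sketch of the commands in question (file and trace names are illustrative):

  filefrag -v /path/to/one/of/the/copied/frames
  blktrace -d /dev/md5 -o md5.trace     # stop with Ctrl-C after the run
  blkparse -i md5.trace > blkparse.out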

Either would be helpful; both would be greatly appreciated.

Thanks,

- Ted

2010-07-30 21:01:47

by Kay Diederichs

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On 30.07.2010 04:20, Ted Ts'o wrote:
> On Wed, Jul 28, 2010 at 09:51:48PM +0200, Kay Diederichs wrote:
>>
>> When looking at the I/O statistics while the benchmark is running, we
>> see very choppy patterns for 2.6.32, but quite smooth stats for
>> 2.6.27-stable.
>
> Could you try to do two things for me? Using (preferably from a
> recent e2fsprogs, such as 1.41.11 or 12) run filefrag -v on the files
> created from your 2.6.27 run and your 2.6.32 run?
>
> Secondly, can you capture blktrace results from 2.6.27 and 2.6.32? That
> would be very helpful to understand what might be going on.
>
> Either would be helpful; both would be greatly appreciated.
>
> Thanks,
>
> - Ted

Ted,

a typical example of filefrag -v output for 2.6.27.48 is

Filesystem type is: ef53
File size of /mnt/md5/scratch/nfs-test/tmp/xds/frames/h2g28_1_00000.cbf
is 6229688 (1521 blocks, blocksize 4096)
ext  logical  physical   expected   length  flags
  0        0  796160000              1024
  1     1024  826381312  796161023    497   eof

(99 out of 100 files have 2 extents)

whereas for 2.6.32.16 the result is typically
Filesystem type is: ef53
File size of /mnt/md5/scratch/nfs-test/tmp/xds/frames/h2g28_1_00000.cbf
is 6229688 (1521 blocks, blocksize 4096)
ext  logical  physical   expected   length  flags
  0        0  826376200              1521   eof
/mnt/md5/scratch/nfs-test/tmp/xds/frames/h2g28_1_00000.cbf: 1 extent found

(99 out of 100 files have 1 extent)

We'll try the blktrace ASAP and report back.

thanks,

Kay


Attachments:
smime.p7s (5.11 kB)
S/MIME Cryptographic Signature

2010-08-01 23:02:27

by Theodore Ts'o

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On Fri, Jul 30, 2010 at 11:01:36PM +0200, Kay Diederichs wrote:
> whereas for 2.6.32.16 the result is typically
> Filesystem type is: ef53
> File size of
> /mnt/md5/scratch/nfs-test/tmp/xds/frames/h2g28_1_00000.cbf is
> 6229688 (1521 blocks, blocksize 4096)
> ext logical physical expected length flags
> 0 0 826376200 1521 eof
> /mnt/md5/scratch/nfs-test/tmp/xds/frames/h2g28_1_00000.cbf: 1 extent found

OK, so 2.6.32 is actually doing a better job laying out the files....

The blktrace will be interesting, but at this point I'm wondering if
this is a generic kernel-wide writeback regression. At $WORK we've
noticed some performance regressions between 2.6.26-based kernels and
2.6.33- and 2.6.34-based kernels with both ext2 and ext4 (in no
journal mode) that we've been trying to track down. We have a pretty
large number of patches applied to both 2.6.26 and 2.6.33/34 which is
why I haven't mentioned it up until now, but at this point it seems
pretty clear there are some writeback issues in the mainline kernel.

There are half a dozen or so patch series on LKML that are addressing
writeback in one way or another, and writeback is a major topic at the
upcoming Linux Storage and Filesystem workshop. So if this is the
cause, hopefully there will be some improvements in this area in the
near future.

- Ted

2010-08-02 10:47:28

by Kay Diederichs

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

Greg Freemyer wrote:
> On Wed, Jul 28, 2010 at 3:51 PM, Kay Diederichs
> <[email protected]> wrote:
>> Dear all,
>>
>> we reproducibly find significantly worse ext4 performance when our
>> fileservers run 2.6.32 or later kernels, when compared to the
>> 2.6.27-stable series.
>>
>> The hardware is RAID5 of 5 1TB WD10EACS disks (giving almost 4TB) in an
>> external eSATA enclosure (STARDOM ST6600); disks are not partitioned but
>> rather the complete disks are used:
>> md5 : active raid5 sde[0] sdg[5] sdd[3] sdc[2] sdf[1]
>> 3907045376 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5]
>> [UUUUU]
>>
>> The enclosure is connected using a Silicon Image (supported by
>> sata_sil24) PCIe-X1 adapter to one of our fileservers (either the backup
>> fileserver, 32bit desktop hardware with Intel(R) Pentium(R) D CPU
>> 3.40GHz, or a production-fileserver 64bit Precision WorkStation 670 w/ 2
>> Xeon 3.2GHz).
>>
>> The ext4 filesystem was created using
>> mke2fs -j -T largefile -E stride=128,stripe_width=512 -O extent,uninit_bg
>> It is mounted with noatime,data=writeback
>>
>> As operating system we usually use RHEL5.5, but to exclude problems with
>> self-compiled kernels, we also booted USB sticks with latest Fedora12
>> and FC13 .
>>
>> Our benchmarks consist of copying 100 6MB files from and to the RAID5,
>> over NFS (NFSv3, GB ethernet, TCP, async export), and tar-ing and
>> rsync-ing kernel trees back and forth. Before and after each individual
>> benchmark part, we "sync" and "echo 3 > /proc/sys/vm/drop_caches" on
>> both the client and the server.
>>
>> The problem:
>> with 2.6.27.48 we typically get:
>> 44 seconds for preparations
>> 23 seconds to rsync 100 frames with 597M from nfs directory
>> 33 seconds to rsync 100 frames with 595M to nfs directory
>> 50 seconds to untar 24353 kernel files with 323M to nfs directory
>> 56 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 67 seconds to run xds_par in nfs directory (reads and writes 600M)
>> 301 seconds to run the script
>>
>> with 2.6.32.16 we find:
>> 49 seconds for preparations
>> 23 seconds to rsync 100 frames with 597M from nfs directory
>> 261 seconds to rsync 100 frames with 595M to nfs directory
>> 74 seconds to untar 24353 kernel files with 323M to nfs directory
>> 67 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 290 seconds to run xds_par in nfs directory (reads and writes 600M)
>> 797 seconds to run the script
>>
>> This is quite reproducible (times varying about 1-2% or so). All times
>> include reading and writing on the client side (stock CentOS5.5 Nehalem
>> machines with fast single SATA disks). The 2.6.32.16 times are the same
>> with FC12 and FC13 (booted from USB stick).
>>
>> The 2.6.27-versus-2.6.32+ regression cannot be due to barriers because
>> md RAID5 does not support barriers ("JBD: barrier-based sync failed on
>> md5 - disabling barriers").
>>
>> What we tried: noop and deadline schedulers instead of cfq;
>> modifications of /sys/block/sd[c-g]/queue/max_sectors_kb; switching
>> on/off NCQ; blockdev --setra 8192 /dev/md5; increasing
>> /sys/block/md5/md/stripe_cache_size
>>
>> When looking at the I/O statistics while the benchmark is running, we
>> see very choppy patterns for 2.6.32, but quite smooth stats for
>> 2.6.27-stable.
>>
>> It is not an NFS problem; we see the same effect when transferring the
>> data using an rsync daemon. We believe, but are not sure, that the
>> problem does not exist with ext3 - it's not so quick to re-format a 4 TB
>> volume.
>>
>> Any ideas? We cannot believe that a general ext4 regression should have
>> gone unnoticed. So is it due to the interaction of ext4 with md-RAID5 ?
>>
>> thanks,
>>
>> Kay
>
> Kay,
>
> I didn't read your whole e-mail, but 2.6.27 has known issues with
> barriers not working in many raid configs. Thus it is more likely to
> experience data loss in the event of a power failure.
>
> With newer kernels, if you prefer performance over robustness, you can
> mount with the "nobarrier" option.
>
> So now you have a choice, whereas with 2.6.27 and RAID5 you effectively
> had nobarrier as your only option.
>
> Greg

Greg,

2.6.33 and later support write barriers on md RAID5, whereas 2.6.27-stable
doesn't. I looked through the 2.6.32.* changelogs at
http://kernel.org/pub/linux/kernel/v2.6/ but could not find anything
indicating that this barrier support was backported to 2.6.32-stable.

Anyway, we do not get the message "JBD: barrier-based sync failed on md5
- disabling barriers" when using 2.6.32.16, which might indicate that
write barriers are indeed active when no barrier option is specified.

Performance-wise, we tried mounting with barrier versus nobarrier (or
barrier=1 versus barrier=0) and re-did the 2.6.32+ benchmarks. It turned
out that the benchmark difference with and without barrier is less than
the variation between runs (which is much higher with 2.6.32+ than with
2.6.27-stable), so the influence seems to be minor.

best,

Kay


Attachments:
smime.p7s (5.11 kB)
S/MIME Cryptographic Signature

2010-08-02 14:52:32

by Kay Diederichs

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

Dave Chinner wrote:
> On Wed, Jul 28, 2010 at 09:51:48PM +0200, Kay Diederichs wrote:
>> Dear all,
>>
>> we reproducibly find significantly worse ext4 performance when our
>> fileservers run 2.6.32 or later kernels, when compared to the
>> 2.6.27-stable series.
>>
>> The hardware is RAID5 of 5 1TB WD10EACS disks (giving almost 4TB) in an
>> external eSATA enclosure (STARDOM ST6600); disks are not partitioned but
>> rather the complete disks are used:
>> md5 : active raid5 sde[0] sdg[5] sdd[3] sdc[2] sdf[1]
>> 3907045376 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5]
>> [UUUUU]
>>
>> The enclosure is connected using a Silicon Image (supported by
>> sata_sil24) PCIe-X1 adapter to one of our fileservers (either the backup
>> fileserver, 32bit desktop hardware with Intel(R) Pentium(R) D CPU
>> 3.40GHz, or a production-fileserver 64bit Precision WorkStation 670 w/ 2
>> Xeon 3.2GHz).
>>
>> The ext4 filesystem was created using
>> mke2fs -j -T largefile -E stride=128,stripe_width=512 -O extent,uninit_bg
>> It is mounted with noatime,data=writeback
>>
>> As operating system we usually use RHEL5.5, but to exclude problems with
>> self-compiled kernels, we also booted USB sticks with latest Fedora12
>> and FC13 .
>>
>> Our benchmarks consist of copying 100 6MB files from and to the RAID5,
>> over NFS (NFSv3, GB ethernet, TCP, async export), and tar-ing and
>> rsync-ing kernel trees back and forth. Before and after each individual
>> benchmark part, we "sync" and "echo 3 > /proc/sys/vm/drop_caches" on
>> both the client and the server.
>>
>> The problem:
>> with 2.6.27.48 we typically get:
>> 44 seconds for preparations
>> 23 seconds to rsync 100 frames with 597M from nfs directory
>> 33 seconds to rsync 100 frames with 595M to nfs directory
>> 50 seconds to untar 24353 kernel files with 323M to nfs directory
>> 56 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 67 seconds to run xds_par in nfs directory (reads and writes 600M)
>> 301 seconds to run the script
>>
>> with 2.6.32.16 we find:
>> 49 seconds for preparations
>> 23 seconds to rsync 100 frames with 597M from nfs directory
>> 261 seconds to rsync 100 frames with 595M to nfs directory
>> 74 seconds to untar 24353 kernel files with 323M to nfs directory
>> 67 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 290 seconds to run xds_par in nfs directory (reads and writes 600M)
>> 797 seconds to run the script
>>
>> This is quite reproducible (times varying about 1-2% or so). All times
>> include reading and writing on the client side (stock CentOS5.5 Nehalem
>> machines with fast single SATA disks). The 2.6.32.16 times are the same
>> with FC12 and FC13 (booted from USB stick).
>>
>> The 2.6.27-versus-2.6.32+ regression cannot be due to barriers because
>> md RAID5 does not support barriers ("JBD: barrier-based sync failed on
>> md5 - disabling barriers").
>>
>> What we tried: noop and deadline schedulers instead of cfq;
>> modifications of /sys/block/sd[c-g]/queue/max_sectors_kb; switching
>> on/off NCQ; blockdev --setra 8192 /dev/md5; increasing
>> /sys/block/md5/md/stripe_cache_size
>>
>> When looking at the I/O statistics while the benchmark is running, we
>> see very choppy patterns for 2.6.32, but quite smooth stats for
>> 2.6.27-stable.
>>
>> It is not an NFS problem; we see the same effect when transferring the
>> data using an rsync daemon. We believe, but are not sure, that the
>> problem does not exist with ext3 - it's not so quick to re-format a 4 TB
>> volume.
>>
>> Any ideas? We cannot believe that a general ext4 regression should have
>> gone unnoticed. So is it due to the interaction of ext4 with md-RAID5 ?
>
> Try reverting 50797481a7bdee548589506d7d7b48b08bc14dcd (ext4: Avoid
> group preallocation for closed files). IIRC it caused the same sort
> of severe performance regressions for postmark....
>
> Cheers,
>
> Dave.

Dave,

as you suggested, we reverted "ext4: Avoid group preallocation for
closed files" and this indeed fixes a big part of the problem: after
booting the NFS server we get

NFS-Server: turn5 2.6.32.16p i686
NFS-Client: turn10 2.6.18-194.8.1.el5 x86_64

exported directory on the nfs-server:
/dev/md5 /mnt/md5 ext4
rw,seclabel,noatime,barrier=1,stripe=512,data=writeback 0 0

48 seconds for preparations
28 seconds to rsync 100 frames with 597M from nfs directory
57 seconds to rsync 100 frames with 595M to nfs directory
70 seconds to untar 24353 kernel files with 323M to nfs directory
57 seconds to rsync 24353 kernel files with 323M from nfs directory
133 seconds to run xds_par in nfs directory
425 seconds to run the script


For blktrace details, see my next email which is a response to Ted's.

best,

Kay

2010-08-02 15:28:04

by Kay Diederichs

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

Ted Ts'o wrote:
> On Wed, Jul 28, 2010 at 09:51:48PM +0200, Kay Diederichs wrote:
>> When looking at the I/O statistics while the benchmark is running, we
>> see very choppy patterns for 2.6.32, but quite smooth stats for
>> 2.6.27-stable.
>
> Could you try to do two things for me? Using (preferably from a
> recent e2fsprogs, such as 1.41.11 or 12) run filefrag -v on the files
> created from your 2.6.27 run and your 2.6.32 run?
>
> Secondly, can you capture blktrace results from 2.6.27 and 2.6.32? That
> would be very helpful to understand what might be going on.
>
> Either would be helpful; both would be greatly appreciated.
>
> Thanks,
>
> - Ted

Ted,

we pared down the benchmark to the last step (called "run xds_par in nfs
directory (reads 600M, and writes 50M)") because this captures most of
the problem. Here we report kernel messages with stacktrace, and the
blktrace output that you requested.

Kernel messages: with 2.6.32.16 we observe

[ 6961.838032] INFO: task jbd2/md5-8:2010 blocked for more than 120 seconds.
[ 6961.838111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 6961.838191] jbd2/md5-8 D 00000634 0 2010 2 0x00000000
[ 6961.838200] f5171e78 00000046 231a9999 00000634 ddf91f2c f652cc4c
00000001 f5171e1c
[ 6961.838307] c0a6f140 c0a6f140 c0a6f140 c0a6a6ac f6354c20 f6354ecc
c2008140 00000000
[ 6961.838412] 00637f84 00000003 f652cc58 00000000 00000292 00000048
c20036ac f6354ecc
[ 6961.838518] Call Trace:
[ 6961.838556] [<c056c39e>] jbd2_journal_commit_transaction+0x1d9/0x1187
[ 6961.838627] [<c040220a>] ? __switch_to+0xd5/0x147
[ 6961.838681] [<c07a390a>] ? schedule+0x837/0x885
[ 6961.838734] [<c0455e5f>] ? autoremove_wake_function+0x0/0x38
[ 6961.838799] [<c0448c84>] ? try_to_del_timer_sync+0x58/0x60
[ 6961.838859] [<c0572426>] kjournald2+0xa2/0x1be
[ 6961.838909] [<c0455e5f>] ? autoremove_wake_function+0x0/0x38
[ 6961.838971] [<c0572384>] ? kjournald2+0x0/0x1be
[ 6961.839035] [<c0455c11>] kthread+0x66/0x6b
[ 6961.839089] [<c0455bab>] ? kthread+0x0/0x6b
[ 6961.839139] [<c0404167>] kernel_thread_helper+0x7/0x10
[ 6961.839215] INFO: task sync:11600 blocked for more than 120 seconds.
[ 6961.839286] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 6961.839367] sync D 00000632 0 11600 11595 0x00000000
[ 6961.839375] ddf91ea4 00000086 dca59d8b 00000632 76f570ee 00000048
00001dec 773fd796
[ 6961.839486] c0a6f140 c0a6f140 c0a6f140 c200819c f4ce0000 f4ce02ac
c2008140 00000000
[ 6961.839600] ddf91ea8 dca5c36b 00000632 dca5bb77 c2008180 773fd796
00000282 f4ce02ac
[ 6961.839727] Call Trace:
[ 6961.839762] [<c04f4de6>] bdi_sched_wait+0xd/0x11
[ 6961.841438] [<c07a3ede>] __wait_on_bit+0x3b/0x62
[ 6961.843109] [<c04f4dd9>] ? bdi_sched_wait+0x0/0x11
[ 6961.844782] [<c07a3fb5>] out_of_line_wait_on_bit+0xb0/0xb8
[ 6961.846479] [<c04f4dd9>] ? bdi_sched_wait+0x0/0x11
[ 6961.848181] [<c0455e97>] ? wake_bit_function+0x0/0x48
[ 6961.849906] [<c04f4c75>] wait_on_bit+0x25/0x31
[ 6961.851601] [<c04f4e5d>] sync_inodes_sb+0x73/0x121
[ 6961.853287] [<c04f8acc>] __sync_filesystem+0x48/0x69
[ 6961.854983] [<c04f8b72>] sync_filesystems+0x85/0xc7
[ 6961.856670] [<c04f8c04>] sys_sync+0x20/0x32
[ 6961.858363] [<c040351b>] sysenter_do_call+0x12/0x28

Blktrace: blktrace was run for 2.6.27.48, 2.6.32.16 and a patched
2.6.32.16 (called 2.6.32.16p below and in the .tar file), where the
patch just reverts "ext4: Avoid group preallocation for closed files".
This revert removes a substantial part of the regression.

For 2.6.32.16p and 2.6.27.48 there are two runs: run1 is directly after
booting, then the directory is unexported, unmounted, mounted, exported,
and then run2 is done. For 2.6.32.16 there is just run1; all subsequent
runs yield approximately the same results, i.e. they are as slow as run1.

Some numbers (time, and number of lines with flush|nfsd|sync in the
blkparse output):
            2.6.27.48       2.6.32.16        2.6.32.16p
            run1    run2    run1    run2     run1    run2
wallclock   113s    61s     280s    ~280s    137s    61s
flush       25362   9285    71861   -        32656   12066
nfsd        7595    8580    8685    -        8359    8444
sync        2860    3925    303     -        212     169
(no blktrace was captured for the 2.6.32.16 run2, hence the missing counts)

The total time seems to be dominated by the number of flushes.
It should be noted that all these runs used barrier=0; barrier=1 does
not have a significant effect, though.
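
Counts like those above can be reproduced from the blkparse output roughly as
follows (a sketch; the trace name is illustrative):

  blkparse -i md5.xds_par.2.6.32.16p_run1 > blkparse.out
  grep -c flush blkparse.out
  grep -c nfsd blkparse.out
  grep -c sync blkparse.out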

So we find:
a) in 2.6.32.16, there is a problem which manifests itself in kernel
messages associated with the jbd2/md5-8 and sync tasks, and in a vastly
increased number of flush operations
b) reverting the patch "ext4: Avoid group preallocation for closed files"
cures part of the problem
c) even after reverting that patch, the first run takes much longer than
the subsequent runs, despite "sync", "echo 3 >
/proc/sys/vm/drop_caches", and unmounting/re-mounting the filesystem.

The blktrace files are at
http://strucbio.biologie.uni-konstanz.de/~dikay/blktraces.tar.bz2 .

Should we test any other patches?

thanks,

Kay

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On Mon, 02 Aug 2010, Kay Diederichs wrote:
> Performance-wise, we tried mounting with barrier versus nobarrier (or
> barrier=1 versus barrier=0) and re-did the 2.6.32+ benchmarks. It turned
> out that the benchmark difference with and without barrier is less than
> the variation between runs (which is much higher with 2.6.32+ than with
> 2.6.27-stable), so the influence seems to be minor.

Did you check interactions with the IO scheduler?

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On Mon, 02 Aug 2010, Henrique de Moraes Holschuh wrote:
> On Mon, 02 Aug 2010, Kay Diederichs wrote:
> > Performance-wise, we tried mounting with barrier versus nobarrier (or
> > barrier=1 versus barrier=0) and re-did the 2.6.32+ benchmarks. It turned
> > out that the benchmark difference with and without barrier is less than
> > the variation between runs (which is much higher with 2.6.32+ than with
> > 2.6.27-stable), so the influence seems to be minor.
>
> Did you check interactions with the IO scheduler?

Never mind, I reread your first message, and you did. I apologise for the
noise.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2010-08-02 16:12:23

by Eric Sandeen

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On 08/02/2010 09:52 AM, Kay Diederichs wrote:
> Dave,
>
> as you suggested, we reverted "ext4: Avoid group preallocation for
> closed files" and this indeed fixes a big part of the problem: after
> booting the NFS server we get
>
> NFS-Server: turn5 2.6.32.16p i686
> NFS-Client: turn10 2.6.18-194.8.1.el5 x86_64
>
> exported directory on the nfs-server:
> /dev/md5 /mnt/md5 ext4
> rw,seclabel,noatime,barrier=1,stripe=512,data=writeback 0 0
>
> 48 seconds for preparations
> 28 seconds to rsync 100 frames with 597M from nfs directory
> 57 seconds to rsync 100 frames with 595M to nfs directory
> 70 seconds to untar 24353 kernel files with 323M to nfs directory
> 57 seconds to rsync 24353 kernel files with 323M from nfs directory
> 133 seconds to run xds_par in nfs directory
> 425 seconds to run the script

Interesting, I had found this commit to be a problem for small files
which are constantly created & deleted; the commit had the effect of
packing the newly created files in the first free space that could be
found, rather than walking down the disk leaving potentially fragmented
freespace behind (see seekwatcher graph attached). Reverting the patch
sped things up for this test, but left the filesystem freespace in bad
shape.

But you seem to see one of the largest effects in here:

261 seconds to rsync 100 frames with 595M to nfs directory
vs
57 seconds to rsync 100 frames with 595M to nfs directory

with the patch reverted making things go faster. So you are doing 100
6MB writes to the server, correct? Is the filesystem mkfs'd fresh
before each test, or is it aged? If not mkfs'd, is it at least
completely empty prior to the test, or does data remain on it? I'm just
wondering if fragmented freespace is contributing to this behavior as
well. If there is fragmented freespace, then with the patch I think the
allocator is more likely to hunt around for small discontiguous chunks
of free space, rather than going further out in the disk looking for a
large area to allocate from.

It might be interesting to use seekwatcher on the server to visualize
the allocation/IO patterns for the test running just this far?
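
Something along these lines, for instance (a sketch; assumes seekwatcher is
installed and a blktrace with the given base name has already been captured):

  seekwatcher -t md5.trace -o xds_par.png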

-Eric


Attachments:
rhel6_ext4_comparison.png (110.87 kB)

2010-08-02 21:08:18

by Kay Diederichs

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On 02.08.2010 18:12, Eric Sandeen wrote:
> On 08/02/2010 09:52 AM, Kay Diederichs wrote:
>> Dave,
>>
>> as you suggested, we reverted "ext4: Avoid group preallocation for
>> closed files" and this indeed fixes a big part of the problem: after
>> booting the NFS server we get
>>
>> NFS-Server: turn5 2.6.32.16p i686
>> NFS-Client: turn10 2.6.18-194.8.1.el5 x86_64
>>
>> exported directory on the nfs-server:
>> /dev/md5 /mnt/md5 ext4
>> rw,seclabel,noatime,barrier=1,stripe=512,data=writeback 0 0
>>
>> 48 seconds for preparations
>> 28 seconds to rsync 100 frames with 597M from nfs directory
>> 57 seconds to rsync 100 frames with 595M to nfs directory
>> 70 seconds to untar 24353 kernel files with 323M to nfs directory
>> 57 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 133 seconds to run xds_par in nfs directory
>> 425 seconds to run the script
>
> Interesting, I had found this commit to be a problem for small files
> which are constantly created & deleted; the commit had the effect of
> packing the newly created files in the first free space that could be
> found, rather than walking down the disk leaving potentially fragmented
> freespace behind (see seekwatcher graph attached). Reverting the patch
> sped things up for this test, but left the filesystem freespace in bad
> shape.
>
> But you seem to see one of the largest effects in here:
>
> 261 seconds to rsync 100 frames with 595M to nfs directory
> vs
> 57 seconds to rsync 100 frames with 595M to nfs directory
>
> with the patch reverted making things go faster. So you are doing 100
> 6MB writes to the server, correct?

correct.

>
> Is the filesystem mkfs'd fresh
> before each test, or is it aged?

it is too big to "just create it freshly". It was actually created a
week ago, and filled by a single ~ 10-hour rsync job run on the server
such that the filesystem should be filled in the most linear way
possible. Since then, the benchmarking has created and deleted lots of
files.

> If not mkfs'd, is it at least
> completely empty prior to the test, or does data remain on it? I'm just

it's not empty: df -h reports
Filesystem Size Used Avail Use% Mounted on
/dev/md5 3.7T 2.8T 712G 80% /mnt/md5

e2freefrag-1.41.12 reports:
Device: /dev/md5
Blocksize: 4096 bytes
Total blocks: 976761344
Free blocks: 235345984 (24.1%)

Min. free extent: 4 KB
Max. free extent: 99348 KB
Avg. free extent: 1628 KB

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range : Free extents Free Blocks Percent
4K... 8K- : 1858 1858 0.00%
8K... 16K- : 3415 8534 0.00%
16K... 32K- : 9952 54324 0.02%
32K... 64K- : 23884 288848 0.12%
64K... 128K- : 27901 658130 0.28%
128K... 256K- : 25761 1211519 0.51%
256K... 512K- : 35863 3376274 1.43%
512K... 1024K- : 48643 9416851 4.00%
1M... 2M- : 150311 60704033 25.79%
2M... 4M- : 244895 148283666 63.01%
4M... 8M- : 3970 5508499 2.34%
8M... 16M- : 187 551835 0.23%
16M... 32M- : 302 1765912 0.75%
32M... 64M- : 282 2727162 1.16%
64M... 128M- : 42 788539 0.34%


> wondering if fragmented freespace is contributing to this behavior as
> well. If there is fragmented freespace, then with the patch I think the
> allocator is more likely to hunt around for small discontiguous chunks
> of free space, rather than going further out in the disk looking for a
> large area to allocate from.

the last step of the benchmark, "xds_par", reads 600MB and writes 50MB.
It has 16 threads, which might put some additional pressure on the
freespace hunting. That step also is fast in 2.6.27.48 but slow in 2.6.32+.

>
> It might be interesting to use seekwatcher on the server to visualize
> the allocation/IO patterns for the test running just this far?
>
> -Eric

will try to install seekwatcher.

thanks,

Kay


Attachments:
smime.p7s (5.11 kB)
S/MIME Cryptographic Signature

2010-08-03 13:31:16

by Kay Diederichs

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

Eric Sandeen wrote:
> On 08/02/2010 09:52 AM, Kay Diederichs wrote:
>> Dave,
>>
>> as you suggested, we reverted "ext4: Avoid group preallocation for
>> closed files" and this indeed fixes a big part of the problem: after
>> booting the NFS server we get
>>
>> NFS-Server: turn5 2.6.32.16p i686
>> NFS-Client: turn10 2.6.18-194.8.1.el5 x86_64
>>
>> exported directory on the nfs-server:
>> /dev/md5 /mnt/md5 ext4
>> rw,seclabel,noatime,barrier=1,stripe=512,data=writeback 0 0
>>
>> 48 seconds for preparations
>> 28 seconds to rsync 100 frames with 597M from nfs directory
>> 57 seconds to rsync 100 frames with 595M to nfs directory
>> 70 seconds to untar 24353 kernel files with 323M to nfs directory
>> 57 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 133 seconds to run xds_par in nfs directory
>> 425 seconds to run the script
>
> Interesting, I had found this commit to be a problem for small files
> which are constantly created & deleted; the commit had the effect of
> packing the newly created files in the first free space that could be
> found, rather than walking down the disk leaving potentially fragmented
> freespace behind (see seekwatcher graph attached). Reverting the patch
> sped things up for this test, but left the filesystem freespace in bad
> shape.
>
> But you seem to see one of the largest effects in here:
>
> 261 seconds to rsync 100 frames with 595M to nfs directory
> vs
> 57 seconds to rsync 100 frames with 595M to nfs directory
>
> with the patch reverted making things go faster. So you are doing 100
> 6MB writes to the server, correct? Is the filesystem mkfs'd fresh
> before each test, or is it aged? If not mkfs'd, is it at least
> completely empty prior to the test, or does data remain on it? I'm just
> wondering if fragmented freespace is contributing to this behavior as
> well. If there is fragmented freespace, then with the patch I think the
> allocator is more likely to hunt around for small discontiguous chunks
> of free sapce, rather than going further out in the disk looking for a
> large area to allocate from.
>
> It might be interesting to use seekwatcher on the server to visualize
> the allocation/IO patterns for the test running just this far?
>
> -Eric
>
>
> ------------------------------------------------------------------------
>

Eric,

seekwatcher does not seem to understand the blktrace output of old
kernels, so I rolled my own primitive plotting, e.g.

blkparse -i md5.xds_par.2.6.32.16p_run1 > blkparse.out

grep flush blkparse.out | grep W > flush_W
grep flush blkparse.out | grep R > flush_R

grep nfsd blkparse.out | grep R > nfsd_R
grep nfsd blkparse.out | grep W > nfsd_W

grep sync blkparse.out | grep R > sync_R
grep sync blkparse.out | grep W > sync_W

gnuplot <<EOF
set term png
set out '2.6.32.16p_run1.png'
set key outside
set title "2.6.32.16p_run1"
plot 'nfsd_W' us 4:8,'flush_W' us 4:8,'sync_W' us 4:8,'nfsd_R' us 4:8,'flush_R' us 4:8
EOF

I attach the resulting plots for 2.6.27.48_run1 (after booting) and
2.6.27.48_run2 (after run1; sync; and drop_caches). They show seconds on
the x axis (horizontal) and block numbers (512-byte blocks, I suppose;
the ext4 filesystem has 976761344 4096-byte blocks so that would be
about 8e+09 512-byte blocks) on the y axis (vertical).

You'll have to do the real interpretation of the plots yourself, but
even someone who does not know exactly what the pdflush (in 2.6.27.48)
or flush (in 2.6.32+) kernel threads are supposed to do can tell that
the kernels behave _very_ differently.

In particular, stock 2.6.32.16 every time (only run1 is shown, but run2
is the same) has the flush thread visiting all of the filesystem, in
steps of 263168 blocks. I have no idea why it does this.
Roughly the first 1/3 of the filesystem is also visited by kernels
2.6.27.48 and the patched 2.6.32.16 that Dave Chinner suggested, but
only in the first run after booting. Subsequent runs are fast and do not
employ the flush thread much.

Hope this helps to pin down the regression.

thanks,

Kay


Attachments:
2.6.27.48_run1.png (5.03 kB)
2.6.27.48_run2.png (4.38 kB)
2.6.32.16p_run1.png (4.82 kB)
2.6.32.16p_run2.png (4.34 kB)
2.6.32.16.png (5.23 kB)
smime.p7s (5.11 kB)
S/MIME Cryptographic Signature

2010-08-04 08:18:07

by Kay Diederichs

Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On 02.08.2010 22:21, Ted Ts'o wrote:
> On Mon, Aug 02, 2010 at 05:30:03PM +0200, Kay Diederichs wrote:
>>
>> we pared down the benchmark to the last step (called "run xds_par in nfs
>> directory (reads 600M, and writes 50M)") because this captures most of
>> the problem. Here we report kernel messages with stacktrace, and the
>> blktrace output that you requested.
>
> Thanks, I'll take a look at it.
>
> Is NFS required to reproduce the problem? If you simply copy the 100
> files using rsync, or cp -r while logged onto the server, do you
> notice the performance regression?
>
> Thanks, regards,
>
> - Ted

Ted,

we've run the benchmarks internally on the file server; it turns out
that NFS is not required to reproduce the problem.

We also took the opportunity to try 2.6.32.17, which just came out.
2.6.32.17 behaves similarly to 2.6.32.16-patched (i.e. with reverted
"ext4: Avoid group preallocation for closed files"); 2.6.32.17 has quite
a few ext4 patches, so one or a couple of those seem to have a similar
effect to reverting "ext4: Avoid group preallocation for closed files".

These are the times for the second (and later) benchmark runs; the
first run is always slower. The last step ("run xds_par") is slower than
in the NFS case because it is heavy in CPU usage (total CPU time is more
than 200 seconds); the NFS client is an 8-core (+HT) Nehalem-type
machine, whereas the NFS server is just a 2-core Pentium D @ 3.40GHz.

Local machine: turn5 2.6.27.48 i686
Raid5: /dev/md5 /mnt/md5 ext4dev
rw,noatime,barrier=1,stripe=512,data=writeback 0 0
32 seconds for preparations
19 seconds to rsync 100 frames with 597M from raid5,ext4 directory
17 seconds to rsync 100 frames with 595M to raid5,ext4 directory
36 seconds to untar 24353 kernel files with 323M to raid5,ext4 directory
31 seconds to rsync 24353 kernel files with 323M from raid5,ext4 directory
267 seconds to run xds_par in raid5,ext4 directory
427 seconds to run the script

Local machine: turn5 2.6.32.16 i686 (vanilla, i.e. not patched)
Raid5: /dev/md5 /mnt/md5 ext4
rw,seclabel,noatime,barrier=0,stripe=512,data=writeback 0 0
36 seconds for preparations
18 seconds to rsync 100 frames with 597M from raid5,ext4 directory
33 seconds to rsync 100 frames with 595M to raid5,ext4 directory
68 seconds to untar 24353 kernel files with 323M to raid5,ext4 directory
40 seconds to rsync 24353 kernel files with 323M from raid5,ext4 directory
489 seconds to run xds_par in raid5,ext4 directory
714 seconds to run the script

Local machine: turn5 2.6.32.17 i686
Raid5: /dev/md5 /mnt/md5 ext4
rw,seclabel,noatime,barrier=0,stripe=512,data=writeback 0 0
38 seconds for preparations
18 seconds to rsync 100 frames with 597M from raid5,ext4 directory
33 seconds to rsync 100 frames with 595M to raid5,ext4 directory
67 seconds to untar 24353 kernel files with 323M to raid5,ext4 directory
41 seconds to rsync 24353 kernel files with 323M from raid5,ext4 directory
266 seconds to run xds_par in raid5,ext4 directory
492 seconds to run the script

So even if the patches that went into 2.6.32.17 seem to fix the worst
stalls, it is obvious that untarring and rsyncing kernel files is
significantly slower on 2.6.32.17 than on 2.6.27.48.

HTH,

Kay


Attachments:
smime.p7s (5.11 kB)
S/MIME Cryptographic Signature