2009-10-13 10:35:53

by Laurent CORBES

Subject: Ext3 sequential read performance drop 2.6.29 -> 2.6.30,2.6.31,...

Hi all,

While benchmarking some systems I discovered a big sequential read performance
drop using ext3 on fairly big files. The drop seems to have been introduced in
2.6.30. I'm testing with 2.6.28.6 -> 2.6.29.6 -> 2.6.30.4 -> 2.6.31.3.

I'm running a software raid6 (256k chunk) on six 750GB 7200rpm disks. Here are
the raw numbers for the disks and the raid device:

$ dd if=/dev/sda of=/dev/null bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 98.7483 seconds, 109 MB/s

$ dd if=/dev/md7 of=/dev/null bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 34.8744 seconds, 308 MB/s

Across the different kernels the changes here are not significant (~1MB/s on the
raw disk and ~5MB/s on the raid device). Writing a 10GB file to the filesystem
is also almost constant at ~100MB/s.

$ dd if=/dev/zero of=/mnt/space/benchtmp//dd.out bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 102.547 seconds, 105 MB/s

However, when reading this file back there is a huge performance drop between
2.6.29.6 and 2.6.30.4/2.6.31.3:

2.6.28.6:
sj-dev-7:/mnt/space/Benchmark# dd if=dd.out of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 43.8288 seconds, 245 MB/s

2.6.29.6:
sj-dev-7:/mnt/space/Benchmark# dd if=dd.out of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 42.745 seconds, 251 MB/s

2.6.30.4:
$ dd if=/mnt/space/benchtmp//dd.out of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 48.621 seconds, 221 MB/s

2.6.31.3:
sj-dev-7:/mnt/space/Benchmark# dd if=dd.out of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 51.4148 seconds, 209 MB/s

... Things get worse over time ...

Numbers are averaged over ~10 runs each.
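
For reference, each of the figures above is averaged over runs scripted roughly
like this (a sketch only; the explicit cache drop between runs is an assumption
about the procedure, it simply makes sure every read really hits the disks):

for i in $(seq 1 10); do
    sync
    echo 3 > /proc/sys/vm/drop_caches      # drop page cache, dentries and inodes
    dd if=/mnt/space/benchtmp/dd.out of=/dev/null bs=1M 2>&1 | tail -1
done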

I first checked the stripe/stride alignment of the ext3 fs, which is quite
important on raid6. I rechecked it and everything seems fine from my
understanding of the formulas (stride = chunk size / block size, stripe-width =
stride * number of data disks):
raid6 chunk 256k / 4k blocks -> stride = 64; 4 data disks -> stripe-width = 64 * 4 = 256 ?
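
For reference, that would correspond to an mke2fs invocation roughly like this
with a reasonably recent e2fsprogs (a sketch only, assuming the 4k block size
shown in the dumpe2fs output below; not necessarily the exact command this
filesystem was created with):

# stride       = chunk size / fs block size    = 256k / 4k = 64
# stripe-width = stride * number of data disks = 64 * 4    = 256
mke2fs -j -b 4096 -E stride=64,stripe-width=256 /dev/md7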

In both cases I'm using the CFQ I/O scheduler, with no special tuning done on it.
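
In case anyone wants to rule the elevator out, it can be checked and switched
per block device at runtime, e.g. (just a sketch):

cat /sys/block/sda/queue/scheduler         # the active scheduler is shown in brackets
echo deadline > /sys/block/sda/queue/scheduler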


For information, the test server is a Dell PowerEdge R710 with a SAS 6/iR
controller, 4GB of RAM and 6*750GB SATA disks. I get the same behavior on a
PE2950 with a PERC 6/i, 2GB of RAM and 6*750GB SATA disks.

Here is some miscellaneous information about the setup:
sj-dev-7:/mnt/space/Benchmark# cat /proc/mdstat
md7 : active raid6 sdf7[5] sde7[4] sdd7[3] sdc7[2] sdb7[1] sda7[0]
2923443200 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]
bitmap: 0/175 pages [0KB], 2048KB chunk

sj-dev-7:/mnt/space/Benchmark# dumpe2fs -h /dev/md7
dumpe2fs 1.40-WIP (14-Nov-2006)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 9c29f236-e4f2-4db4-bf48-ea613cd0ebad
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags: signed directory hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 713760
Block count: 730860800
Reserved block count: 0
Free blocks: 705211695
Free inodes: 713655
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 849
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 32
Inode blocks per group: 1
Filesystem created: Thu Oct 1 15:45:01 2009
Last mount time: Mon Oct 12 13:17:45 2009
Last write time: Mon Oct 12 13:17:45 2009
Mount count: 10
Maximum mount count: 30
Last checked: Thu Oct 1 15:45:01 2009
Check interval: 15552000 (6 months)
Next check after: Tue Mar 30 15:45:01 2010
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: 378d4fd2-23c9-487c-b635-5601585f0da7
Journal backup: inode blocks
Journal size: 128M


Thanks all.

--
Laurent Corbes - [email protected]
SmartJog SAS | Phone: +33 1 5868 6225 | Fax: +33 1 5868 6255 | http://www.smartjog.com
27 Blvd Hippolyte Marquès, 94200 Ivry-sur-Seine, France
A TDF Group company


2009-10-13 13:08:12

by Laurent CORBES

Subject: Re: Ext3 sequential read performance drop 2.6.29 -> 2.6.30,2.6.31,...

Some updates, and I've added linux-fsdevel to the loop:

> While benchmarking some systems I discovered a big sequential read performance
> drop using ext3 on fairly big files. The drop seems to have been introduced in
> 2.6.30. I'm testing with 2.6.28.6 -> 2.6.29.6 -> 2.6.30.4 -> 2.6.31.3.
>
> I'm running a software raid6 (256k chunk) on six 750GB 7200rpm disks. Here are
> the raw numbers for the disks and the raid device:
>
> $ dd if=/dev/sda of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 98.7483 seconds, 109 MB/s
>
> $ dd if=/dev/md7 of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 34.8744 seconds, 308 MB/s
>
> Across the different kernels the changes here are not significant (~1MB/s on the
> raw disk and ~5MB/s on the raid device). Writing a 10GB file to the filesystem
> is also almost constant at ~100MB/s.
>
> $ dd if=/dev/zero of=/mnt/space/benchtmp//dd.out bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 102.547 seconds, 105 MB/s
>
> However, when reading this file back there is a huge performance drop between
> 2.6.29.6 and 2.6.30.4/2.6.31.3:

I've added slabtop info from before and after the runs for 2.6.28.6 and 2.6.31.3.
Each run is done just after a system reboot.

Active / Total Objects (% used) : 83612 / 90199 (92.7%)
Active / Total Slabs (% used) : 4643 / 4643 (100.0%)
Active / Total Caches (% used) : 93 / 150 (62.0%)
Active / Total Size (% used) : 16989.63K / 17858.85K (95.1%)
Minimum / Average / Maximum Object : 0.01K / 0.20K / 4096.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
20820 20688 99% 0.12K 694 30 2776K dentry
12096 12029 99% 0.04K 144 84 576K sysfs_dir_cache
8701 8523 97% 0.03K 77 113 308K size-32
6036 6018 99% 0.32K 503 12 2012K inode_cache
4757 4646 97% 0.05K 71 67 284K buffer_head
4602 4254 92% 0.06K 78 59 312K size-64
4256 4256 100% 0.47K 532 8 2128K ext3_inode_cache
3864 3607 93% 0.08K 84 46 336K vm_area_struct
2509 2509 100% 0.28K 193 13 772K radix_tree_node
2130 1373 64% 0.12K 71 30 284K filp
1962 1938 98% 0.41K 218 9 872K shmem_inode_cache
1580 1580 100% 0.19K 79 20 316K skbuff_head_cache
1524 1219 79% 0.01K 6 254 24K anon_vma
1450 1450 100% 2.00K 725 2 2900K size-2048
1432 1382 96% 0.50K 179 8 716K size-512
1260 1198 95% 0.12K 42 30 168K size-128

> 2.6.28.6:
> sj-dev-7:/mnt/space/Benchmark# dd if=dd.out of=/dev/null bs=1M
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 43.8288 seconds, 245 MB/s

Active / Total Objects (% used) : 78853 / 90405 (87.2%)
Active / Total Slabs (% used) : 5079 / 5084 (99.9%)
Active / Total Caches (% used) : 93 / 150 (62.0%)
Active / Total Size (% used) : 17612.24K / 19391.84K (90.8%)
Minimum / Average / Maximum Object : 0.01K / 0.21K / 4096.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
17589 17488 99% 0.28K 1353 13 5412K radix_tree_node
12096 12029 99% 0.04K 144 84 576K sysfs_dir_cache
9840 5659 57% 0.12K 328 30 1312K dentry
8701 8568 98% 0.03K 77 113 308K size-32
5226 4981 95% 0.05K 78 67 312K buffer_head
4602 4366 94% 0.06K 78 59 312K size-64
4264 4253 99% 0.47K 533 8 2132K ext3_inode_cache
3726 3531 94% 0.08K 81 46 324K vm_area_struct
2130 1364 64% 0.12K 71 30 284K filp
1962 1938 98% 0.41K 218 9 872K shmem_inode_cache
1580 1460 92% 0.19K 79 20 316K skbuff_head_cache
1548 1406 90% 0.32K 129 12 516K inode_cache
1524 1228 80% 0.01K 6 254 24K anon_vma
1450 1424 98% 2.00K 725 2 2900K size-2048
1432 1370 95% 0.50K 179 8 716K size-512
1260 1202 95% 0.12K 42 30 168K size-128


> 2.6.29.6:
> sj-dev-7:/mnt/space/Benchmark# dd if=dd.out of=/dev/null bs=1M
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 42.745 seconds, 251 MB/s
>
> 2.6.30.4:
> $ dd if=/mnt/space/benchtmp//dd.out of=/dev/null bs=1M
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 48.621 seconds, 221 MB/s


Active / Total Objects (% used) : 88438 / 97670 (90.5%)
Active / Total Slabs (% used) : 5451 / 5451 (100.0%)
Active / Total Caches (% used) : 93 / 155 (60.0%)
Active / Total Size (% used) : 19564.52K / 20948.54K (93.4%)
Minimum / Average / Maximum Object : 0.01K / 0.21K / 4096.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
21547 21527 99% 0.13K 743 29 2972K dentry
12684 12636 99% 0.04K 151 84 604K sysfs_dir_cache
8927 8639 96% 0.03K 79 113 316K size-32
6721 6720 99% 0.33K 611 11 2444K inode_cache
4425 4007 90% 0.06K 75 59 300K size-64
4240 4237 99% 0.48K 530 8 2120K ext3_inode_cache
4154 4089 98% 0.05K 62 67 248K buffer_head
3910 3574 91% 0.08K 85 46 340K vm_area_struct
2483 2449 98% 0.28K 191 13 764K radix_tree_node
2280 1330 58% 0.12K 76 30 304K filp
2240 2132 95% 0.19K 112 20 448K skbuff_head_cache
2198 2198 100% 2.00K 1099 2 4396K size-2048
1935 1910 98% 0.43K 215 9 860K shmem_inode_cache
1770 1738 98% 0.12K 59 30 236K size-96
1524 1278 83% 0.01K 6 254 24K anon_vma
1056 936 88% 0.50K 132 8 528K size-512

> 2.6.31.3:
> sj-dev-7:/mnt/space/Benchmark# dd if=dd.out of=/dev/null bs=1M
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 51.4148 seconds, 209 MB/s

Active / Total Objects (% used) : 81843 / 97478 (84.0%)
Active / Total Slabs (% used) : 5759 / 5763 (99.9%)
Active / Total Caches (% used) : 92 / 155 (59.4%)
Active / Total Size (% used) : 19486.81K / 22048.45K (88.4%)
Minimum / Average / Maximum Object : 0.01K / 0.23K / 4096.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
17589 17426 99% 0.28K 1353 13 5412K radix_tree_node
12684 12636 99% 0.04K 151 84 604K sysfs_dir_cache
10991 6235 56% 0.13K 379 29 1516K dentry
8927 8624 96% 0.03K 79 113 316K size-32
4824 4819 99% 0.05K 72 67 288K buffer_head
4425 3853 87% 0.06K 75 59 300K size-64
3910 3527 90% 0.08K 85 46 340K vm_area_struct
3560 3268 91% 0.48K 445 8 1780K ext3_inode_cache
2288 1394 60% 0.33K 208 11 832K inode_cache
2280 1236 54% 0.12K 76 30 304K filp
2240 2183 97% 0.19K 112 20 448K skbuff_head_cache
2216 2191 98% 2.00K 1108 2 4432K size-2048
1935 1910 98% 0.43K 215 9 860K shmem_inode_cache
1770 1719 97% 0.12K 59 30 236K size-96
1524 1203 78% 0.01K 6 254 24K anon_vma
1056 921 87% 0.50K 132 8 528K size-512


> ... Things get worse over time ...
>
> Numbers are averaged over ~10 runs each.
>
> I first checked the stripe/stride alignment of the ext3 fs, which is quite
> important on raid6. I rechecked it and everything seems fine from my
> understanding of the formulas (stride = chunk size / block size, stripe-width =
> stride * number of data disks):
> raid6 chunk 256k / 4k blocks -> stride = 64; 4 data disks -> stripe-width = 64 * 4 = 256 ?
>
> In both cases I'm using the CFQ I/O scheduler, with no special tuning done on it.
>
>
> For information, the test server is a Dell PowerEdge R710 with a SAS 6/iR
> controller, 4GB of RAM and 6*750GB SATA disks. I get the same behavior on a
> PE2950 with a PERC 6/i, 2GB of RAM and 6*750GB SATA disks.
>
> Here is some miscellaneous information about the setup:
> sj-dev-7:/mnt/space/Benchmark# cat /proc/mdstat
> md7 : active raid6 sdf7[5] sde7[4] sdd7[3] sdc7[2] sdb7[1] sda7[0]
> 2923443200 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]
> bitmap: 0/175 pages [0KB], 2048KB chunk
>
> sj-dev-7:/mnt/space/Benchmark# dumpe2fs -h /dev/md7
> dumpe2fs 1.40-WIP (14-Nov-2006)
> Filesystem volume name: <none>
> Last mounted on: <not available>
> Filesystem UUID: 9c29f236-e4f2-4db4-bf48-ea613cd0ebad
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
> Filesystem flags: signed directory hash
> Default mount options: (none)
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 713760
> Block count: 730860800
> Reserved block count: 0
> Free blocks: 705211695
> Free inodes: 713655
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Reserved GDT blocks: 849
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 32
> Inode blocks per group: 1
> Filesystem created: Thu Oct 1 15:45:01 2009
> Last mount time: Mon Oct 12 13:17:45 2009
> Last write time: Mon Oct 12 13:17:45 2009
> Mount count: 10
> Maximum mount count: 30
> Last checked: Thu Oct 1 15:45:01 2009
> Check interval: 15552000 (6 months)
> Next check after: Tue Mar 30 15:45:01 2010
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 128
> Journal inode: 8
> Default directory hash: tea
> Directory Hash Seed: 378d4fd2-23c9-487c-b635-5601585f0da7
> Journal backup: inode blocks
> Journal size: 128M

Thanks all.
--
Laurent Corbes - [email protected]
SmartJog SAS | Phone: +33 1 5868 6225 | Fax: +33 1 5868 6255 | http://www.smartjog.com
27 Blvd Hippolyte Marquès, 94200 Ivry-sur-Seine, France
A TDF Group company

2009-11-02 21:56:13

by Andrew Morton

Subject: Re: Ext3 sequential read performance drop 2.6.29 -> 2.6.30,2.6.31,...

On Tue, 13 Oct 2009 12:09:55 +0200
Laurent CORBES <[email protected]> wrote:

> Hi all,
>
> While benchmarking some systems I discovered a big sequential read performance
> drop using ext3 on fairly big files. The drop seems to have been introduced in
> 2.6.30. I'm testing with 2.6.28.6 -> 2.6.29.6 -> 2.6.30.4 -> 2.6.31.3.

Seems that large performance regressions aren't of interest to this
list :(

> I'm running a software raid6 (256k chunk) on six 750GB 7200rpm disks. Here are
> the raw numbers for the disks and the raid device:
>
> $ dd if=/dev/sda of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 98.7483 seconds, 109 MB/s
>
> $ dd if=/dev/md7 of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 34.8744 seconds, 308 MB/s
>
> Across the different kernels the changes here are not significant (~1MB/s on the
> raw disk and ~5MB/s on the raid device). Writing a 10GB file to the filesystem
> is also almost constant at ~100MB/s.
>
> $ dd if=/dev/zero of=/mnt/space/benchtmp//dd.out bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 102.547 seconds, 105 MB/s
>
> However, when reading this file back there is a huge performance drop between
> 2.6.29.6 and 2.6.30.4/2.6.31.3:
>
> 2.6.28.6:
> sj-dev-7:/mnt/space/Benchmark# dd if=dd.out of=/dev/null bs=1M
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 43.8288 seconds, 245 MB/s
>
> 2.6.29.6:
> sj-dev-7:/mnt/space/Benchmark# dd if=dd.out of=/dev/null bs=1M
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 42.745 seconds, 251 MB/s
>
> 2.6.30.4:
> $ dd if=/mnt/space/benchtmp//dd.out of=/dev/null bs=1M
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 48.621 seconds, 221 MB/s
>
> 2.6.31.3:
> sj-dev-7:/mnt/space/Benchmark# dd if=dd.out of=/dev/null bs=1M
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 51.4148 seconds, 209 MB/s
>
> ... Things get worse over time ...

Did you do any further investigation? Do you think the regression is
due to MD changes, or to something else?
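
If it reproduces reliably, a bisect between the two releases would probably be
the quickest way to find out; roughly:

git bisect start v2.6.30 v2.6.29     # first the bad release, then the good one
# build and boot the kernel git checks out, rerun the dd read test, then mark it:
git bisect good                      # or: git bisect bad
# repeat until git reports the first bad commit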


Thanks.

> Numbers are averaged over ~10 runs each.
>
> I first checked the stripe/stride alignment of the ext3 fs, which is quite
> important on raid6. I rechecked it and everything seems fine from my
> understanding of the formulas (stride = chunk size / block size, stripe-width =
> stride * number of data disks):
> raid6 chunk 256k / 4k blocks -> stride = 64; 4 data disks -> stripe-width = 64 * 4 = 256 ?
>
> In both cases I'm using the CFQ I/O scheduler, with no special tuning done on it.
>
>
> For information, the test server is a Dell PowerEdge R710 with a SAS 6/iR
> controller, 4GB of RAM and 6*750GB SATA disks. I get the same behavior on a
> PE2950 with a PERC 6/i, 2GB of RAM and 6*750GB SATA disks.
>
> Here is some miscellaneous information about the setup:
> sj-dev-7:/mnt/space/Benchmark# cat /proc/mdstat
> md7 : active raid6 sdf7[5] sde7[4] sdd7[3] sdc7[2] sdb7[1] sda7[0]
> 2923443200 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]
> bitmap: 0/175 pages [0KB], 2048KB chunk
>
> sj-dev-7:/mnt/space/Benchmark# dumpe2fs -h /dev/md7
> dumpe2fs 1.40-WIP (14-Nov-2006)
> Filesystem volume name: <none>
> Last mounted on: <not available>
> Filesystem UUID: 9c29f236-e4f2-4db4-bf48-ea613cd0ebad
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
> Filesystem flags: signed directory hash
> Default mount options: (none)
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 713760
> Block count: 730860800
> Reserved block count: 0
> Free blocks: 705211695
> Free inodes: 713655
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Reserved GDT blocks: 849
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 32
> Inode blocks per group: 1
> Filesystem created: Thu Oct 1 15:45:01 2009
> Last mount time: Mon Oct 12 13:17:45 2009
> Last write time: Mon Oct 12 13:17:45 2009
> Mount count: 10
> Maximum mount count: 30
> Last checked: Thu Oct 1 15:45:01 2009
> Check interval: 15552000 (6 months)
> Next check after: Tue Mar 30 15:45:01 2010
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 128
> Journal inode: 8
> Default directory hash: tea
> Directory Hash Seed: 378d4fd2-23c9-487c-b635-5601585f0da7
> Journal backup: inode blocks
> Journal size: 128M
>
>
> Thanks all.
>
> --
> Laurent Corbes - [email protected]
> SmartJog SAS | Phone: +33 1 5868 6225 | Fax: +33 1 5868 6255 | http://www.smartjog.com
> 27 Blvd Hippolyte Marquès, 94200 Ivry-sur-Seine, France
> A TDF Group company
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2009-11-03 10:06:46

by Christoph Hellwig

Subject: Re: [dm-devel] Re: Ext3 sequential read performance drop 2.6.29 -> 2.6.30,2.6.31,...

On Mon, Nov 02, 2009 at 01:55:54PM -0800, Andrew Morton wrote:
> On Tue, 13 Oct 2009 12:09:55 +0200
> Laurent CORBES <[email protected]> wrote:
>
> > Hi all,
> >
> > While benchmarking some systems I discovered a big sequential read performance
> > drop using ext3 on fairly big files. The drop seems to have been introduced in
> > 2.6.30. I'm testing with 2.6.28.6 -> 2.6.29.6 -> 2.6.30.4 -> 2.6.31.3.
>
> Seems that large performance regressions aren't of interest to this
> list :(

Not sure which list you mean, but dm-devel is for dm, not md. We're also
seeing similarly massive performance drops with md and ext3/xfs as
already reported on the list. Someone tracked it down to writeback
changes as usual, but there it got stuck.

2009-11-03 10:42:39

by NeilBrown

Subject: Re: [dm-devel] Re: Ext3 sequential read performance drop 2.6.29 -> 2.6.30,2.6.31,...

On Tue, November 3, 2009 9:06 pm, Christoph Hellwig wrote:
> On Mon, Nov 02, 2009 at 01:55:54PM -0800, Andrew Morton wrote:
>> On Tue, 13 Oct 2009 12:09:55 +0200
>> Laurent CORBES <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > While benchmarking some systems I discovered a big sequential read
>> > performance drop using ext3 on fairly big files. The drop seems to have
>> > been introduced in 2.6.30. I'm testing with 2.6.28.6 -> 2.6.29.6 ->
>> > 2.6.30.4 -> 2.6.31.3.
>>
>> Seems that large performance regressions aren't of interest to this
>> list :(
>
> Not sure which list you mean, but dm-devel is for dm, not md. We're also
> seeing similarly massive performance drops with md and ext3/xfs as
> already reported on the list. Someone tracked it down to writeback
> changes as usual, but there it got stuck.

I'm still looking - running some basic tests on 4 filesystems over
half a dozen recent kernels to see what has been happening.

I have a suspicion that there are multiple problems.
In particular, XFS has a strange degradation which was papered over
by commit c8a4051c3731b.
I'm beginning to wonder if it was caused by commit 17bc6c30cf6bf
but I haven't actually tested that yet.

NeilBrown

2009-11-03 10:54:39

by Laurent CORBES

Subject: Re: [dm-devel] Re: Ext3 sequential read performance drop 2.6.29 -> 2.6.30,2.6.31,...

Hi all,

> >> > Hi all,
> >> >
> >> > While benchmarking some systems I discovered a big sequential read
> >> > performance drop using ext3 on fairly big files. The drop seems to have
> >> > been introduced in 2.6.30. I'm testing with 2.6.28.6 -> 2.6.29.6 ->
> >> > 2.6.30.4 -> 2.6.31.3.
> >>
> >> Seems that large performance regressions aren't of interest to this
> >> list :(

Or maybe 200+ MB/s is enough for a lot of people :)

> > Not sure which list you mean, but dm-devel is for dm, not md. We're also
> > seeing similarly massive performance drops with md and ext3/xfs as
> > already reported on the list. Someone tracked it down to writeback
> > changes as usual, but there it got stuck.
>
> I'm still looking - running some basic tests on 4 filesystems over
> half a dozen recent kernels to see what has been happening.
>
> I have a suspicion that there are multiple problems.
> In particular, XFS has a strange degradation which was papered over
> by commit c8a4051c3731b.
> I'm beginning to wonder if it was caused by commit 17bc6c30cf6bf
> but I haven't actually tested that yet.

What is really strange is that in all the tests I did, the raw md performance
never dropped, only a few MB/s of difference between kernels (~2%). This may be
related to the way the upper filesystem writes data to the md layer.

I'll run the tests on the raw disks to see if there is some trouble there as
well. I can also test with other RAID levels. Is there any tuning/debugging I
can do for you? I can also set up remote access to this system if needed.
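
For the raw-disk run I was thinking of something like reading all the members
in parallel (a rough sketch, device names taken from the mdstat output above):

for d in sda sdb sdc sdd sde sdf; do
    dd if=/dev/$d of=/dev/null bs=1M count=10240 &    # one sequential reader per member
done
wait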

Thanks.
--
Laurent Corbes - [email protected]
SmartJog SAS | Phone: +33 1 5868 6225 | Fax: +33 1 5868 6255 | http://www.smartjog.com
27 Blvd Hippolyte Marquès, 94200 Ivry-sur-Seine, France
A TDF Group company

2009-11-03 16:51:08

by Andrew Morton

Subject: Re: [dm-devel] Re: Ext3 sequential read performance drop 2.6.29 -> 2.6.30,2.6.31,...

On Tue, 3 Nov 2009 21:42:30 +1100 "NeilBrown" <[email protected]> wrote:

> On Tue, November 3, 2009 9:06 pm, Christoph Hellwig wrote:
> > On Mon, Nov 02, 2009 at 01:55:54PM -0800, Andrew Morton wrote:
> >> On Tue, 13 Oct 2009 12:09:55 +0200
> >> Laurent CORBES <[email protected]> wrote:
> >>
> >> > Hi all,
> >> >
> >> > While benchmarking some systems I discovered a big sequential read
> >> > performance drop using ext3 on fairly big files. The drop seems to have
> >> > been introduced in 2.6.30. I'm testing with 2.6.28.6 -> 2.6.29.6 ->
> >> > 2.6.30.4 -> 2.6.31.3.
> >>
> >> Seems that large performance regressions aren't of interest to this
> >> list :(
> >
> > Not sure which list you mean, but dm-devel is for dm, not md.

bah.

> > We're also
> > seeing similarly massive performance drops with md and ext3/xfs as
> > already reported on the list. Someone tracked it down to writeback
> > changes as usual, but there it got stuck.
>
> I'm still looking - running some basic tests on 4 filesystems over
> half a dozen recent kernels to see what has been happening.
>
> I have a suspicion that there are multiple problems.
> In particular, XFS has a strange degradation which was papered over
> by commit c8a4051c3731b.
> I'm beginning to wonder if it was caused by commit 17bc6c30cf6bf
> but I haven't actually tested that yet.

I think Laurent's workload involves only reads, with no writes.