2012-02-29 13:52:33

by Jacek Luczak

Subject: getdents - ext4 vs btrfs performance

Hi All,

/*Sorry for sending incomplete email, hit wrong button :) I guess I
can't use Gmail */

Long story short: we've found that operations on a directory structure
holding many dirs take ages on ext4.

The question: why is there such a huge difference between ext4 and btrfs?
See the test results below for real values.

Background: I had to back up a Jenkins directory holding workspaces for
a few projects which were checked out from svn (which implies a lot of
extra .svn dirs). The copy took a lot of time (at least more than I
expected) and the process was mostly in D state (disk sleep). I dug
deeper and ran some extra tests to see whether this is a regression on
the block/fs side. To isolate the issue I also performed the same tests
on btrfs.

Test environment configuration:
1) HW: HP ProLiant BL460 G6, 48 GB of memory, 2x 6 core Intel X5670 HT
enabled, Smart Array P410i, RAID 1 on top of 2x 10K RPM SAS HDDs.
2) Kernels: all tests were done on the following kernels:
- 2.6.39.4-3 -- the build ID (3) is used here mostly for internal tracking
of config changes. In -3 we've introduced the "fix readahead pipeline
break caused by block plug" patch. Otherwise it's pure 2.6.39.4.
- 3.2.7 -- latest kernel at the time of testing (3.2.8 has been
released recently).
3) Test subject, a directory holding:
- 54GB of data (measured on ext4)
- 1978149 files
- 844008 directories
4) Mount options:
- ext4 -- errors=remount-ro,noatime,data=writeback
- btrfs -- noatime,nodatacow and, for later investigation of the
compression effect, noatime,nodatacow,compress=lzo

In all tests I measured the execution time. The following tests
were performed:
- find . -type d
- find . -type f
- cp -a
- rm -rf

Ext4 results:
| Type     | 2.6.39.4-3 | 3.2.7
| Dir cnt  | 17m 40sec  | 11m 20sec
| File cnt | 17m 36sec  | 11m 22sec
| Copy     | 1h 28m     | 1h 27m
| Remove   | 3m 43sec   | 3m 38sec

Btrfs results (without lzo compression):
| Type     | 2.6.39.4-3 | 3.2.7
| Dir cnt  | 2m 22sec   | 2m 21sec
| File cnt | 2m 26sec   | 2m 23sec
| Copy     | 36m 22sec  | 39m 35sec
| Remove   | 7m 51sec   | 10m 43sec

From the above one can see that the copy takes close to 1h less on btrfs.
I ran strace counting the time spent in calls; the results are as follows
(from 3.2.7):
1) Ext4 (only top elements):
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.01   13.257850           1  15082163           read
 23.40    5.440353           3   1687702           getdents
  6.15    1.430559           0   3672418           lstat
  3.80    0.883767           0  13106961           write
  2.32    0.539959           0   4794099           open
  1.69    0.393589           0    843695           mkdir
  1.28    0.296700           0   5637802           setxattr
  0.80    0.186539           0   7325195           stat

2) Btrfs:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 53.38    9.486210           1  15179751           read
 11.38    2.021662           1   1688328           getdents
 10.64    1.890234           0   4800317           open
  6.83    1.213723           0  13201590           write
  4.85    0.862731           0   5644314           setxattr
  3.50    0.621194           1    844008           mkdir
  2.75    0.489059           0   3675992         1 lstat
  1.71    0.303544           0   5644314           llistxattr
  1.50    0.265943           0   1978149           utimes
  1.02    0.180585           0   5644314    844008 getxattr

On btrfs getdents takes much less time, which suggests that this syscall
is the bottleneck in copy time on ext4. On 2.6.39.4 it shows even less
time for getdents:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.77   10.978816           1  15033132           read
 14.46    3.125996           1   4733589           open
  7.15    1.546311           0   5566988           setxattr
  5.89    1.273845           0   3626505           lstat
  5.81    1.255858           1   1667050           getdents
  5.66    1.224403           0  13083022           write
  3.40    0.735114           1    833371           mkdir
  1.96    0.424881           0   5566988           llistxattr
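
The usecs/call column is rounded to whole microseconds, so it hides most of the per-call difference. Below is a small sketch that recomputes the mean time per getdents call from the seconds and calls columns, and recovers the total time a table accounts for from one row's percentage. The helper names are made up for illustration. As far as I know, strace -c summarises system time by default rather than wall-clock time, so these totals are not directly comparable with the copy times in the tables above.

```python
# getdents rows copied from the strace summaries above: label -> (seconds, calls)
GETDENTS = {
    "ext4 3.2.7": (5.440353, 1687702),
    "btrfs 3.2.7": (2.021662, 1688328),
    "2.6.39.4 table": (1.255858, 1667050),
}


def usecs_per_call(seconds, calls):
    """Mean time per call in microseconds, without strace's rounding."""
    return seconds / calls * 1e6


def table_total_seconds(seconds, percent):
    """Total time covered by a table, recovered from one row's share."""
    return seconds / (percent / 100.0)


if __name__ == "__main__":
    for label, (seconds, calls) in GETDENTS.items():
        print(f"{label}: {usecs_per_call(seconds, calls):.2f} us per getdents call")
    # On ext4 (3.2.7) read accounts for 57.01% of the table with 13.257850 s.
    print(f"ext4 table total: {table_total_seconds(13.257850, 57.01):.1f} s")
```

Unrounded, the ext4 getdents calls come out at a bit over 3 us each against roughly 1.2 us on btrfs, and the whole ext4 table sums to well under a minute.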


Why is there such a huge difference in the getdents timings?

-Jacek


2012-02-29 13:55:18

by Jacek Luczak

Subject: Re: getdents - ext4 vs btrfs performance

Hi Chris,

the last one was borked :) Please check this one.

-jacek


2012-02-29 14:07:49

by Jacek Luczak

Subject: Re: getdents - ext4 vs btrfs performance


I will try to answer the question from the broken email I've sent.

@Lukas, it was always a fresh FS on top of an LVM logical volume. I
dropped caches and remounted to sync all data before (re)running the
tests.

-Jacek

BTW: Sorry for the email mixture. I just can't get this gmail thing to
work (why does it force top posting? :/). Please use this thread.

2012-02-29 14:21:24

by Jacek Luczak

Subject: Re: getdents - ext4 vs btrfs performance


More observations:
1) Sampling the process state every 10s during the copy shows:
- Ext4: 526 probes, 34 hits in R state, 492 hits in D state
- Btrfs (2.6.39.4): 218 probes, 83 hits in R state, 135 hits in D state
- Btrfs (3.2.7): 238 probes, 62 hits in R state, 174 hits in D state,
2 hits sleeping
2) dd write/read of a 55GB file to/from the volume:
- Ext4: write 127MB/s, read 107MB/s
- Btrfs: write 110MB/s, read 176MB/s
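
For anyone who wants to repeat the state sampling, here is a rough sketch of how such probes could be taken by reading /proc/<pid>/stat. This is not the tool used for the numbers above; the function names and the 10 second default interval are assumptions for illustration only.

```python
import time


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split after the last ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def sample_states(pid, probes, interval=10.0):
    """Probe the state of `pid` up to `probes` times, `interval` seconds apart."""
    counts = {}
    for _ in range(probes):
        try:
            with open(f"/proc/{pid}/stat") as f:
                state = parse_state(f.read())
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited
        counts[state] = counts.get(state, 0) + 1
        time.sleep(interval)
    return counts


def share(hits, probes):
    """Percentage of probes that hit a given state."""
    return 100.0 * hits / probes


if __name__ == "__main__":
    # D-state hits and total probes taken from the list above.
    for label, d_hits, probes in [("ext4", 492, 526),
                                  ("btrfs 2.6.39.4", 135, 218),
                                  ("btrfs 3.2.7", 174, 238)]:
        print(f"{label}: {share(d_hits, probes):.1f}% of probes in D state")
```

Expressed as shares, the copy on ext4 was in D state for over 90% of the probes, against roughly 60-75% on btrfs.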

-Jacek

2012-02-29 14:44:14

by Chris Mason

Subject: Re: getdents - ext4 vs btrfs performance

On Wed, Feb 29, 2012 at 03:07:45PM +0100, Jacek Luczak wrote:

[ btrfs faster than ext for find and cp -a ]

> 2012/2/29 Jacek Luczak <[email protected]>:
>
> I will try to answer the question from the broken email I've sent.
>
> @Lukas, it was always a fresh FS on top of LVM logical volume. I've
> been cleaning cache/remounting to sync all data before (re)doing
> tests.

The next step is to get cp -a out of the picture; in this case you're
benchmarking both the read speed and the write speed (what are you
copying to, btw?).

Using tar cf /dev/zero <some_dir> is one way to get a consistent picture
of the read speed.
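
A read-only pass can also be scripted. The sketch below is not a replacement for tar, only an illustration of the same idea: walk the tree in whatever order the directory listing returns, read every regular file, and discard the data so that no writes are involved. The function name and chunk size are arbitrary choices.

```python
import os
import sys
import time


def read_tree(root, chunk_size=1 << 20):
    """Read every regular file under `root` and throw the data away.

    Returns (files_read, bytes_read). Symlinks are skipped.
    """
    files = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
            files += 1
    return files, total


if __name__ == "__main__":
    # Usage: python read_tree.py <some_dir>
    if len(sys.argv) == 2:
        start = time.monotonic()
        files, total = read_tree(sys.argv[1])
        elapsed = time.monotonic() - start
        print(f"{files} files, {total} bytes read in {elapsed:.1f}s")
```

As with the other tests, caches should be dropped or the filesystem remounted before each run, otherwise the second pass mostly measures the page cache.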

You can confirm the theory that it is directory order causing problems
by using acp to read the data.

http://oss.oracle.com/~mason/acp/acp-0.6.tar.bz2

-chris

2012-02-29 14:55:12

by Jacek Luczak

Subject: Re: getdents - ext4 vs btrfs performance

2012/2/29 Chris Mason <[email protected]>:
> On Wed, Feb 29, 2012 at 03:07:45PM +0100, Jacek Luczak wrote:
>
> [ btrfs faster than ext for find and cp -a ]
>
>> 2012/2/29 Jacek Luczak <[email protected]>:
>>
>> I will try to answer the question from the broken email I've sent.
>>
>> @Lukas, it was always a fresh FS on top of LVM logical volume. I've
>> been cleaning cache/remounting to sync all data before (re)doing
>> tests.
>
> The next step is to get cp -a out of the picture, in this case you're
> benchmarking both the read speed and the write speed (what are you
> copying to btw?).

It's a simple cp -a Jenkins{,.bak}, so a dir-to-dir copy on the same volume.

> Using tar cf /dev/zero <some_dir> is one way to get a consistent picture
> of the read speed.

IMO the problem is not only in read speed. The directory order hits
here. There's a difference in the sequential tests that places btrfs as
the winner, but this still should not have such a huge influence on
getdents. I know a bit about the difference between ext4 and btrfs
directory handling and I would not expect such a huge difference. On the
production system where the issue was observed, which is doing some real
work in the background, the copy takes up to 4h.

For me btrfs looks perfect here; what could be worth checking is the
change in syscall timing between 2.6.39.4 and 3.2.7. Before, getdents
was not that high on the list, while now it jumps to second position,
though without a huge impact on the timings.

> You can confirm the theory that it is directory order causing problems
> by using acp to read the data.
>
> http://oss.oracle.com/~mason/acp/acp-0.6.tar.bz2

Will check this later today and report back.

-jacek

2012-03-01 04:44:27

by Theodore Ts'o

Subject: Re: getdents - ext4 vs btrfs performance

You might try sorting the entries returned by readdir by inode number before you stat them. This is a long-standing weakness in ext3/ext4, and it has to do with how we added hashed tree indexes to directories in (a) a backwards compatible way, that (b) was POSIX compliant with respect to adding and removing directory entries concurrently with reading all of the directory entries using readdir.

You might try compiling spd_readdir from the e2fsprogs source tree (in the contrib directory):

http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=contrib/spd_readdir.c;h=f89832cd7146a6f5313162255f057c5a754a4b84;hb=d9a5d37535794842358e1cfe4faa4a89804ed209

... and then using that as an LD_PRELOAD, and see how that changes things.

The short version is that we can't easily do this in the kernel since it's a problem that primarily shows up with very big directories, and using non-swappable kernel memory to store all of the directory entries and then sort them so they can be returned in inode number order just isn't practical. It is something which can be easily done in userspace, though, and a number of programs (including mutt for its Maildir support) do so, and it helps greatly for workloads where you are calling readdir() followed by something that needs to access the inode (i.e., stat, unlink, etc.)
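
To make the userspace approach concrete, here is a minimal sketch of the idea in Python. It is not spd_readdir and not what mutt does internally; the function names are hypothetical. It reads all entries of one directory, sorts them by the inode number carried in the directory entry, and only then touches the inodes.

```python
import os


def entries_by_inode(path):
    """Return the entries of `path` sorted by inode number.

    os.scandir() exposes the inode number recorded in the directory
    entry, so the sort itself needs no extra stat call per file.
    """
    with os.scandir(path) as it:
        return sorted(it, key=lambda entry: entry.inode())


def stat_in_inode_order(path):
    """lstat every entry of `path` in inode order.

    Returns a list of (name, stat_result) pairs.
    """
    return [(entry.name, os.lstat(entry.path))
            for entry in entries_by_inode(path)]
```

The cost is holding one directory's entries in memory at a time, which is the part that is cheap in userspace and impractical in the kernel for very large directories.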

-- Ted

