2014-06-16 06:02:08

by Tanya Brokhman

Subject: Quadrant write performance degradation - kernel3.10 vs kernel3.4

Hello,
Recently we encountered a performance degradation on a 3.10 kernel based
build, compared to a 3.4 based one, when running the fs_write Quadrant
benchmark.
We profiled the test and came to the conclusion that the root cause of
the degradation is in the vfs_write call stack (an overhead of 2611.2us
is observed in the 3.10 kernel compared to 3.4):

ret_fast_syscall
SyS_write
vfs_write (total time spent: 21295us on 3.10, 18683.79us on 3.4)
do_sync_write
ext4_file_write
generic_file_aio_write (total time spent: 19124.4us on 3.10, 16815us on 3.4)
__generic_file_aio_write
generic_file_buffered_write
ext4_da_write_begin (total time spent: 10935.2us on 3.10, 8444.6us on 3.4)
__block_write_begin
ext4_da_get_block_prep (total time spent: 5402.6us on 3.10, 3576.8us on 3.4)
ext4_es_lookup_extent (total time spent: 2219.7us on 3.10, 0us on 3.4)
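
The per-function overhead can be read off the profile by subtracting the 3.4 time from the 3.10 time. The sketch below does that arithmetic on the figures quoted above. It assumes the reported times include time spent in callees, so the differences at different depths overlap and should not be added together; the dictionary and function names are illustrative only.

```python
# Total time per function in microseconds, copied from the profile above
# as (3.10 kernel, 3.4 kernel). Assumed to include time spent in callees.
PROFILE_US = {
    "vfs_write": (21295.0, 18683.79),
    "generic_file_aio_write": (19124.4, 16815.0),
    "ext4_da_write_begin": (10935.2, 8444.6),
    "ext4_da_get_block_prep": (5402.6, 3576.8),
    "ext4_es_lookup_extent": (2219.7, 0.0),
}


def overhead_us(profile):
    """Return {function: time on 3.10 minus time on 3.4}, in microseconds."""
    return {name: round(new - old, 2) for name, (new, old) in profile.items()}


if __name__ == "__main__":
    for name, delta in overhead_us(PROFILE_US).items():
        print(f"{name:28s} +{delta:.2f}us")
```

With these figures, ext4_es_lookup_extent (absent from the 3.4 trace) accounts for 2219.7us, which is of the same order as the 2611.2us difference seen at vfs_write.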


We tried reverting just the ext4 code back to 3.4 (on a 3.10 kernel
build) and got a 50% improvement in the test result.
Looking deeper into the changes made to ext4 between 3.4 and 3.10, we
came across two major features that make an explicit tradeoff in favor
of robustness and good design over performance in some use cases:

1) Metadata Checksums
http://kernelnewbies.org/Linux_3.5#head-e8ea0d70436ea63590eac3dc25a7b417333147f8
“As far as performance impact goes, it shouldn't be noticeable for
common desktop and server workloads. A mail server ffsb simulation show
nearly no change. On a test doing only file creation and deletion and
extent tree modifications, a performance drop of about 20 percent was
measured. However, it's a workload very heavily oriented towards
metadata, in most real-world workloads metadata is usually a small
fraction of total IO, so unless your workload is metadata-oriented, the
cost of enabling this feature should be negligible.”

2) Extents status tracking:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/ext4/extents_status.c?id=refs/tags/v3.10.42#n20
“There is a cache extent for write access, so if writes are not very
random, adding space operations are in O(1) time.”

We tried picking up several performance-enhancement patches from the
community, released between the 3.10 and 3.14 kernel versions. The
performance was almost the same.

I was wondering what performance tests were performed on these features?
Has anyone encountered the same issue?

Best Regards
Tanya Brokhman
--
QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


2014-06-16 19:20:17

by Darrick J. Wong

Subject: Re: Quadrant write performance degradation - kernel3.10 vs kernel3.4

On Mon, Jun 16, 2014 at 09:02:08AM +0300, Tanya Brokhman wrote:
> Hello,
> Recently we encountered a performance degradation on a 3.10 kernel
> based build, compared to a 3.4 based one, when running the fs_write
> Quadrant benchmark.
> We profiled the test and came to the conclusion that the root cause
> of the degradation is in the vfs_write call stack (an overhead of
> 2611.2us is observed in the 3.10 kernel compared to 3.4):
>
> ret_fast_syscall
> SyS_write
> vfs_write (total time spent: 21295us on 3.10, 18683.79us on 3.4)
> do_sync_write
> ext4_file_write
> generic_file_aio_write (total time spent: 19124.4us on 3.10, 16815us on 3.4)
> __generic_file_aio_write
> generic_file_buffered_write
> ext4_da_write_begin (total time spent: 10935.2us on 3.10, 8444.6us on 3.4)
> __block_write_begin
> ext4_da_get_block_prep (total time spent: 5402.6us on 3.10, 3576.8us on 3.4)
> ext4_es_lookup_extent (total time spent: 2219.7us on 3.10, 0us on 3.4)
>
>
> We tried reverting just the ext4 code back to 3.4 (on a 3.10 kernel
> build) and got a 50% improvement in the test result.
> Looking deeper into the changes made to ext4 between 3.4 and 3.10,
> we came across two major features that make an explicit tradeoff in
> favor of robustness and good design over performance in some use
> cases:
>
> 1) Metadata Checksums http://kernelnewbies.org/Linux_3.5#head-e8ea0d70436ea63590eac3dc25a7b417333147f8
> “As far as performance impact goes, it shouldn't be noticeable for
> common desktop and server workloads. A mail server ffsb simulation
> show nearly no change. On a test doing only file creation and
> deletion and extent tree modifications, a performance drop of about
> 20 percent was measured. However, it's a workload very heavily
> oriented towards metadata, in most real-world workloads metadata is
> usually a small fraction of total IO, so unless your workload is
> metadata-oriented, the cost of enabling this feature should be
> negligible.”

Dumb question, but do you have metadata_csum enabled? That would be a little
surprising, since (afaik) the only way you can turn it on is via unreleased
e2fsprogs-1.43.

(Otoh if you /do/ have it enabled and it's slowing you down, I'd like to hear
about it. ;))
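
One way to answer the question above is to read the feature list from the superblock. The sketch below assumes `dumpe2fs -h` from e2fsprogs is installed and prints a "Filesystem features:" line on stdout; the function names and the device path passed on the command line are placeholders, not anything taken from this thread.

```python
import subprocess
import sys


def parse_features(dumpe2fs_header):
    """Extract feature names from the 'Filesystem features:' line."""
    for line in dumpe2fs_header.splitlines():
        if line.startswith("Filesystem features:"):
            return set(line.split(":", 1)[1].split())
    return set()


def has_metadata_csum(device):
    """Run `dumpe2fs -h DEVICE` (usually needs root) and check the features."""
    result = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True, text=True, check=True,
    )
    return "metadata_csum" in parse_features(result.stdout)


if __name__ == "__main__":
    # The device path is supplied by the caller, e.g. the ext4 data partition.
    print(has_metadata_csum(sys.argv[1]))
```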

> 2) Extents status tracking: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/ext4/extents_status.c?id=refs/tags/v3.10.42#n20
> “There is a cache extent for write access, so if writes are not very
> random, adding space operations are in O(1) time.”

I'm no expert on the extent status cache, but this seems like a possible cause.

--D
>
> We tried picking up several performance-enhancement patches from the
> community, released between the 3.10 and 3.14 kernel versions. The
> performance was almost the same.
>
> I was wondering what performance tests were performed on these
> features? Has anyone encountered the same issue?
>
> Best Regards
> Tanya Brokhman
> --
> QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-06-17 07:52:56

by Lukas Czerner

Subject: Re: Quadrant write performance degradation - kernel3.10 vs kernel3.4

On Mon, 16 Jun 2014, Darrick J. Wong wrote:

> Date: Mon, 16 Jun 2014 12:20:09 -0700
> From: Darrick J. Wong <[email protected]>
> To: Tanya Brokhman <[email protected]>
> Cc: [email protected], [email protected],
> [email protected], [email protected],
> Dolev Raviv <[email protected]>
> Subject: Re: Quadrant write performance degradation - kernel3.10 vs kernel3.4
>
> On Mon, Jun 16, 2014 at 09:02:08AM +0300, Tanya Brokhman wrote:
> > Hello,
> > Recently we encountered a performance degradation on a 3.10 kernel
> > based build, compared to a 3.4 based one, when running the fs_write
> > Quadrant benchmark.
> > We profiled the test and came to the conclusion that the root cause
> > of the degradation is in the vfs_write call stack (an overhead of
> > 2611.2us is observed in the 3.10 kernel compared to 3.4):
> >
> > ret_fast_syscall
> > SyS_write
> > vfs_write (total time spent: 21295us on 3.10, 18683.79us on 3.4)
> > do_sync_write
> > ext4_file_write
> > generic_file_aio_write (total time spent: 19124.4us on 3.10, 16815us on 3.4)
> > __generic_file_aio_write
> > generic_file_buffered_write
> > ext4_da_write_begin (total time spent: 10935.2us on 3.10, 8444.6us on 3.4)
> > __block_write_begin
> > ext4_da_get_block_prep (total time spent: 5402.6us on 3.10, 3576.8us on 3.4)
> > ext4_es_lookup_extent (total time spent: 2219.7us on 3.10, 0us on 3.4)
> >
> >
> > We tried reverting just the ext4 code back to 3.4 (on a 3.10 kernel
> > build) and got a 50% improvement in the test result.
> > Looking deeper into the changes made to ext4 between 3.4 and 3.10,
> > we came across two major features that make an explicit tradeoff in
> > favor of robustness and good design over performance in some use
> > cases:
> >
> > 1) Metadata Checksums http://kernelnewbies.org/Linux_3.5#head-e8ea0d70436ea63590eac3dc25a7b417333147f8
> > “As far as performance impact goes, it shouldn't be noticeable for
> > common desktop and server workloads. A mail server ffsb simulation
> > show nearly no change. On a test doing only file creation and
> > deletion and extent tree modifications, a performance drop of about
> > 20 percent was measured. However, it's a workload very heavily
> > oriented towards metadata, in most real-world workloads metadata is
> > usually a small fraction of total IO, so unless your workload is
> > metadata-oriented, the cost of enabling this feature should be
> > negligible.”
>
> Dumb question, but do you have metadata_csum enabled? That would be a little
> surprising, since (afaik) the only way you can turn it on is via unreleased
> e2fsprogs-1.43.
>
> (Otoh if you /do/ have it enabled and it's slowing you down, I'd like to hear
> about it. ;))
>
> > 2) Extents status tracking: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/ext4/extents_status.c?id=refs/tags/v3.10.42#n20
> > “There is a cache extent for write access, so if writes are not very
> > random, adding space operations are in O(1) time.”
>
> I'm no expert on the extent status cache, but this seems like a possible cause.

Exactly, there have been some fixes since the introduction of the
extent status tree; however, I've noticed some performance going down
as well, and I believe that the extent status tree is to blame.

AFAIK you cannot turn it off in any way, but there might be some
way to test its overhead. Zheng, do you have any suggestions?
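
As a rough way to compare kernels from user space, one could time individual write() calls on the filesystem under test. The sketch below is not the Quadrant fs_write benchmark and does not isolate the extent status tree; it only measures the latency of the whole buffered write path, and the chunk size, write count and function name are arbitrary choices made for illustration.

```python
import os
import sys
import tempfile
import time


def time_buffered_writes(directory, chunk_size=4096, count=1024):
    """Append `count` chunks with write(2); return per-call latencies in us."""
    payload = b"\0" * chunk_size
    fd, path = tempfile.mkstemp(dir=directory)
    latencies = []
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            latencies.append((time.perf_counter() - start) * 1e6)
    finally:
        os.close(fd)
        os.unlink(path)  # remove the scratch file again
    return latencies


if __name__ == "__main__":
    # Pass a directory on the ext4 mount to be measured.
    samples = time_buffered_writes(sys.argv[1])
    print(f"total {sum(samples):.1f}us, mean {sum(samples) / len(samples):.2f}us")
```

Running the same script on both kernel builds, against the same device and mount options, gives numbers that can be compared directly.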

Thanks!
-Lukas

>
> --D
> >
> > We tried picking up several performance-enhancement patches from the
> > community, released between the 3.10 and 3.14 kernel versions. The
> > performance was almost the same.
> >
> > I was wondering what performance tests were performed on these
> > features? Has anyone encountered the same issue?
> >
> > Best Regards
> > Tanya Brokhman
> > --
> > QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member
> > of Code Aurora Forum, hosted by The Linux Foundation

2014-06-20 02:36:15

by Zheng Liu

Subject: Re: Quadrant write performance degradation - kernel3.10 vs kernel3.4

On Tue, Jun 17, 2014 at 09:52:46AM +0200, Lukáš Czerner wrote:
> On Mon, 16 Jun 2014, Darrick J. Wong wrote:
>
> > Date: Mon, 16 Jun 2014 12:20:09 -0700
> > From: Darrick J. Wong <[email protected]>
> > To: Tanya Brokhman <[email protected]>
> > Cc: [email protected], [email protected],
> > [email protected], [email protected],
> > Dolev Raviv <[email protected]>
> > Subject: Re: Quadrant write performance degradation - kernel3.10 vs kernel3.4
> >
> > On Mon, Jun 16, 2014 at 09:02:08AM +0300, Tanya Brokhman wrote:
> > > Hello,
> > > Recently we encountered a performance degradation on a 3.10 kernel
> > > based build, compared to a 3.4 based one, when running the fs_write
> > > Quadrant benchmark.
> > > We profiled the test and came to the conclusion that the root cause
> > > of the degradation is in the vfs_write call stack (an overhead of
> > > 2611.2us is observed in the 3.10 kernel compared to 3.4):
> > >
> > > ret_fast_syscall
> > > SyS_write
> > > vfs_write (total time spent: 21295us on 3.10, 18683.79us on 3.4)
> > > do_sync_write
> > > ext4_file_write
> > > generic_file_aio_write (total time spent: 19124.4us on 3.10, 16815us on 3.4)
> > > __generic_file_aio_write
> > > generic_file_buffered_write
> > > ext4_da_write_begin (total time spent: 10935.2us on 3.10, 8444.6us on 3.4)
> > > __block_write_begin
> > > ext4_da_get_block_prep (total time spent: 5402.6us on 3.10, 3576.8us on 3.4)
> > > ext4_es_lookup_extent (total time spent: 2219.7us on 3.10, 0us on 3.4)
[snip]
> > > 2) Extents status tracking: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/ext4/extents_status.c?id=refs/tags/v3.10.42#n20
> > > “There is a cache extent for write access, so if writes are not very
> > > random, adding space operations are in O(1) time.”
> >
> > I'm no expert on the extent status cache, but this seems like a possible cause.
>
> Exactly, there have been some fixes since the introduction of the
> extent status tree; however, I've noticed some performance going down
> as well, and I believe that the extent status tree is to blame.
>
> AFAIK you cannot turn it off in any way, but there might be some
> way to test its overhead. Zheng, do you have any suggestions?

Sigh, sorry for the delayed reply.

Lukas, could you please share your test with me? From the call trace it
seems that the latency is in ext4_da_get_block_prep. It is not easy to
disable ext4_es_lookup_extent() because we need to look up the delayed
extent in the extent status tree and determine whether or not we need to
reserve some disk space.

Tanya, I would really appreciate it if you could disable delalloc and
re-run your test. You can use the following command to turn off the
delalloc feature.

$ sudo mount -t ext4 -o remount,nodelalloc ${DEV} ${MNT}
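
Before re-running the test it may be worth confirming that the remount took effect. The sketch below parses /proc/mounts and looks for the nodelalloc option on the given mount point. It assumes ext4 lists nodelalloc there when delayed allocation is off; the function names are illustrative.

```python
import sys


def mount_options(mount_point, mounts_text):
    """Return the option list of the last entry mounted at mount_point."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def delalloc_disabled(mount_point, mounts_text):
    """True if the mount point is present and lists the nodelalloc option."""
    options = mount_options(mount_point, mounts_text)
    return options is not None and "nodelalloc" in options


if __name__ == "__main__":
    with open("/proc/mounts") as mounts:
        print(delalloc_disabled(sys.argv[1], mounts.read()))
```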

Thanks,
- Zheng

>
> Thanks!
> -Lukas
>
> >
> > --D
> > >
> > > We tried picking up several performance-enhancement patches from the
> > > community, released between the 3.10 and 3.14 kernel versions. The
> > > performance was almost the same.
> > >
> > > I was wondering what performance tests were performed on these
> > > features? Has anyone encountered the same issue?
> > >
> > > Best Regards
> > > Tanya Brokhman
> > > --
> > > QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member
> > > of Code Aurora Forum, hosted by The Linux Foundation

2014-07-01 07:07:18

by Dolev Raviv

Subject: Re: Quadrant write performance degradation - kernel3.10 vs kernel3.4


On 06/20/2014 05:36 AM, Zheng Liu wrote:
> On Tue, Jun 17, 2014 at 09:52:46AM +0200, Lukáš Czerner wrote:
>> On Mon, 16 Jun 2014, Darrick J. Wong wrote:
>>
>>> Date: Mon, 16 Jun 2014 12:20:09 -0700
>>> From: Darrick J. Wong <[email protected]>
>>> To: Tanya Brokhman <[email protected]>
>>> Cc: [email protected], [email protected],
>>> [email protected], [email protected],
>>> Dolev Raviv <[email protected]>
>>> Subject: Re: Quadrant write performance degradation - kernel3.10 vs kernel3.4
>>>
>>> On Mon, Jun 16, 2014 at 09:02:08AM +0300, Tanya Brokhman wrote:
>>>> Hello,
>>>> Recently we encountered a performance degradation on a 3.10 kernel
>>>> based build, compared to a 3.4 based one, when running the fs_write
>>>> Quadrant benchmark.
>>>> We profiled the test and came to the conclusion that the root cause
>>>> of the degradation is in the vfs_write call stack (an overhead of
>>>> 2611.2us is observed in the 3.10 kernel compared to 3.4):
>>>>
>>>> ret_fast_syscall
>>>> SyS_write
>>>> vfs_write (total time spent: 21295us on 3.10, 18683.79us on 3.4)
>>>> do_sync_write
>>>> ext4_file_write
>>>> generic_file_aio_write (total time spent: 19124.4us on 3.10, 16815us on 3.4)
>>>> __generic_file_aio_write
>>>> generic_file_buffered_write
>>>> ext4_da_write_begin (total time spent: 10935.2us on 3.10, 8444.6us on 3.4)
>>>> __block_write_begin
>>>> ext4_da_get_block_prep (total time spent: 5402.6us on 3.10, 3576.8us on 3.4)
>>>> ext4_es_lookup_extent (total time spent: 2219.7us on 3.10, 0us on 3.4)
> [snip]
>>>> 2) Extents status tracking: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/ext4/extents_status.c?id=refs/tags/v3.10.42#n20
>>>> “There is a cache extent for write access, so if writes are not very
>>>> random, adding space operations are in O(1) time.”
>>> I'm no expert on the extent status cache, but this seems like a possible cause.
>> Exactly, there have been some fixes since the introduction of the
>> extent status tree; however, I've noticed some performance going down
>> as well, and I believe that the extent status tree is to blame.
>>
>> AFAIK you cannot turn it off in any way, but there might be some
>> way to test its overhead. Zheng, do you have any suggestions?
> Sigh, sorry for the delayed reply.
>
> Lukas, could you please share your test with me? From the call trace it
> seems that the latency is in ext4_da_get_block_prep. It is not easy to
> disable ext4_es_lookup_extent() because we need to look up the delayed
> extent in the extent status tree and determine whether or not we need to
> reserve some disk space.
>
> Tanya, I would really appreciate it if you could disable delalloc and
> re-run your test. You can use the following command to turn off the
> delalloc feature.
>
> $ sudo mount -t ext4 -o remount,nodelalloc ${DEV} ${MNT}
>
> Thanks,
> - Zheng

Thanks Zheng, Lukas and all for your help.
Zheng, we have tested with the delalloc feature turned off. We didn't
notice any improvement.

Any other suggestions :) or other thoughts regarding this?

>> Thanks!
>> -Lukas
>>
>>> --D
>>>> We tried picking up several performance-enhancement patches from the
>>>> community, released between the 3.10 and 3.14 kernel versions. The
>>>> performance was almost the same.
>>>>
>>>> I was wondering what performance tests were performed on these
>>>> features? Has anyone encountered the same issue?
>>>>
>>>> Best Regards
>>>> Tanya Brokhman
>>>> --
>>>> QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member
>>>> of Code Aurora Forum, hosted by The Linux Foundation