2010-03-29 06:25:24

by Keith Mannthey

Subject: Ext4 performance regression: Post 2.6.30



After 2.6.30 I am seeing large performance regressions on a raid setup.
I am working to publish a larger amount of data but I wanted to get some
quick data out about what I am seeing.

The test (FFSB test suite) I am running is basically random direct I/O
writes. The data below is from 128 threads all doing these random
writes. The 1- and 32-thread results are not as drastically bad, but
2.6.30 still shows the strongest results.

Under a mailserver workload I see similar performance impacts at this
same kernel change point. I hope to publish better data soon. Several
other workload types do not show this performance regression.


2.6.30:

Total Results
===============
Op Name  Transactions  Trans/sec  % Trans    % Op Weight  Throughput
=======  ============  =========  ========   ===========  ==========
write :  9015040       29561.46   100.000%   100.000%     115MB/sec
-
29561.46 Transactions per Second


Any kernel past 2.6.30 shows the regression; this data is from 2.6.31-rc1:

Total Results
===============
Op Name  Transactions  Trans/sec  % Trans    % Op Weight  Throughput
=======  ============  =========  ========   ===========  ==========
write :  3185920       10120.50   100.000%   100.000%     39.5MB/sec


2010-03-29 15:10:59

by Greg Freemyer

Subject: Re: Ext4 performance regression: Post 2.6.30

On Mon, Mar 29, 2010 at 2:25 AM, Keith Mannthey <[email protected]> wrote:
>
>
> After 2.6.30 I am seeing large performance regressions on a raid setup.
> I am working to publish a larger amount of data but I wanted to get some
> quick data out about what I am seeing.
>

Is mdraid involved?

They added barrier support for some configs after 2.6.30 I believe.
It can cause a drastic perf change, but it increases reliability and
is "correct".

Greg

2010-03-31 01:56:08

by Keith Mannthey

Subject: Re: Ext4 performance regression: Post 2.6.30

On Mon, 2010-03-29 at 11:10 -0400, Greg Freemyer wrote:
> On Mon, Mar 29, 2010 at 2:25 AM, Keith Mannthey <[email protected]> wrote:
> >
> >
> > After 2.6.30 I am seeing large performance regressions on a raid setup.
> > I am working to publish a larger amount of data but I wanted to get some
> > quick data out about what I am seeing.
> >
>
> Is mdraid involved?
>
> They added barrier support for some configs after 2.6.30 I believe.
> It can cause a drastic perf change, but it increases reliability and
> is "correct".

lvm and device mapper are involved. The git bisect just took me to:

374bf7e7f6cc38b0483351a2029a97910eadde1b is first bad commit
commit 374bf7e7f6cc38b0483351a2029a97910eadde1b
Author: Mikulas Patocka <[email protected]>
Date: Mon Jun 22 10:12:22 2009 +0100

dm: stripe support flush

Flush support for the stripe target.

This sets ti->num_flush_requests to the number of stripes and
remaps individual flush requests to the appropriate stripe devices.

Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Alasdair G Kergon <[email protected]>

:040000 040000 542f4b9b442d1371c6534f333b7e00714ef98609 d490479b660139fc1b6b0ecd17bb58c9e00e597e M drivers


This may be correct behavior but the performance penalty in this test
case is pretty high.
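
For reference, the bisect was driven with the usual git bisect workflow,
roughly like this (good/bad points as noted above):

    git bisect start
    git bisect bad v2.6.31-rc1     # regression present
    git bisect good v2.6.30        # regression absent
    # build and boot each candidate, rerun the FFSB random-write profile,
    # then mark it:
    git bisect good                # or: git bisect bad
    git bisect reset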

I am going to move back to current kernels and start looking into
ext4/dm flushing.

Thanks,
Keith Mannthey



2010-03-31 04:06:31

by Eric Sandeen

Subject: Re: Ext4 performance regression: Post 2.6.30

Keith Mannthey wrote:
> On Mon, 2010-03-29 at 11:10 -0400, Greg Freemyer wrote:
>> On Mon, Mar 29, 2010 at 2:25 AM, Keith Mannthey <[email protected]> wrote:
>>>
>>> After 2.6.30 I am seeing large performance regressions on a raid setup.
>>> I am working to publish a larger amount of data but I wanted to get some
>>> quick data out about what I am seeing.
>>>
>> Is mdraid involved?
>>
>> They added barrier support for some configs after 2.6.30 I believe.
>> It can cause a drastic perf change, but it increases reliability and
>> is "correct".
>
> lvm and device mapper are involved. The git bisect just took me to:
>
> 374bf7e7f6cc38b0483351a2029a97910eadde1b is first bad commit
> commit 374bf7e7f6cc38b0483351a2029a97910eadde1b
> Author: Mikulas Patocka <[email protected]>
> Date: Mon Jun 22 10:12:22 2009 +0100
>
> dm: stripe support flush
>
> Flush support for the stripe target.
>
> This sets ti->num_flush_requests to the number of stripes and
> remaps individual flush requests to the appropriate stripe devices.
>
> Signed-off-by: Mikulas Patocka <[email protected]>
> Signed-off-by: Alasdair G Kergon <[email protected]>
>
> :040000 040000 542f4b9b442d1371c6534f333b7e00714ef98609 d490479b660139fc1b6b0ecd17bb58c9e00e597e M drivers
>
>
> This may be correct behavior but the performance penalty in this test
> case is pretty high.
>
> I am going to move back to current kernels and start looking into
> ext4/dm flushing.

It would probably be interesting to do a mount -o nobarrier to see if
that makes the regression go away.
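
Something like this, with the device and mount point as placeholders for
your setup:

    mount -t ext4 -o nobarrier /dev/mapper/<vg>-<lv> /mnt/test

barrier=0 is the older spelling of the same option, and it should also be
changeable on a live filesystem with a remount:

    mount -o remount,nobarrier /mnt/test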

-Eric

> Thanks,
> Keith Mannthey


2010-03-31 22:02:27

by Keith Mannthey

Subject: Re: Ext4 performance regression: Post 2.6.30

On Tue, 2010-03-30 at 23:06 -0500, Eric Sandeen wrote:
> Keith Mannthey wrote:
> > On Mon, 2010-03-29 at 11:10 -0400, Greg Freemyer wrote:
> >> On Mon, Mar 29, 2010 at 2:25 AM, Keith Mannthey <[email protected]> wrote:
> >>>
> >>> After 2.6.30 I am seeing large performance regressions on a raid setup.
> >>> I am working to publish a larger amount of data but I wanted to get some
> >>> quick data out about what I am seeing.
> >>>
> >> Is mdraid involved?
> >>
> >> They added barrier support for some configs after 2.6.30 I believe.
> >> It can cause a drastic perf change, but it increases reliability and
> >> is "correct".
> >
> > lvm and device mapper are involved. The git bisect just took me to:
> >
> > 374bf7e7f6cc38b0483351a2029a97910eadde1b is first bad commit
> > commit 374bf7e7f6cc38b0483351a2029a97910eadde1b
> > Author: Mikulas Patocka <[email protected]>
> > Date: Mon Jun 22 10:12:22 2009 +0100
> >
> > dm: stripe support flush
> >
> > Flush support for the stripe target.
> >
> > This sets ti->num_flush_requests to the number of stripes and
> > remaps individual flush requests to the appropriate stripe devices.
> >
> > Signed-off-by: Mikulas Patocka <[email protected]>
> > Signed-off-by: Alasdair G Kergon <[email protected]>
> >
> > :040000 040000 542f4b9b442d1371c6534f333b7e00714ef98609 d490479b660139fc1b6b0ecd17bb58c9e00e597e M drivers
> >
> >
> > This may be correct behavior but the performance penalty in this test
> > case is pretty high.
> >
> > I am going to move back to current kernels and start looking into
> > ext4/dm flushing.
>
> It would probably be interesting to do a mount -o nobarrier to see if
> that makes the regression go away.

-o nobarrier takes the regression away with 2.6.34-rc3:

Default mount: ~12500 trans/sec

-o nobarrier: ~27500 trans/sec

Barriers on this setup cost a lot during writes.

Interestingly, the "mailserver" workload regression is also removed by
mounting with "-o nobarrier".

I am going to see what the impact is on a single disk setup.

Thanks,
Keith Mannthey
LTC FS-Dev


2010-03-31 22:06:28

by Greg Freemyer

Subject: Re: Ext4 performance regression: Post 2.6.30

On Wed, Mar 31, 2010 at 6:02 PM, Keith Mannthey <[email protected]> wrote:
> On Tue, 2010-03-30 at 23:06 -0500, Eric Sandeen wrote:
>> Keith Mannthey wrote:
>> > On Mon, 2010-03-29 at 11:10 -0400, Greg Freemyer wrote:
>> >> On Mon, Mar 29, 2010 at 2:25 AM, Keith Mannthey <[email protected]> wrote:
>> >>>
>> >>> After 2.6.30 I am seeing large performance regressions on a raid setup.
>> >>> I am working to publish a larger amount of data but I wanted to get some
>> >>> quick data out about what I am seeing.
>> >>>
>> >> Is mdraid involved?
>> >>
>> >> They added barrier support for some configs after 2.6.30 I believe.
>> >> It can cause a drastic perf change, but it increases reliability and
>> >> is "correct".
>> >
>> > lvm and device mapper are involved. The git bisect just took me to:
>> >
>> > 374bf7e7f6cc38b0483351a2029a97910eadde1b is first bad commit
>> > commit 374bf7e7f6cc38b0483351a2029a97910eadde1b
>> > Author: Mikulas Patocka <[email protected]>
>> > Date:   Mon Jun 22 10:12:22 2009 +0100
>> >
>> >     dm: stripe support flush
>> >
>> >     Flush support for the stripe target.
>> >
>> >     This sets ti->num_flush_requests to the number of stripes and
>> >     remaps individual flush requests to the appropriate stripe devices.
>> >
>> >     Signed-off-by: Mikulas Patocka <[email protected]>
>> >     Signed-off-by: Alasdair G Kergon <[email protected]>
>> >
>> > :040000 040000 542f4b9b442d1371c6534f333b7e00714ef98609 d490479b660139fc1b6b0ecd17bb58c9e00e597e M  drivers
>> >
>> >
>> > This may be correct behavior but the performance penalty in this test
>> > case is pretty high.
>> >
>> > I am going to move back to current kernels and start looking into
>> > ext4/dm flushing.
>>
>> It would probably be interesting to do a mount -o nobarrier to see if
>> that makes the regression go away.
>
> -o nobarrier takes the regression away with 2.6.34-rc3:
>
> Default mount: ~12500 trans/sec
>
> -o nobarrier: ~27500 trans/sec
>
> Barriers on this setup cost a lot during writes.
>
> Interestingly, the "mailserver" workload regression is also removed by
> mounting with "-o nobarrier".
>
> I am going to see what the impact is on a single disk setup.
>
> Thanks,
> Keith Mannthey
> LTC FS-Dev

I'm curious if you're using an internal or external journal?

I'd guess the cost of barriers is much greater with an internal
journal, but I don't recall seeing any benchmarks one way or the
other.

Greg

2010-03-31 22:14:40

by Keith Mannthey

Subject: Re: Ext4 performance regression: Post 2.6.30

On Wed, 2010-03-31 at 18:06 -0400, Greg Freemyer wrote:
> On Wed, Mar 31, 2010 at 6:02 PM, Keith Mannthey <[email protected]> wrote:
> > On Tue, 2010-03-30 at 23:06 -0500, Eric Sandeen wrote:
> >> Keith Mannthey wrote:
> >> > On Mon, 2010-03-29 at 11:10 -0400, Greg Freemyer wrote:
> >> >> On Mon, Mar 29, 2010 at 2:25 AM, Keith Mannthey <[email protected]> wrote:
> >> >>>
> >> >>> After 2.6.30 I am seeing large performance regressions on a raid setup.
> >> >>> I am working to publish a larger amount of data but I wanted to get some
> >> >>> quick data out about what I am seeing.
> >> >>>
> >> >> Is mdraid involved?
> >> >>
> >> >> They added barrier support for some configs after 2.6.30 I believe.
> >> >> It can cause a drastic perf change, but it increases reliability and
> >> >> is "correct".
> >> >
> >> > lvm and device mapper are involved. The git bisect just took me to:
> >> >
> >> > 374bf7e7f6cc38b0483351a2029a97910eadde1b is first bad commit
> >> > commit 374bf7e7f6cc38b0483351a2029a97910eadde1b
> >> > Author: Mikulas Patocka <[email protected]>
> >> > Date: Mon Jun 22 10:12:22 2009 +0100
> >> >
> >> > dm: stripe support flush
> >> >
> >> > Flush support for the stripe target.
> >> >
> >> > This sets ti->num_flush_requests to the number of stripes and
> >> > remaps individual flush requests to the appropriate stripe devices.
> >> >
> >> > Signed-off-by: Mikulas Patocka <[email protected]>
> >> > Signed-off-by: Alasdair G Kergon <[email protected]>
> >> >
> >> > :040000 040000 542f4b9b442d1371c6534f333b7e00714ef98609 d490479b660139fc1b6b0ecd17bb58c9e00e597e M drivers
> >> >
> >> >
> >> > This may be correct behavior but the performance penalty in this test
> >> > case is pretty high.
> >> >
> >> > I am going to move back to current kernels and start looking into
> >> > ext4/dm flushing.
> >>
> >> It would probably be interesting to do a mount -o nobarrier to see if
> >> that makes the regression go away.
> >
> > -o nobarrier takes the regression away with 2.6.34-rc3:
> >
> > Default mount: ~12500 trans/sec
> >
> > -o nobarrier: ~27500 trans/sec
> >
> > Barriers on this setup cost a lot during writes.
> >
> > Interestingly, the "mailserver" workload regression is also removed by
> > mounting with "-o nobarrier".
> >
> > I am going to see what the impact is on a single disk setup.
> >
> > Thanks,
> > Keith Mannthey
> > LTC FS-Dev
>
> I'm curious if you're using an internal or external journal?

I am unsure. How do I tell? I am using defaults except for -o
nobarrier. I know jbd2 is being used.

Thanks,
Keith

> I'd guess the cost of barriers is much greater with an internal
> journal, but I don't recall seeing any benchmarks one way or the
> other.
>
> Greg


2010-03-31 22:55:18

by Greg Freemyer

Subject: Re: Ext4 performance regression: Post 2.6.30

On Wed, Mar 31, 2010 at 6:14 PM, Keith Mannthey <[email protected]> wrote:
> On Wed, 2010-03-31 at 18:06 -0400, Greg Freemyer wrote:
>> On Wed, Mar 31, 2010 at 6:02 PM, Keith Mannthey <[email protected]> wrote:
>> > On Tue, 2010-03-30 at 23:06 -0500, Eric Sandeen wrote:
>> >> Keith Mannthey wrote:
>> >> > On Mon, 2010-03-29 at 11:10 -0400, Greg Freemyer wrote:
>> >> >> On Mon, Mar 29, 2010 at 2:25 AM, Keith Mannthey <[email protected]> wrote:
>> >> >>>
>> >> >>> After 2.6.30 I am seeing large performance regressions on a raid setup.
>> >> >>> I am working to publish a larger amount of data but I wanted to get some
>> >> >>> quick data out about what I am seeing.
>> >> >>>
>> >> >> Is mdraid involved?
>> >> >>
>> >> >> They added barrier support for some configs after 2.6.30 I believe.
>> >> >> It can cause a drastic perf change, but it increases reliability and
>> >> >> is "correct".
>> >> >
>> >> > lvm and device mapper are involved. The git bisect just took me to:
>> >> >
>> >> > 374bf7e7f6cc38b0483351a2029a97910eadde1b is first bad commit
>> >> > commit 374bf7e7f6cc38b0483351a2029a97910eadde1b
>> >> > Author: Mikulas Patocka <[email protected]>
>> >> > Date:   Mon Jun 22 10:12:22 2009 +0100
>> >> >
>> >> >     dm: stripe support flush
>> >> >
>> >> >     Flush support for the stripe target.
>> >> >
>> >> >     This sets ti->num_flush_requests to the number of stripes and
>> >> >     remaps individual flush requests to the appropriate stripe devices.
>> >> >
>> >> >     Signed-off-by: Mikulas Patocka <[email protected]>
>> >> >     Signed-off-by: Alasdair G Kergon <[email protected]>
>> >> >
>> >> > :040000 040000 542f4b9b442d1371c6534f333b7e00714ef98609 d490479b660139fc1b6b0ecd17bb58c9e00e597e M  drivers
>> >> >
>> >> >
>> >> > This may be correct behavior but the performance penalty in this test
>> >> > case is pretty high.
>> >> >
>> >> > I am going to move back to current kernels and start looking into
>> >> > ext4/dm flushing.
>> >>
>> >> It would probably be interesting to do a mount -o nobarrier to see if
>> >> that makes the regression go away.
>> >
>> > -o nobarrier takes the regression away with 2.6.34-rc3:
>> >
>> > Default mount: ~12500 trans/sec
>> >
>> > -o nobarrier: ~27500 trans/sec
>> >
>> > Barriers on this setup cost a lot during writes.
>> >
>> > Interestingly, the "mailserver" workload regression is also removed by
>> > mounting with "-o nobarrier".
>> >
>> > I am going to see what the impact is on a single disk setup.
>> >
>> > Thanks,
>> > Keith Mannthey
>> > LTC FS-Dev
>>
>> I'm curious if you're using an internal or external journal?
>
> I am unsure. How do I tell? I am using defaults except for -o
> nobarrier. I know jbd2 is being used.
>
> Thanks,
> ?Keith

The default is internal. An external journal requires a separate
partition to hold it.

Since journals are typically very small relative to the overall
filesystem, a small raid 1 partition would be my recommendation for
holding an external journal in production.

But for performance testing purposes, if you have a drive that is not
participating in your current raid setup, you can simply create a
small partition on it and use it to hold the external journal. I
believe you can convert your existing filesystem to an external
journal easily, without having to recreate it.
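
A rough sketch of checking which kind you have and of moving to an
external journal (device names below are placeholders; see tune2fs(8)
and mke2fs(8), and do this with the filesystem unmounted):

    # an internal journal shows up as "Journal inode", an external one
    # as "Journal device"
    dumpe2fs -h /dev/mapper/<vg>-<lv> | grep -i journal

    # create a dedicated journal device on a spare partition, then
    # drop the internal journal and point the filesystem at the new one
    mke2fs -O journal_dev /dev/sdX1
    tune2fs -O ^has_journal /dev/mapper/<vg>-<lv>
    tune2fs -J device=/dev/sdX1 /dev/mapper/<vg>-<lv>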

Greg