2006-05-19 09:35:14

by Jan Engelhardt

Subject: XFS write speed drop

Hello,


I have noticed that after an upgrade from 2.6.16-rcX -> 2.6.17-rc4, writes
to one (hdc) xfs filesystem have become significantly slower (factor 6 to
8), as if -o sync (or funky journal options on ext3) were enabled. Also,
reads would stall until writes had completed. I would only expect such
behavior when /proc/sys/vm/dirty_* are set to high values (like 95%, as I
use on my notebook). hda remained fast.

It eventually turned out to be the log barriers; hda does not support
barriers, and XFS printed a helpful kernel message that pointed me to a
mount flag to try. -o nobarrier makes it fast again.
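
For reference, roughly what I ended up doing (the mountpoint here is a
placeholder; putting nobarrier into the options field of /etc/fstab
should make it stick across reboots):

    umount /mnt
    mount /dev/hdc2 /mnt -o nobarrier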




Jan Engelhardt
--


2006-05-22 00:53:55

by Nathan Scott

Subject: Re: XFS write speed drop

On Fri, May 19, 2006 at 11:34:28AM +0200, Jan Engelhardt wrote:
> Hello,

Hi Jan,

> I have noticed that after an upgrade from 2.6.16-rcX -> 2.6.17-rc4, writes
> to one (hdc) xfs filesystem have become significantly slower (factor 6 to

Buffered writes? Direct writes? Sync writes? Log writes?
You're a bit light on details here.

Can you send the benchmark results themselves please? (as in,
the test(s) you've run that lead you to see 6-8x, and the data
those tests produced). Also, xfs_info output, and maybe list
the device driver(s) involved here too.

thanks!

--
Nathan

2006-05-22 12:22:38

by Jan Engelhardt

Subject: Re: XFS write speed drop

Hi Nathan,


>> I have noticed that after an upgrade from 2.6.16-rcX -> 2.6.17-rc4, writes
>> to one (hdc) xfs filesystem have become significantly slower (factor 6 to
>
>Buffered writes? Direct writes? Sync writes? Log writes?

How would I know? :) "The standard thing".
I poked around a bit in the source code, and the word 'order' appears
quite often near 'barrier', so I thought the 'barrier' option makes the
I/O scheduler strictly ordered, that is, write the log first, then the
data (like ext3's data=ordered). Anyhow, just see below.

>You're a bit light on details here.
>
>Can you send the benchmark results themselves please? (as in,
>the test(s) you've run that lead you to see 6-8x, and the data
>those tests produced). Also, xfs_info output, and maybe list
>the device driver(s) involved here too.

Drivers
=======
xfs.ko sis5513.ko ide_disk.ko ide_core.ko

xfs_info
========
14:20 shanghai:/root # xfs_info /
meta-data=/dev/hda3 isize=256 agcount=19, agsize=462621 blks
= sectsz=512 attr=1
data = bsize=4096 blocks=8586742, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=3614, version=1
= sectsz=512 sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
14:20 shanghai:/root # xfs_info /D
meta-data=/dev/hdc2 isize=256 agcount=16, agsize=3821963 blks
= sectsz=512 attr=1
data = bsize=4096 blocks=61151408, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=29859, version=1
= sectsz=512 sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
14:20 shanghai:/root # fdisk -l /dev/hda

Disk /dev/hda: 40.0 GB, 40020664320 bytes
255 heads, 63 sectors/track, 4865 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/hda1 1 64 514048+ 82 Linux swap / Solaris
/dev/hda2 * 65 588 4209030 c W95 FAT32 (LBA)
/dev/hda3 590 4865 34346970 83 Linux
14:21 shanghai:/root # fdisk -l /dev/hdc

Disk /dev/hdc: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/hdc1 1 63 506016 82 Linux swap / Solaris
/dev/hdc2 64 30515 244605690 83 Linux



So here are the results. I have run these tests from single-user mode (the
-b boot flag) so that no other userspace programs would interfere.

Script started on Mon May 22 15:32:29 2006
15:32 (none):~ # mount /dev/hdc2 /D -o barrier
15:32 (none):~ # df -Th / /D
Filesystem Type Size Used Avail Use% Mounted on
/dev/hda3 xfs 33G 6.7G 27G 21% /
/dev/hdc2 xfs 234G 142G 92G 61% /D
15:33 (none):~ # cat /proc/mounts
rootfs / rootfs rw 0 0
udev /dev tmpfs rw 0 0
/dev/hda3 / xfs rw 0 0
devpts /dev/pts devpts rw 0 0
/dev/hdc2 /D xfs rw 0 0
proc /proc proc rw 0 0

A small nitpick, BTW: the barrier flag is not listed in /proc/mounts.

hda is always in nobarrier mode.
<6>hda: cache flushes not supported
<6>hdc: cache flushes supported
<5>Filesystem "hda3": Disabling barriers, not supported by the underlying
device
hdc can do barriers if wanted.


CASE 1: Copying from one disk to another
========================================
Copying a compiled 2.6.17-rc4 tree; 306907 KB in 28566 files in 2090
directories.

15:33 (none):~ # cd /tmp
15:33 (none):/tmp # time rsync -PHSav kernel-source/ /D/kernel-0/ >/dev/null

real 7m31.776s
user 0m7.030s
sys 0m14.450s

I rebooted here (only here - no more reboots below) to start with a cold cache.

15:42 (none):~ # mount /dev/hdc2 /D -o nobarrier
15:42 (none):/tmp # time rsync -PHSav kernel-source/ /D/kernel-1/ >/dev/null

real 2m14.401s
user 0m7.290s
sys 0m14.900s
15:45 (none):/tmp # umount /D


CASE 2: Removing
================
Remove the copies we created in case 1.

15:45 (none):/tmp # mount /dev/hdc2 /D -o barrier
15:45 (none):/tmp # cd /D
15:45 (none):/D # time rm -Rf kernel-0

real 3m31.901s
user 0m0.050s
sys 0m3.140s
15:49 (none):/D # cd /
15:49 (none):/ # umount /D
15:49 (none):/ # mount /dev/hdc2 /D -o nobarrier
15:49 (none):/ # cd /D
15:49 (none):/D # time rm -Rf kernel-1

real 0m53.471s
user 0m0.070s
sys 0m1.990s
15:50 (none):/D # cd /
15:50 (none):/ # umount /D


CASE 3: Copying to the same disk
================================
Copy a file from hdc2 to hdc2. The result is _extremely_ interesting. So the
'barrier' thing is only relevant for large sets of files, it seems.

15:50 (none):/ # mount /dev/hdc2 /D -o barrier
15:50 (none):/ # cd /D/Video
15:50 (none):/D/Video # ls -l The*
-rw-r--r-- 1 jengelh users 6861772800 Mar 17 23:02 TLK3.iso
15:50 (none):/D/Video # time cp TLK3.iso ../tlk-0.iso

real 8m1.051s
user 0m0.670s
sys 0m41.120s
15:59 (none):/D/Video # cd /
15:59 (none):/ # umount /D
15:59 (none):/ # mount /dev/hdc2 /D -o nobarrier
15:59 (none):/ # cd /D/Video
15:59 (none):/D/Video # time cp TLK3.iso ../tlk-1.iso

real 7m53.275s
user 0m0.560s
sys 0m40.010s
16:07 (none):/D/Video # cd /
16:08 (none):/ # umount /D
16:08 (none):/ # exit

Script done on Mon May 22 16:08:09 2006



Jan Engelhardt
--

2006-05-22 22:43:34

by Nathan Scott

Subject: Re: XFS write speed drop

Hi Jan,

On Mon, May 22, 2006 at 02:21:48PM +0200, Jan Engelhardt wrote:
> I poked around a bit in the source code, and the word 'order' appears
> quite often near 'barrier', so I thought the 'barrier' option makes the
> I/O scheduler strictly ordered, that is, write the log first, then the data (like

Not quite; there's an explanation here:
http://oss.sgi.com/projects/xfs/faq.html#wcache

> >Can you send the benchmark results themselves please? (as in,
> >the test(s) you've run that lead you to see 6-8x, and the data
> >those tests produced). Also, xfs_info output, and maybe list
> >the device driver(s) involved here too.

Thanks!

> xfs_info
> ========
> 14:20 shanghai:/root # xfs_info /
> meta-data=/dev/hda3 isize=256 agcount=19, agsize=462621 blks
> = sectsz=512 attr=1
> data = bsize=4096 blocks=8586742, imaxpct=25
> = sunit=0 swidth=0 blks, unwritten=1
> naming =version 2 bsize=4096
> log =internal bsize=4096 blocks=3614, version=1
> = sectsz=512 sunit=0 blks
> realtime =none extsz=65536 blocks=0, rtextents=0
> 14:20 shanghai:/root # xfs_info /D
> meta-data=/dev/hdc2 isize=256 agcount=16, agsize=3821963 blks
> = sectsz=512 attr=1
> data = bsize=4096 blocks=61151408, imaxpct=25
> = sunit=0 swidth=0 blks, unwritten=1
> naming =version 2 bsize=4096
> log =internal bsize=4096 blocks=29859, version=1
> = sectsz=512 sunit=0 blks
> realtime =none extsz=65536 blocks=0, rtextents=0
> ...
> Script started on Mon May 22 15:32:29 2006
> 15:32 (none):~ # mount /dev/hdc2 /D -o barrier
> 15:32 (none):~ # df -Th / /D
> Filesystem Type Size Used Avail Use% Mounted on
> /dev/hda3 xfs 33G 6.7G 27G 21% /
> /dev/hdc2 xfs 234G 142G 92G 61% /D

> A small nitpick, BTW: the barrier flag is not listed in /proc/mounts.

It's the default, and so we don't list it. A quirk of the code, I guess;
we could add code to distinguish default from requested, but it doesn't
really seem worth it.

> hda is always in nobarrier mode.
> <6>hda: cache flushes not supported
> <6>hdc: cache flushes supported
> <5>Filesystem "hda3": Disabling barriers, not supported by the underlying
> device
> hdc can do barriers if wanted.
>
> CASE 1: Copying from one disk to another
> ========================================
> Copying a compiled 2.6.17-rc4 tree; 306907 KB in 28566 files in 2090
> directories.

OK, we can call this a metadata intensive workload - lots of small
files, lots of creates. Barriers will hurt the most here, as we'd
already have been log I/O bound most likely, and I'd expect barriers
to only slow that further.

> 15:33 (none):/tmp # time rsync -PHSav kernel-source/ /D/kernel-0/ >/dev/null
> real 7m31.776s
> user 0m7.030s
> sys 0m14.450s

versus:

> 15:42 (none):/tmp # time rsync -PHSav kernel-source/ /D/kernel-1/ >/dev/null
> real 2m14.401s
> user 0m7.290s
> sys 0m14.900s

Yep, note the user/sys shows no change, we're basically IO bound in
both tests, and barriers are hurting (as expected).

You may be able to combat this by switching to version 2 log format
(see the xfs_db version command in the _very_ latest xfs_db, as of a
few days ago, which allows this switch to be made on an unmounted fs).
Then you will be able to use larger incore log buffers, and this
should reduce the impact the barriers have (fewer, larger log IOs, so
fewer barriers).
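
Roughly like this - only a sketch, so double-check the exact version
command syntax against your xfs_db(8); it's the logbsize values above
32k that need the v2 log:

    umount /D
    xfs_db -x -c "version log2" /dev/hdc2
    mount /dev/hdc2 /D -o logbufs=8,logbsize=262144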

> CASE 2: Removing
> ================
> Remove the copies we created in case 1.
>
> 15:45 (none):/tmp # mount /dev/hdc2 /D -o barrier
> 15:45 (none):/D # time rm -Rf kernel-0
> real 3m31.901s
> user 0m0.050s
> sys 0m3.140s

versus:

> 15:49 (none):/ # mount /dev/hdc2 /D -o nobarrier
> 15:49 (none):/D # time rm -Rf kernel-1
>
> real 0m53.471s
> user 0m0.070s
> sys 0m1.990s
> 15:50 (none):/D # cd /
> 15:50 (none):/ # umount /D

Also metadata intensive, of course. All the same issues as above,
and the same techniques should be used to address it.

Odd that the system time jumped here. Roughly the same decrease in
performance though (3-4x).

> CASE 3: Copying to the same disk
> ================================
> Copy a file from hdc2 to hdc2. The result is _extremely_ interesting. So the
> 'barrier' thing is only relevant for large sets of files, it seems.

It's consistent, and easily understood, however...

> 15:50 (none):/ # mount /dev/hdc2 /D -o barrier
> 15:50 (none):/D/Video # ls -l The*
> -rw-r--r-- 1 jengelh users 6861772800 Mar 17 23:02 TLK3.iso
> 15:50 (none):/D/Video # time cp TLK3.iso ../tlk-0.iso
> real 8m1.051s
> user 0m0.670s
> sys 0m41.120s

versus:

> 15:59 (none):/ # mount /dev/hdc2 /D -o nobarrier
> 15:59 (none):/D/Video # time cp TLK3.iso ../tlk-1.iso
> real 7m53.275s
> user 0m0.560s
> sys 0m40.010s

This one is a file throughput workload, and so there are far fewer
metadata updates involved here. This means far fewer barrier
operations (far fewer log writes), so we should expect the results
to be as they are - slightly worse with barriers enabled, nothing
drastic though.

So, I agree, you're seeing the cost of write barriers here, but I
don't see anything unexpected (unfortunately for you, I guess).
The FAQ entry above will explain why they're enabled by default.
Try the v2 log change I suggested, hopefully that will mitigate
the problem somewhat. Alternatively, buy some decent hardware if
you're after performance. ;)

cheers.

--
Nathan

2006-05-23 01:26:38

by David Chinner

[permalink] [raw]
Subject: Re: XFS write speed drop

On Mon, May 22, 2006 at 02:21:48PM +0200, Jan Engelhardt wrote:
> >You're a bit light on details here.
> >
> >Can you send the benchmark results themselves please? (as in,
> >the test(s) you've run that lead you to see 6-8x, and the data
> >those tests produced). Also, xfs_info output, and maybe list
> >the device driver(s) involved here too.

Jan, just out of curiosity, can you also run the tests with
barriers off and the drive write cache turned off? I would
expect the cache+barriers to be faster...
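
Something along these lines, assuming hdparm's -W flag behaves with
your IDE setup:

    hdparm -W0 /dev/hdc      # drive write cache off
    mount /dev/hdc2 /D -o nobarrier
    (rerun the tests)
    hdparm -W1 /dev/hdc      # write cache back on afterwards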

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

2006-05-23 13:24:35

by Jan Engelhardt

Subject: Re: XFS write speed drop


>> CASE 1: Copying from one disk to another
>> ========================================
>> Copying a compiled 2.6.17-rc4 tree; 306907 KB in 28566 files in 2090
>> directories.
>
>OK, we can call this a metadata intensive workload - lots of small
>files, lots of creates. Barriers will hurt the most here, as we'd
>already have been log I/O bound most likely, and I'd expect barriers
>to only slow that further.
>
Yes and the most important thing is that someone made -o barrier the
default and did not notice. Someone else? :-D

>Yep, note the user/sys shows no change, we're basically IO bound in
>both tests, and barriers are hurting (as expected).


>> CASE 2: Removing
>> ================
>> Remove the copies we created in case 1.
>>
>> 15:45 (none):/tmp # mount /dev/hdc2 /D -o barrier
>> 15:45 (none):/D # time rm -Rf kernel-0
>> real 3m31.901s
>> user 0m0.050s
>> sys 0m3.140s
>
>versus:
>
>> 15:49 (none):/ # mount /dev/hdc2 /D -o nobarrier
>> 15:49 (none):/D # time rm -Rf kernel-1
>>
>> real 0m53.471s
>> user 0m0.070s
>> sys 0m1.990s
>> 15:50 (none):/D # cd /
>> 15:50 (none):/ # umount /D
>
>Also metadata intensive, of course. All the same issues as above,
>and the same techniques should be used to address it.
>
>Odd that the system time jumped here. Roughly the same decrease in
>performance though (3-4x).

You may, or may not, take that seriously. I only ran each test once (since
3m31s vs 53s shows "enough" of the issue).

>So, I agree, you're seeing the cost of write barriers here, but I
>don't see anything unexpected (unfortunately for you, I guess).
>
>The FAQ entry above will explain why they're enabled by default.
>Try the v2 log change I suggested, hopefully that will mitigate
>the problem somewhat. Alternatively, buy some decent hardware if
>you're after performance. ;)
>
Well, as mentioned, -o nobarrier solved it, and that's it. I do not actually
need barriers (or a UPS, to poke at another thread), because power
failures are rather rare in Germany.


Jan Engelhardt
--

2006-05-23 22:22:47

by Nathan Scott

Subject: Re: XFS write speed drop

On Tue, May 23, 2006 at 03:23:31PM +0200, Jan Engelhardt wrote:
> >> CASE 1: Copying from one disk to another
> >> ========================================
> >> Copying a compiled 2.6.17-rc4 tree; 306907 KB in 28566 files in 2090
> >> directories.
> >
> >OK, we can call this a metadata intensive workload - lots of small
> >files, lots of creates. Barriers will hurt the most here, as we'd
> >already have been log I/O bound most likely, and I'd expect barriers
> >to only slow that further.
> >
> Yes and the most important thing is that someone made -o barrier the
> default and did not notice. Someone else? :-D

Not sure what you're trying to say here. Yes, barriers are on
by default now if the hardware supports them, yes, they will
slow things down relative to write-cache-without-barriers, and
yes, we all knew that ... it's not the case that someone "did not
notice" or forgot about something. There is no doubt that this
is the right thing to be doing by default - there's no way that
I can tell from inside XFS in the kernel that you have a UPS. ;)

cheers.

--
Nathan

2006-05-23 23:18:24

by Nathan Scott

Subject: Re: XFS write speed drop

On Wed, May 24, 2006 at 08:22:19AM +1000, Nathan Scott wrote:
> On Tue, May 23, 2006 at 03:23:31PM +0200, Jan Engelhardt wrote:
> > >> CASE 1: Copying from one disk to another
> > >> ========================================
> > >> Copying a compiled 2.6.17-rc4 tree; 306907 KB in 28566 files in 2090
> > >> directories.
> > >
> > >OK, we can call this a metadata intensive workload - lots of small
> > >files, lots of creates. Barriers will hurt the most here, as we'd
> > >already have been log I/O bound most likely, and I'd expect barriers
> > >to only slow that further.
> > >
> > Yes and the most important thing is that someone made -o barrier the
> > default and did not notice. Someone else? :-D
>
> Not sure what you're trying to say here. Yes, barriers are on
> by default now if the hardware supports them, yes, they will
> slow things down relative to write-cache-without-barriers, and
> yes, we all knew that ... it's not the case that someone "did not
> notice" or forgot about something. There is no doubt that this
> is the right thing to be doing by default - there's no way that
> I can tell from inside XFS in the kernel that you have a UPS. ;)

Oh, I realised I've slightly misread your mail, you said...

| I do not actually need barriers (or a UPS, to poke at another thread),
| because power failures are rather rare in Germany.

Hmm, even harder for us to detect at runtime, in the kernel,
that you're in Germany. :)

Power failures aren't the only thing to cause crashes, however.

cheers.

--
Nathan

2006-05-24 07:16:38

by Jeremy Higdon

Subject: Re: XFS write speed drop

On Wed, May 24, 2006 at 09:17:41AM +1000, Nathan Scott wrote:
> On Wed, May 24, 2006 at 08:22:19AM +1000, Nathan Scott wrote:
> >
> > Not sure what you're trying to say here. Yes, barriers are on
> > by default now if the hardware supports them, yes, they will
> > slow things down relative to write-cache-without-barriers, and
> > yes, we all knew that ... it's not the case that someone "did not
> > notice" or forgot about something. There is no doubt that this
> > is the right thing to be doing by default - there's no way that
> > I can tell from inside XFS in the kernel that you have a UPS. ;)
>
> Oh, I realised I've slightly misread your mail, you said...
>
> | I do not actually need barriers (or a UPS, to poke at another thread),
> | because power failures are rather rare in Germany.
>
> Hmm, even harder for us to detect at runtime, in the kernel,
> that you're in Germany. :)
>
> Power failures aren't the only thing to cause crashes, however.

There have been several examples of filesystem corruption without
power failures too, though that is a common cause. It can even happen
on a normal system shutdown if no synchronize-cache command is sent to
the disk at shutdown time, depending on how much data is in the cache
and how long you have until the ATA chip is reset.

jeremy

2006-05-26 07:23:33

by Helge Hafting

Subject: Re: XFS write speed drop

Jan Engelhardt wrote:
> Well, as mentioned, -o nobarrier solved it, and that's it. I do not actually
> need barriers (or a UPS, to poke at another thread), because power
> failures are rather rare in Germany.
Then there are mistakes like someone stepping on the power
cord, pulling out the wrong one, or drilling holes in the wall;
there are kernel crashes; there is lightning;
and there is the possibility that some
component inside the box just breaks.

Helge Hafting

2006-05-26 08:51:33

by Jan Engelhardt

Subject: Re: XFS write speed drop


>> Well, as mentioned, -o nobarrier solved it, and that's it. I do not
>> actually need barriers (or a UPS, to poke at another thread), because
>> power failures are rather rare in Germany.
>
> Then there are mistakes like someone stepping on the power
> cord, pulling out the wrong one, or drilling holes in the wall;
> there are kernel crashes; there is lightning;
> and there is the possibility that some
> component inside the box just breaks.
>
A mathematician is afraid of flying because there could be a
terrorist bomb on the plane. What does he do?
He brings a bomb himself, because the probability that there are two
bombs on one plane is lower than the probability of one.

What I am trying to say is that the factors you mentioned just do not
happen here, on this one home box.


Jan Engelhardt
--

2006-05-26 21:06:07

by be-news06

Subject: Re: XFS write speed drop

Jan Engelhardt <[email protected]> wrote:
> What I am trying to say is that the factors you mentioned just do not
> happen here, on this one home box.

They will, as soon as you run the box in unsafe mode and have important
data without a backup :)

Regards,
Bernd