2012-05-06 09:26:05

by Daniel Pocock

Subject: extremely slow nfs when sync enabled



I've been observing some very slow nfs write performance when the server
has `sync' in /etc/exports

I want to avoid using async, but I have tested it and on my gigabit
network, it gives almost the same speed as if I was on the server
itself. (e.g. 30MB/sec to one disk, or less than 1MB/sec to the same
disk over NFS with `sync')

I'm using Debian 6 with 2.6.38 kernels on client and server, NFSv3

I've also tried a client running Debian 7/Linux 3.2.0 with both NFSv3
and NFSv4, speed is still slow

Looking at iostat on the server, I notice that avgrq-sz = 8 sectors
(4096 bytes) throughout the write operations

I've tried various tests, e.g. dd a large file, or unpack a tarball with
many small files, the iostat output is always the same
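
For reference, the tests themselves are nothing special, roughly the
following (the paths and tarball name are just examples):

  dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024
  tar xzf linux-3.2.tar.gz -C /mnt/nfs/src/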

Looking at /proc/mounts on the clients, everything looks good, large
wsize, tcp:

rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.x.x.x,mountvers=3,mountport=58727,mountproto=udp,local_lock=none,addr=192.x.x.x
0 0

and
rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.x.x.x.,minorversion=0,local_lock=none,addr=192.x.x.x 0 0

and in /proc/fs/nfs/exports on the server, I have sync and wdelay:

/nfs4/daniel
192.168.1.0/24,192.x.x.x(rw,insecure,root_squash,sync,wdelay,no_subtree_check,uuid=aa2a6f37:9cc94eeb:bcbf983c:d6e041d9,sec=1)
/home/daniel
192.168.1.0/24,192.x.x.x(rw,root_squash,sync,wdelay,no_subtree_check,uuid=aa2a6f37:9cc94eeb:bcbf983c:d6e041d9)
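
For completeness, the underlying /etc/exports entries are roughly of this
form (the real ones also list the second client address shown above):

  /home/daniel  192.168.1.0/24(rw,sync,root_squash,no_subtree_check)
  /nfs4/daniel  192.168.1.0/24(rw,sync,insecure,root_squash,no_subtree_check)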

Can anyone suggest anything else? Or is this really the performance hit
of `sync'?




2012-05-08 12:07:28

by Daniel Pocock

Subject: Re: extremely slow nfs when sync enabled



On 07/05/12 17:18, J. Bruce Fields wrote:
> On Mon, May 07, 2012 at 01:59:42PM +0000, Daniel Pocock wrote:
>>
>>
>> On 07/05/12 09:19, Daniel Pocock wrote:
>>>
>>>>> Ok, so the combination of:
>>>>>
>>>>> - enable writeback with hdparm
>>>>> - use ext4 (and not ext3)
>>>>> - barrier=1 and data=writeback? or data=?
>>>>>
>>>>> - is there a particular kernel version (on either client or server side)
>>>>> that will offer more stability using this combination of features?
>>>>
>>>> Not that I'm aware of. As long as you have a kernel > 2.6.29, then LVM
>>>> should work correctly. The main problem is that some SATA hardware tends
>>>> to be buggy, defeating the methods used by the barrier code to ensure
>>>> data is truly on disk. I believe that XFS will therefore actually test
>>>> the hardware when you mount with write caching and barriers, and should
>>>> report if the test fails in the syslogs.
>>>> See http://xfs.org/index.php/XFS_FAQ#Write_barrier_support.
>>>>
>>>>> I think there are some other variations of my workflow that I can
>>>>> attempt too, e.g. I've contemplated compiling C++ code onto a RAM disk
>>>>> because I don't need to keep the hundreds of object files.
>>>>
>>>> You might also consider using something like ccache and set the
>>>> CCACHE_DIR to a local disk if you have one.
>>>>
>>>
>>>
>>> Thanks for the feedback about these options, I am going to look at these
>>> strategies more closely
>>>
>>
>>
>> I decided to try and take md and LVM out of the picture, I tried two
>> variations:
>>
>> a) the boot partitions are not mirrored, so I reformatted one of them as
>> ext4,
>> - enabled write-cache for the whole of sdb,
>> - mounted ext4, barrier=1,data=ordered
>> - and exported this volume over NFS
>>
>> unpacking a large source tarball on this volume, iostat reports write
>> speeds that are even slower, barely 300kBytes/sec
>
> How many file creates per second?
>

I ran:
nfsstat -s -o all -l -Z5
and during the test (unpacking the tarball), I see numbers like these
every 5 seconds for about 2 minutes:

nfs v3 server total: 319
------------- ------------- --------
nfs v3 server getattr: 1
nfs v3 server setattr: 126
nfs v3 server access: 6
nfs v3 server write: 61
nfs v3 server create: 61
nfs v3 server mkdir: 3
nfs v3 server commit: 61


I decided to expand the scope of my testing too, I want to rule out the
possibility that my HP Microserver with onboard SATA is the culprit. I
set up two other NFS servers (all Debian 6, kernel 2.6.38):

HP Z800 Xeon workstation
Intel Corporation 82801 SATA RAID Controller (operating as AHCI)
VB0250EAVER (250GB 7200rpm)

Lenovo Thinkpad X220
Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 04)
SSDSA2BW160G3L (160GB SSD)

Both the Z800 and X220 run as NFSv3 servers
Each one has a fresh 10GB logical volume formatted ext4,
mount options: barrier=1,data=ordered
write cache (hdparm -W 1): enabled
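
The setup on each box was roughly as follows (volume group, device and
mount point names are just examples):

  lvcreate -L 10G -n nfstest vg00
  mkfs.ext4 /dev/vg00/nfstest
  mount -o barrier=1,data=ordered /dev/vg00/nfstest /srv/nfstest
  hdparm -W 1 /dev/sda
  exportfs -o rw,sync,no_subtree_check 192.168.1.0/24:/srv/nfstest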

Results:
NFS client: X220
NFS server: Z800 (regular disk)
iostat reports about 1,000kbytes/sec when unpacking the tarball
This is just as slow as the original NFS server

NFS client: Z800
NFS server: X220 (SSD disk)
iostat reports about 22,000kbytes/sec when unpacking the tarball

It seems that buying a pair of SSDs for my HP MicroServer will let me
use NFS `sync' and enjoy healthy performance - 20x faster.

However, is there really no other way to get more speed out of NFS when
using the `sync' option?



2012-05-08 12:46:03

by J. Bruce Fields

Subject: Re: extremely slow nfs when sync enabled

On Tue, May 08, 2012 at 12:06:59PM +0000, Daniel Pocock wrote:
>
>
> On 07/05/12 17:18, J. Bruce Fields wrote:
> > How many file creates per second?
> >
>
> I ran:
> nfsstat -s -o all -l -Z5
> and during the test (unpacking the tarball), I see numbers like these
> every 5 seconds for about 2 minutes:
>
> nfs v3 server total: 319
> ------------- ------------- --------
> nfs v3 server getattr: 1
> nfs v3 server setattr: 126
> nfs v3 server access: 6
> nfs v3 server write: 61
> nfs v3 server create: 61
> nfs v3 server mkdir: 3
> nfs v3 server commit: 61

OK, so it's probably creating about 60 new files, each requiring a
create, write, commit, and two setattrs?

Each of those operations is synchronous, so probably has to wait for at
least one disk seek. About 300 such operations every 5 seconds is about
60 per second, or about 16ms each. That doesn't sound so far off.

(I wonder why it needs two setattrs?)

> I decided to expand the scope of my testing too, I want to rule out the
> possibility that my HP Microserver with onboard SATA is the culprit. I
> set up two other NFS servers (all Debian 6, kernel 2.6.38):
>
> HP Z800 Xeon workstation
> Intel Corporation 82801 SATA RAID Controller (operating as AHCI)
> VB0250EAVER (250GB 7200rpm)
>
> Lenovo Thinkpad X220
> Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 04)
> SSDSA2BW160G3L (160GB SSD)
>
> Both the Z800 and X220 run as NFSv3 servers
> Each one has a fresh 10GB logical volume formatted ext4,
> mount options: barrier=1,data=ordered
> write cache (hdparm -W 1): enabled
>
> Results:
> NFS client: X220
> NFS server: Z800 (regular disk)
> iostat reports about 1,000kbytes/sec when unpacking the tarball
> This is just as slow as the original NFS server

Again, reporting kbytes/second alone isn't useful--data throughput isn't
interesting for a workload like unpacking a tarball with a lot of small
files. The limiting factor is the synchronous operations.

> NFS client: Z800
> NFS server: X220 (SSD disk)
> iostat reports about 22,000kbytes/sec when unpacking the tarball
>
> It seems that buying a pair of SSDs for my HP MicroServer will let me
> use NFS `sync' and enjoy healthy performance - 20x faster.

And an SSD has much lower write latency, so this isn't surprising.

> However, is there really no other way to get more speed out of NFS when
> using the `sync' option?

I've heard reports of people being able to get better performance on
this kind of workload by using an external journal on an SSD.

(Last I tried this--with a machine at home, using (if I remember
correctly) ext4 on software raid with the journal on an intel x25-m, I
wasn't able to get any improvement. I didn't try to figure out why.)
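
For anyone who wants to try it, the usual recipe is roughly the
following; the filesystem must be unmounted while the journal is
switched, and the device names are only placeholders:

  mke2fs -O journal_dev -b 4096 /dev/sdX1
  tune2fs -O ^has_journal /dev/vg01/test1
  tune2fs -J device=/dev/sdX1 /dev/vg01/test1

(The journal device block size has to match the filesystem block size,
hence the -b 4096.)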

--b.

2012-05-07 17:18:06

by J. Bruce Fields

Subject: Re: extremely slow nfs when sync enabled

On Mon, May 07, 2012 at 01:59:42PM +0000, Daniel Pocock wrote:
>
>
> On 07/05/12 09:19, Daniel Pocock wrote:
> >
> >>> Ok, so the combination of:
> >>>
> >>> - enable writeback with hdparm
> >>> - use ext4 (and not ext3)
> >>> - barrier=1 and data=writeback? or data=?
> >>>
> >>> - is there a particular kernel version (on either client or server side)
> >>> that will offer more stability using this combination of features?
> >>
> >> Not that I'm aware of. As long as you have a kernel > 2.6.29, then LVM
> >> should work correctly. The main problem is that some SATA hardware tends
> >> to be buggy, defeating the methods used by the barrier code to ensure
> >> data is truly on disk. I believe that XFS will therefore actually test
> >> the hardware when you mount with write caching and barriers, and should
> >> report if the test fails in the syslogs.
> >> See http://xfs.org/index.php/XFS_FAQ#Write_barrier_support.
> >>
> >>> I think there are some other variations of my workflow that I can
> >>> attempt too, e.g. I've contemplated compiling C++ code onto a RAM disk
> >>> because I don't need to keep the hundreds of object files.
> >>
> >> You might also consider using something like ccache and set the
> >> CCACHE_DIR to a local disk if you have one.
> >>
> >
> >
> > Thanks for the feedback about these options, I am going to look at these
> > strategies more closely
> >
>
>
> I decided to try and take md and LVM out of the picture, I tried two
> variations:
>
> a) the boot partitions are not mirrored, so I reformatted one of them as
> ext4,
> - enabled write-cache for the whole of sdb,
> - mounted ext4, barrier=1,data=ordered
> - and exported this volume over NFS
>
> unpacking a large source tarball on this volume, iostat reports write
> speeds that are even slower, barely 300kBytes/sec

How many file creates per second?

--b.

>
> b) I took an external USB HDD,
> - created two 20GB partitions sdc1 and sdc2
> - formatted sdc1 as btrfs
> - formatted sdc2 as ext4
> - mounted sdc2 the same as sdb1 in test (a),
> ext4, barrier=1,data=ordered
> - exported both volumes over NFS
>
> unpacking a large source tarball on these two volumes, iostat reports
> write speeds that are around 5MB/sec - much faster than the original
> problem I was having
>
> Bottom line, this leaves me with the impression that either
> - the server's SATA controller or disks need a firmware upgrade,
> - or there is some issue with the kernel barriers and/or cache flushing
> on this specific SATA hardware.
>
> I think it is fair to say that the NFS client is not at fault, however,
> I can imagine many people would be tempted to just use `async' when
> faced with a problem like this, given that async makes everything just
> run fast.
>

2012-05-06 11:03:30

by Daniel Pocock

Subject: Re: extremely slow nfs when sync enabled



On 06/05/12 09:26, Daniel Pocock wrote:
>
>
> I've been observing some very slow nfs write performance when the server
> has `sync' in /etc/exports
>
> I want to avoid using async, but I have tested it and on my gigabit
> network, it gives almost the same speed as if I was on the server
> itself. (e.g. 30MB/sec to one disk, or less than 1MB/sec to the same
> disk over NFS with `sync')

Just to clarify this point: if I log in to the server and run one of my
tests (e.g. untar the linux source), it is very fast, iostat shows
30MB/sec write)

Also, I've tried write-back cache (hdparm -W 1), when this is enabled
NFS writes go from about 1MB/sec up to about 10MB/sec, but still way
below the speed of local disk access on the server.
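
That is, something like the following, with the device name adjusted for
the server:

  hdparm -W /dev/sdb     # show the current write-cache setting
  hdparm -W 1 /dev/sdb   # enable the write-back cache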

>
> I'm using Debian 6 with 2.6.38 kernels on client and server, NFSv3
>
> I've also tried a client running Debian 7/Linux 3.2.0 with both NFSv3
> and NFSv4, speed is still slow
>
> Looking at iostat on the server, I notice that avgrq-sz = 8 sectors
> (4096 bytes) throughout the write operations
>
> I've tried various tests, e.g. dd a large file, or unpack a tarball with
> many small files, the iostat output is always the same
>
> Looking at /proc/mounts on the clients, everything looks good, large
> wsize, tcp:
>
> rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.x.x.x,mountvers=3,mountport=58727,mountproto=udp,local_lock=none,addr=192.x.x.x
> 0 0
>
> and
> rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.x.x.x.,minorversion=0,local_lock=none,addr=192.x.x.x 0 0
>
> and in /proc/fs/nfs/exports on the server, I have sync and wdelay:
>
> /nfs4/daniel
> 192.168.1.0/24,192.x.x.x(rw,insecure,root_squash,sync,wdelay,no_subtree_check,uuid=aa2a6f37:9cc94eeb:bcbf983c:d6e041d9,sec=1)
> /home/daniel
> 192.168.1.0/24,192.x.x.x(rw,root_squash,sync,wdelay,no_subtree_check,uuid=aa2a6f37:9cc94eeb:bcbf983c:d6e041d9)
>
> Can anyone suggest anything else? Or is this really the performance hit
> of `sync'?
>
>

2012-05-08 13:29:58

by Myklebust, Trond

Subject: Re: extremely slow nfs when sync enabled

On Tue, 2012-05-08 at 08:45 -0400, J. Bruce Fields wrote:
> On Tue, May 08, 2012 at 12:06:59PM +0000, Daniel Pocock wrote:
> >
> >
> > On 07/05/12 17:18, J. Bruce Fields wrote:
> > > How many file creates per second?
> > >
> >
> > I ran:
> > nfsstat -s -o all -l -Z5
> > and during the test (unpacking the tarball), I see numbers like these
> > every 5 seconds for about 2 minutes:
> >
> > nfs v3 server        total:      319
> > ------------- ------------- --------
> > nfs v3 server      getattr:        1
> > nfs v3 server      setattr:      126
> > nfs v3 server       access:        6
> > nfs v3 server        write:       61
> > nfs v3 server       create:       61
> > nfs v3 server        mkdir:        3
> > nfs v3 server       commit:       61
>
> OK, so it's probably creating about 60 new files, each requiring a
> create, write, commit, and two setattrs?
>
> Each of those operations is synchronous, so probably has to wait for at
> least one disk seek.  About 300 such operations every 5 seconds is about
> 60 per second, or about 16ms each.  That doesn't sound so far off.
>
> (I wonder why it needs two setattrs?)

It is an untar workload, so it needs to reset the atime/mtime after
writing the file.

Note that the above workload is exactly the one I was targeting with the
unstable file creation draft:

http://tools.ietf.org/html/draft-myklebust-nfsv4-unstable-file-creation-01

I'm going to try pushing that again for NFSv4.3...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

2012-05-07 09:19:47

by Daniel Pocock

Subject: Re: extremely slow nfs when sync enabled


>> Ok, so the combination of:
>>
>> - enable writeback with hdparm
>> - use ext4 (and not ext3)
>> - barrier=1 and data=writeback? or data=?
>>
>> - is there a particular kernel version (on either client or server side)
>> that will offer more stability using this combination of features?
>
> Not that I'm aware of. As long as you have a kernel > 2.6.29, then LVM
> should work correctly. The main problem is that some SATA hardware tends
> to be buggy, defeating the methods used by the barrier code to ensure
> data is truly on disk. I believe that XFS will therefore actually test
> the hardware when you mount with write caching and barriers, and should
> report if the test fails in the syslogs.
> See http://xfs.org/index.php/XFS_FAQ#Write_barrier_support.
>
>> I think there are some other variations of my workflow that I can
>> attempt too, e.g. I've contemplated compiling C++ code onto a RAM disk
>> because I don't need to keep the hundreds of object files.
>
> You might also consider using something like ccache and set the
> CCACHE_DIR to a local disk if you have one.
>


Thanks for the feedback about these options, I am going to look at these
strategies more closely


>>>>> setups really _suck_ at dealing with fsync(). The latter is used every
>>>>
>>>> I'm using md RAID1, my setup is like this:
>>>>
>>>> 2x 1TB SATA disks ST31000528AS (7200rpm with 32MB cache and NCQ)
>>>>
>>>> SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI
>>>> mode] (rev 40)
>>>> - not using any of the BIOS softraid stuff
>>>>
>>>> Both devices have identical partitioning:
>>>> 1. 128MB boot
>>>> 2. md volume (1TB - 128MB)
>>>>
>>>> The entire md volume (/dev/md2) is then used as a PV for LVM
>>>>
>>>> I do my write tests on a fresh LV with no fragmentation
>>>>
>>>>> time the NFS client sends a COMMIT or trunc() instruction, and for
>>>>> pretty much all file and directory creation operations (you can use
>>>>> 'nfsstat' to monitor how many such operations the NFS client is sending
>>>>> as part of your test).
>>>>
>>>> I know that my two tests are very different in that way:
>>>>
>>>> - dd is just writing one big file, no fsync
>>>>
>>>> - unpacking a tarball (or compiling a large C++ project) does a lot of
>>>> small writes with many fsyncs
>>>>
>>>> In both cases, it is slow
>>>>
>>>>> Local disk can get away with doing a lot less fsync(), because the cache
>>>>> consistency guarantees are different:
>>>>> * in NFS, the server is allowed to crash or reboot without
>>>>> affecting the client's view of the filesystem.
>>>>> * in the local file system, the expectation is that on reboot any
>>>>> data lost won't need to be recovered (the application will
>>>>> have used fsync() for any data that does need to be persistent).
>>>>> Only the disk filesystem structures need to be recovered, and
>>>>> that is done using the journal (or fsck).
>>>>
>>>>
>>>> Is this an intractable problem though?
>>>>
>>>> Or do people just work around this, for example, enable async and
>>>> write-back cache, and then try to manage the risk by adding a UPS and/or
>>>> battery backed cache to their RAID setup (to reduce the probability of
>>>> unclean shutdown)?
>>>
>>> It all boils down to what kind of consistency guarantees you are
>>> comfortable living with. The default NFS server setup offers much
>>> stronger data consistency guarantees than local disk, and is therefore
>>> likely to be slower when using cheap hardware.
>>>
>>
>> I'm keen for consistency, because I don't like the idea of corrupting
>> some source code or a whole git repository for example.
>>
>> How did you know I'm using cheap hardware? It is a HP MicroServer, I
>> even got the £100 cash-back cheque:
>>
>> http://www8.hp.com/uk/en/campaign/focus-for-smb/solution.html#/tab2/
>>
>> Seriously though, I've worked with some very large arrays in my business
>> environment, but I use this hardware at home because of the low noise
>> and low heat dissipation rather than for saving money, so I would like
>> to try and get the most out of it if possible and I'm very grateful for
>> these suggestions.
>
> Right. All I'm saying is that when comparing local disk and NFS
> performance, then make sure that you are doing an apples-to-apples
> comparison.
> The main reason for wanting to use NFS in a home setup would usually be
> in order to simultaneously access the same data through several clients.
> If that is not a concern, then perhaps transforming your NFS server into
> an iSCSI target might fit your performance requirements better?

There are various types of content; some things, like VM images, are
only accessed by one client at a time, and those could move to iSCSI.

However, some of the code I'm compiling needs to be built on different
platforms (e.g. I have a Debian squeeze desktop and Debian wheezy in a
VM), so it is convenient to have access to the git workspaces from all
these hosts over NFS.




2012-05-08 13:44:02

by Daniel Pocock

Subject: Re: extremely slow nfs when sync enabled



On 08/05/12 12:45, J. Bruce Fields wrote:
> On Tue, May 08, 2012 at 12:06:59PM +0000, Daniel Pocock wrote:
>>
>>
>> On 07/05/12 17:18, J. Bruce Fields wrote:
>>> How many file creates per second?
>>>
>>
>> I ran:
>> nfsstat -s -o all -l -Z5
>> and during the test (unpacking the tarball), I see numbers like these
>> every 5 seconds for about 2 minutes:
>>
>> nfs v3 server total: 319
>> ------------- ------------- --------
>> nfs v3 server getattr: 1
>> nfs v3 server setattr: 126
>> nfs v3 server access: 6
>> nfs v3 server write: 61
>> nfs v3 server create: 61
>> nfs v3 server mkdir: 3
>> nfs v3 server commit: 61
>
> OK, so it's probably creating about 60 new files, each requiring a
> create, write, commit, and two setattrs?
>
> Each of those operations is synchronous, so probably has to wait for at
> least one disk seek. About 300 such operations every 5 seconds is about
> 60 per second, or about 16ms each. That doesn't sound so far off.
>
> (I wonder why it needs two setattrs?)

I checked that with wireshark:
- first SETATTR sets mode=0644 and atime=mtime=`set to server time'
- second SETATTR sets atime and mtime using client values (atime=now,
mtime= some time in the past)
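
For reference, I captured the traffic during the untar and then looked at
the SETATTR calls, roughly like this (the interface name is just an
example):

  tshark -i eth0 -w /tmp/untar.pcap port 2049

and then opened /tmp/untar.pcap in wireshark with the display filter `nfs'.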

Is this likely to be an application issue (e.g. in tar), or should the
NFS client be able to merge those two requests somehow?

If I add `m' to my tar command (tar xzmf) the nfsstat results change:

nfs v3 server total: 300
------------- ------------- --------
nfs v3 server setattr: 82
nfs v3 server lookup: 17
nfs v3 server access: 5
nfs v3 server write: 90
nfs v3 server create: 83
nfs v3 server mkdir: 3
nfs v3 server remove: 15
nfs v3 server commit: 5

and iostat reports that w/s is about the same (290/sec) but throughput
is now in the region of 1.1 - 1.5 MBytes/sec


Without using `m', here are the full stats from the Z800 acting as NFSv3
server:

nfs v3 server total: 299
------------- ------------- --------
nfs v3 server setattr: 132
nfs v3 server access: 6
nfs v3 server write: 88
nfs v3 server create: 62
nfs v3 server mkdir: 6
nfs v3 server commit: 5

Device:  rrqm/s  wrqm/s  r/s   w/s     rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
dm-10    0.00    0.00    0.00  294.00  0.00   1.15   8.00      1.04      3.55   3.26   95.92

The other thing that stands out is this:
avgrq-sz = 8 (units = 512 byte sectors)

If the server filesystem is btrfs, the wMB/s figure is the same, but I
notice avgrq-sz = 16, so it seems to be combining more requests into
bigger writes


Here is the entry from /proc/mounts:
/dev/mapper/vg01-test1 /mnt/test1 ext4
rw,relatime,barrier=1,data=ordered 0 0





2012-05-07 13:59:47

by Daniel Pocock

Subject: Re: extremely slow nfs when sync enabled



On 07/05/12 09:19, Daniel Pocock wrote:
>
>>> Ok, so the combination of:
>>>
>>> - enable writeback with hdparm
>>> - use ext4 (and not ext3)
>>> - barrier=1 and data=writeback? or data=?
>>>
>>> - is there a particular kernel version (on either client or server side)
>>> that will offer more stability using this combination of features?
>>
>> Not that I'm aware of. As long as you have a kernel > 2.6.29, then LVM
>> should work correctly. The main problem is that some SATA hardware tends
>> to be buggy, defeating the methods used by the barrier code to ensure
>> data is truly on disk. I believe that XFS will therefore actually test
>> the hardware when you mount with write caching and barriers, and should
>> report if the test fails in the syslogs.
>> See http://xfs.org/index.php/XFS_FAQ#Write_barrier_support.
>>
>>> I think there are some other variations of my workflow that I can
>>> attempt too, e.g. I've contemplated compiling C++ code onto a RAM disk
>>> because I don't need to keep the hundreds of object files.
>>
>> You might also consider using something like ccache and set the
>> CCACHE_DIR to a local disk if you have one.
>>
>
>
> Thanks for the feedback about these options, I am going to look at these
> strategies more closely
>


I decided to try and take md and LVM out of the picture, I tried two
variations:

a) the boot partitions are not mirrored, so I reformatted one of them as
ext4,
- enabled write-cache for the whole of sdb,
- mounted ext4, barrier=1,data=ordered
- and exported this volume over NFS
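
Concretely, the steps were roughly these, with /dev/sdb1 and the mount
point standing in for the real names:

  hdparm -W 1 /dev/sdb
  mkfs.ext4 /dev/sdb1
  mount -o barrier=1,data=ordered /dev/sdb1 /mnt/test
  exportfs -o rw,sync,no_subtree_check 192.168.1.0/24:/mnt/test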

unpacking a large source tarball on this volume, iostat reports write
speeds that are even slower, barely 300kBytes/sec

b) I took an external USB HDD,
- created two 20GB partitions sdc1 and sdc2
- formatted sdc1 as btrfs
- formatted sdc2 as ext4
- mounted sdc2 the same as sdb1 in test (a),
ext4, barrier=1,data=ordered
- exported both volumes over NFS

unpacking a large source tarball on these two volumes, iostat reports
write speeds that are around 5MB/sec - much faster than the original
problem I was having

Bottom line, this leaves me with the impression that either
- the server's SATA controller or disks need a firmware upgrade,
- or there is some issue with the kernel barriers and/or cache flushing
on this specific SATA hardware.

I think it is fair to say that the NFS client is not at fault, however,
I can imagine many people would be tempted to just use `async' when
faced with a problem like this, given that async makes everything just
run fast.