2011-06-06 13:58:05

by Tristan Ball

[permalink] [raw]
Subject: NFS Sync with External Journal

Hi,

I've been experimenting with using an external ext3 journal as a way to bring the write performance of an NFS share exported with the 'sync' option closer to that of one exported with 'async'.
I've mounted the ext3 filesystem with data=journal, and the journal itself is on SSD. I've seen various references on the net saying that this should improve performance, as the nfs process can respond to write requests as soon as the data is in the journal, rather than flushed all the way to the filesystem.

However, in my tests it seems that when the filesystem is shared as 'sync', writes go to the filesystem at the same time as they are written to the journal, and performance isn't significantly different to a plain ext3 filesystem with an internal journal and data=ordered. To me this implies that the NFS layer isn't returning from writes until they're flushed to the filesystem disk?

So, my question really is - should I be expecting this to work as a performance enhancer?

I realise that the server is doing more work with data=journal, however given how much faster than the HD the SSD is, and the fact that the journal is large enough to contain all the data I'm writing in this test, I was hoping to see the nfs writes occur at closer to wirespeed.

Server is Oracle Linux, Kernel 2.6.32-100.28.5.el6.x86_64.
Client was Ubuntu, 2.6.32-32-server x86_64.

/etc/exports:
/plain *(rw,async,no_subtree_check,no_root_squash)
/split *(rw,async,no_subtree_check,no_root_squash) # (FS with external Journal)

Client mounts were done simply with -o 'rw,rsize=32768,wsize=32768'

Benchmark results:
Plain Ext3, data=ordered export=sync, write speed 56-62MB/sec
Split Ext3, data=journal export=sync, write speed = 46-50MB/sec

For reference:
Plain Ext3, data=ordered export=async, write speed 111MB/sec
Split Ext3, data=journal export=async, write speed 110MB/sec
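
To put the numbers above side by side, here is a minimal sketch that expresses each sync result as a fraction of the matching async result. The dictionary layout and the function name `sync_fraction` are made up for illustration; only the MB/sec figures come from the results listed above.

```python
# Throughput figures (MB/sec) copied from the benchmark results above.
# The structure and names here are illustrative, not part of the original test.
results = {
    "plain ext3, data=ordered": {"sync": (56, 62), "async": 111},
    "split ext3, data=journal": {"sync": (46, 50), "async": 110},
}


def sync_fraction(entry):
    """Return (low, high) sync throughput as a fraction of async throughput."""
    low, high = entry["sync"]
    return low / entry["async"], high / entry["async"]


for name, entry in results.items():
    low, high = sync_fraction(entry)
    print(f"{name}: sync reaches {low:.0%}-{high:.0%} of async")
```

On these figures the sync exports reach roughly half of the async throughput, and the external journal setup is somewhat lower than the plain one rather than higher.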

Thanks for your time.

Tristan


Tristan Ball - Hosted Services Manager VIC
Pronto Hosted Services
20 Lakeside Drive, Burwood East, VIC 3151
Phone: +61 3 9887 7770 | Email: [email protected]
Mobile: +61 408 397 473


For PHS helpdesk support, please email [email protected]
For urgent after hours support phone: 1800 622 556

---Legal Notice---
The email message and any attachments are confidential and subject to copyright. If you are not the intended recipient, any use, interference with, disclosure or copying of this material is unauthorised and prohibited. No part may be reproduced, adapted or transmitted without the written permission of the copyright owner. If you have received this email in error, please immediately advise the sender by return email and delete the message from your system. Before opening or using attachments, check for viruses and defects. Our liability is limited to re-supplying any affected attachments.



2011-06-08 04:12:42

by Tristan Ball

[permalink] [raw]
Subject: RE: NFS Sync with External Journal

The test was:

Iozone -I 0 -r 64k -s 2G -w

Actually, I started with a much larger test suite, and I'm actually most interested in small random write latency, rather than overall throughput, but the sequential write test makes the file system vs log write behaviour obvious via iostat. It also seems that in my environment, the disk write latency actually limits throughput slightly, presumably because the delay in syncing each write delays responses back to the client?

My initial test was with a pair of striped drives capable of about 130-160MB/sec on local writes (they're 5400rpm laptop drives).

Some slight tuning (removing the rsize/wsize parameters, and increasing the [r|w]mem_default/max parameters) gets the NFS speed up to about 80MB/sec on writes.

Doing the exact same test on a filesystem backed onto some SSDs gets me up to about 100-105MB/sec, although the write latency on these is much lower than the 5400 disks.

I'll get back to you on the ext3 mailing list responses, I'm going to redo my tests on ext4 first.

Regards,
        Tristan

-----Original Message-----
From: J. Bruce Fields
Sent: Wednesday, 8 June 2011 9:23 AM
To: Wendy Cheng
Cc: Tristan Ball; linux-nfs@vger.kernel.org
Subject: Re: NFS Sync with External Journal

On Mon, Jun 06, 2011 at 01:12:51PM -0700, Wendy Cheng wrote:
> You'll probably get better answer(s) from ext3 user mailing list ....
> it is more about how journaling works for the specific file system.

Yes, though leave linux-nfs on the cc: as I'd be interested what you find out.

> In ext3 case, I believe "sync" forces data getting flushed to the file
> system *regardless* which journal mode is chosen. Using an external
> journal device, particularly on SSD, does help but the performance
> gain is limited by the amount of data that needs to be written into
> the file system itself.

> > /etc/exports:
> > /plain          *(rw,async,no_subtree_check,no_root_squash)
> > /split          *(rw,async,no_subtree_check,no_root_squash) # (FS with external Journal)
> >
> > Client mounts were done simply with -o 'rw,rsize=32768,wsize=32768'
> >
> > Benchmark results:
> > Plain Ext3, data=ordered export=sync, write speed 56-62MB/sec
> > Split Ext3, data=journal export=sync, write speed = 46-50MB/sec
> >
> > For reference:
> > Plain Ext3, data=ordered export=async, write speed 111MB/sec
> > Split Ext3, data=journal export=async, write speed 110MB/sec

What exactly is your test?

For sufficiently large sequential writes, I wouldn't actually have expected sync vs. async to make much difference: eventually you're limited by the drive speed (I'm assuming your drive does ~60MB/s write throughput?). And individual writes (for NFS v3 and higher) aren't necessarily required to be synchronous.

A better test would be creating or destroying a bunch of small files, as create and unlink are synchronous (the nfs server won't return, in the sync case, before each create and unlink actually hits the disk).

--b.

2011-06-08 17:33:36

by Wendy Cheng

[permalink] [raw]
Subject: Re: NFS Sync with External Journal

On Tue, Jun 7, 2011 at 8:57 PM, Tristan Ball <[email protected]> wrote:

... [snip]

> Doing the exact same test on a filesystem backed onto some SSDs gets me up to about 100-105MB/sec, although the write latency on these is much lower than the 5400 disks.
>
> I'll get back to you on the ext3 mailing list responses, I'm going to redo my tests on ext4 first.

I'm also interested in the subject. More specifically, if users are
handed a bunch of SSDs and/or NVMs, what can they do to leverage the
hardware features for functionality or performance improvement ?

Do keep us informed.

-- Wendy

2011-06-08 18:10:16

by Ben Myers

[permalink] [raw]
Subject: Re: NFS Sync with External Journal

Hey Tristan,

On Mon, Jun 06, 2011 at 01:42:39PM +0000, Tristan Ball wrote:
> I've been experimenting with using an external ext3 journal as a way
> to bring the write performance of an NFS share exported with the
> 'sync' option closer to that of one exported with 'async'.
>
> I've mounted the ext3 filesystem with data=journal, and the journal
> itself is on SSD. I've seen various references on the net saying that
> this should improve performance, as the nfs process can respond to
> write requests as soon as the data is in journal, rather than flushed
> all the way to the filesystem

There was some work done for xfs awhile back with a similar issue in
mind. Once metadata are in the xfs journal the nfs server can respond
to requests without waiting for it to be flushed to its final place on
disk. We're using the commit_metadata export_operation to do that.
But XFS won't do that for the data itself. We rely on multiple spindles
to improve performance there.

Regards,
Ben

2011-06-06 20:12:52

by Wendy Cheng

[permalink] [raw]
Subject: Re: NFS Sync with External Journal

You'll probably get better answer(s) from ext3 user mailing list
.... it is more about how journaling works for the specific file
system.

In ext3 case, I believe "sync" forces data getting flushed to the file
system *regardless* which journal mode is chosen. Using an external
journal device, particularly on SSD, does help but the performance
gain is limited by the amount of data that needs to be written into
the file system itself.

-- Wendy

On Mon, Jun 6, 2011 at 6:42 AM, Tristan Ball <[email protected]> wrote:
> Hi,
>
> I've been experimenting with using an external ext3 journal as a way to bring the write performance of an NFS share exported with the 'sync' option closer to that of one exported with 'async'.
> I've mounted the ext3 filesystem with data=journal, and the journal itself is on SSD. I've seen various references on the net saying that this should improve performance, as the nfs process can respond to write requests as soon as the data is in the journal, rather than flushed all the way to the filesystem.
>
> However, in my tests it seems that when the filesystem is shared as 'sync', writes go to the filesystem at the same time as they are written to the journal, and performance isn't significantly different to a plain ext3 filesystem with an internal journal and data=ordered. To me this implies that the NFS layer isn't returning from writes until they're flushed to the filesystem disk?
>
> So, my question really is - should I be expecting this to work as a performance enhancer?
>
> I realise that the server is doing more work with data=journal, however given how much faster than the HD the SSD is, and the fact that the journal is large enough to contain all the data I'm writing in this test, I was hoping to see the nfs writes occur at closer to wirespeed.
>
> Server is Oracle Linux, Kernel 2.6.32-100.28.5.el6.x86_64.
> Client was Ubuntu, 2.6.32-32-server x86_64.
>
> /etc/exports:
> /plain          *(rw,async,no_subtree_check,no_root_squash)
> /split          *(rw,async,no_subtree_check,no_root_squash) # (FS with external Journal)
>
> Client mounts were done simply with -o 'rw,rsize=32768,wsize=32768'
>
> Benchmark results:
> Plain Ext3, data=ordered export=sync, write speed 56-62MB/sec
> Split Ext3, data=journal export=sync, write speed = 46-50MB/sec
>
> For reference:
> Plain Ext3, data=ordered export=async, write speed 111MB/sec
> Split Ext3, data=journal export=async, write speed 110MB/sec
>
> Thanks for your time.
>
> Tristan

2011-06-07 23:23:16

by J. Bruce Fields

[permalink] [raw]
Subject: Re: NFS Sync with External Journal

On Mon, Jun 06, 2011 at 01:12:51PM -0700, Wendy Cheng wrote:
> You'll probably get better answer(s) from ext3 user mailing list
> .... it is more about how journaling works for the specific file
> system.

Yes, though leave linux-nfs on the cc: as I'd be interested what you
find out.

> In ext3 case, I believe "sync" forces data getting flushed to the file
> system *regardless* which journal mode is chosen. Using an external
> journal device, particularly on SSD, does help but the performance
> gain is limited by the amount of data that needs to be written into
> the file system itself.

> > /etc/exports:
> > /plain          *(rw,async,no_subtree_check,no_root_squash)
> > /split          *(rw,async,no_subtree_check,no_root_squash) # (FS with external Journal)
> >
> > Client mounts were done simply with -o 'rw,rsize=32768,wsize=32768'
> >
> > Benchmark results:
> > Plain Ext3, data=ordered export=sync, write speed 56-62MB/sec
> > Split Ext3, data=journal export=sync, write speed = 46-50MB/sec
> >
> > For reference:
> > Plain Ext3, data=ordered export=async, write speed 111MB/sec
> > Split Ext3, data=journal export=async, write speed 110MB/sec

What exactly is your test?

For sufficiently large sequential writes, I wouldn't actually have
expected sync vs. async to make much difference: eventually you're
limited by the drive speed (I'm assuming your drive does ~60MB/s write
throughput?). And individual writes (for NFS v3 and higher) aren't
necessarily required to be synchronous.

A better test would be creating or destroying a bunch of small files, as
create and unlink are synchronous (the nfs server won't return, in the
sync case, before each create and unlink actually hits the disk).
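
A minimal sketch of the kind of small-file test suggested here, timing a batch of creates and then a batch of unlinks. The function name `time_create_unlink`, the file count and the 4 KiB file size are arbitrary assumptions, and the temporary directory is only a stand-in for a directory on the NFS mount being tested.

```python
import os
import tempfile
import time


def time_create_unlink(directory, count=200, size=4096):
    """Create `count` small files, then unlink them all.
    Returns (create_seconds, unlink_seconds)."""
    payload = b"x" * size
    paths = [os.path.join(directory, f"small-{i:05d}.tmp") for i in range(count)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as f:
            f.write(payload)
    create_secs = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        os.unlink(path)
    unlink_secs = time.perf_counter() - start
    return create_secs, unlink_secs


if __name__ == "__main__":
    # Assumption: point this at the NFS mount instead of a local temp directory.
    with tempfile.TemporaryDirectory() as d:
        created, removed = time_create_unlink(d)
        print(f"create: {created:.3f}s  unlink: {removed:.3f}s")
```

Comparing the two timings across the sync and async exports, and across the plain and split filesystems, should show whether the external journal helps the metadata-heavy case.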

--b.