2012-09-04 09:31:54

by Andrew Holway

Subject: NFS over RDMA small block DIRECT_IO bug

Hello.

# Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also seem relevant to libvirt. #

I have a CentOS 6.2 server and CentOS 6.2 client.

[root@store ~]# cat /etc/exports
/dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure) (I have tried with non-tmpfs targets also)


[root@node001 ~]# cat /etc/fstab
store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0


I wrote a little for-loop one-liner that dd'd the CentOS netinstall image to a file called 'hello' and then checksummed that file. Each iteration uses a different block size.

Without DIRECT_IO everything seems to work fine. With DIRECT_IO, the copies made with 512-byte, 1K and 2K block sizes get corrupted.

I want to run my KVM guests on top of NFS over RDMA. My guests cannot create filesystems.

Thanks,

Andrew.

bug report: https://bugzilla.linux-nfs.org/show_bug.cgi?id=228

[root@node001 mnt]# for f in 512 1024 2048 4096 8192 16384 32768 65536 131072; do dd bs="$f" if=CentOS-6.3-x86_64-netinstall.iso of=hello iflag=direct oflag=direct && md5sum hello && rm -f hello; done

409600+0 records in
409600+0 records out
209715200 bytes (210 MB) copied, 62.3649 s, 3.4 MB/s
aadd0ffe3c9dfa35d8354e99ecac9276 hello -- 512 byte block

204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 41.3876 s, 5.1 MB/s
336f6da78f93dab591edc18da81f002e hello -- 1K block

102400+0 records in
102400+0 records out
209715200 bytes (210 MB) copied, 21.1712 s, 9.9 MB/s
f4cefe0a05c9b47ba68effdb17dc95d6 hello -- 2k block

51200+0 records in
51200+0 records out
209715200 bytes (210 MB) copied, 10.9631 s, 19.1 MB/s
690138908de516b6e5d7d180d085c3f3 hello -- 4k block

25600+0 records in
25600+0 records out
209715200 bytes (210 MB) copied, 5.4136 s, 38.7 MB/s
690138908de516b6e5d7d180d085c3f3 hello

12800+0 records in
12800+0 records out
209715200 bytes (210 MB) copied, 3.1448 s, 66.7 MB/s
690138908de516b6e5d7d180d085c3f3 hello

6400+0 records in
6400+0 records out
209715200 bytes (210 MB) copied, 1.77304 s, 118 MB/s
690138908de516b6e5d7d180d085c3f3 hello

3200+0 records in
3200+0 records out
209715200 bytes (210 MB) copied, 1.4331 s, 146 MB/s
690138908de516b6e5d7d180d085c3f3 hello

1600+0 records in
1600+0 records out
209715200 bytes (210 MB) copied, 0.922167 s, 227 MB/s
690138908de516b6e5d7d180d085c3f3 hello
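
The run above relies on comparing checksums by eye. A minimal sketch of the same test that compares each copy against the checksum of the source image could look like the following. The function names (md5_of, check_block_sizes) and the argument order are made up for illustration and are not part of the original report.

```shell
#!/bin/sh
# Sketch: copy SRC to DEST once per block size and compare the MD5 of each
# copy against the MD5 of the source. All names here are illustrative.

# md5_of FILE: print only the hex digest of FILE.
md5_of() {
    md5sum "$1" | cut -d' ' -f1
}

# check_block_sizes SRC DEST DD_FLAGS SIZE...
# DD_FLAGS is a (possibly empty) string of extra dd operands, for example
# "iflag=direct oflag=direct". Returns non-zero if any copy differs.
check_block_sizes() {
    src=$1; dest=$2; flags=$3
    shift 3
    want=$(md5_of "$src")
    rc=0
    for bs in "$@"; do
        # $flags is intentionally unquoted so it splits into dd operands.
        if ! dd bs="$bs" if="$src" of="$dest" $flags 2>/dev/null; then
            echo "bs=$bs ERROR"
            rc=1
            rm -f "$dest"
            continue
        fi
        got=$(md5_of "$dest")
        if [ "$got" = "$want" ]; then
            echo "bs=$bs OK"
        else
            echo "bs=$bs MISMATCH ($got)"
            rc=1
        fi
        rm -f "$dest"
    done
    return $rc
}
```

On the NFS/RDMA mount this would be called as: check_block_sizes CentOS-6.3-x86_64-netinstall.iso hello "iflag=direct oflag=direct" 512 1024 2048 4096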




2012-09-11 17:03:02

by Steve Dickson

Subject: Re: NFS over RDMA small block DIRECT_IO bug



On 09/04/2012 05:31 AM, Andrew Holway wrote:
> Hello.
>
> # Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also seem relevant to libvirt. #
>
> I have a CentOS 6.2 server and CentOS 6.2 client.
>
> [root@store ~]# cat /etc/exports
> /dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure) (I have tried with non-tmpfs targets also)
>
>
> [root@node001 ~]# cat /etc/fstab
> store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0
>
>
> I wrote a little for-loop one-liner that dd'd the CentOS netinstall image to a file called 'hello' and then checksummed that file. Each iteration uses a different block size.
>
> Without DIRECT_IO everything seems to work fine. With DIRECT_IO, the copies made with 512-byte, 1K and 2K block sizes get corrupted.
>
> I want to run my KVM guests on top of NFS over RDMA. My guests cannot create filesystems.
>
> Thanks,
>
> Andrew.
>
> bug report: https://bugzilla.linux-nfs.org/show_bug.cgi?id=228
Well, it appears the RHEL6 kernels are lacking a couple of patches that might
help with this...

5c635e09 RPCRDMA: Fix FRMR registration/invalidate handling.
9b78145c xprtrdma: Remove assumption that each segment is <= PAGE_SIZE

I can only imagine that CentOS 6.2 might be lacking these too... ;-)

steved.
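
One rough way to see whether an installed kernel package mentions the two patches named above is to search its RPM changelog for the patch subjects. The sketch below assumes the changelog text arrives on stdin; the function name missing_patches is made up for illustration. Distribution changelogs often reword upstream subjects, so a "missing" line is only a hint, not proof that a fix is absent.

```shell
#!/bin/sh
# Sketch: report which patch subjects do not appear in a changelog.
# missing_patches SUBJECT...
# Reads changelog text on stdin and prints "missing: SUBJECT" for every
# subject that is not found. Returns 0 if all were found, 1 otherwise.
missing_patches() {
    log=$(cat)
    rc=0
    for subject in "$@"; do
        # -F: match as a fixed string, -i: ignore case, -q: no output
        if ! printf '%s\n' "$log" | grep -Fqi -- "$subject"; then
            echo "missing: $subject"
            rc=1
        fi
    done
    return $rc
}
```

On an RPM-based system this could be fed with: rpm -q --changelog kernel | missing_patches "Fix FRMR registration/invalidate handling" "Remove assumption that each segment is <= PAGE_SIZE"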

2012-09-18 14:03:48

by Andrew Holway

Subject: Re: NFS over RDMA small block DIRECT_IO bug

Hi Steve,

Do you think these patches will make their way into the Red Hat kernel sometime soon?

What is the state of support for NFS over RDMA at Red Hat?

Thanks,

Andrew


On Sep 11, 2012, at 7:03 PM, Steve Dickson wrote:

>
>
> On 09/04/2012 05:31 AM, Andrew Holway wrote:
>> Hello.
>>
>> # Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also seem relevant to libvirt. #
>>
>> I have a CentOS 6.2 server and CentOS 6.2 client.
>>
>> [root@store ~]# cat /etc/exports
>> /dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure) (I have tried with non-tmpfs targets also)
>>
>>
>> [root@node001 ~]# cat /etc/fstab
>> store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0
>>
>>
>> I wrote a little for-loop one-liner that dd'd the CentOS netinstall image to a file called 'hello' and then checksummed that file. Each iteration uses a different block size.
>>
>> Without DIRECT_IO everything seems to work fine. With DIRECT_IO, the copies made with 512-byte, 1K and 2K block sizes get corrupted.
>>
>> I want to run my KVM guests on top of NFS over RDMA. My guests cannot create filesystems.
>>
>> Thanks,
>>
>> Andrew.
>>
>> bug report: https://bugzilla.linux-nfs.org/show_bug.cgi?id=228
> Well, it appears the RHEL6 kernels are lacking a couple of patches that might
> help with this...
>
> 5c635e09 RPCRDMA: Fix FRMR registration/invalidate handling.
> 9b78145c xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
>
> I can only imagine that CentOS 6.2 might be lacking these too... ;-)
>
> steved.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



2012-09-06 13:45:45

by Myklebust, Trond

Subject: Re: NFS over RDMA small block DIRECT_IO bug

On Thu, 2012-09-06 at 12:14 +0200, Andrew Holway wrote:
> On Sep 5, 2012, at 4:02 PM, Avi Kivity wrote:
>
> > On 09/04/2012 03:04 PM, Myklebust, Trond wrote:
> >> On Tue, 2012-09-04 at 11:31 +0200, Andrew Holway wrote:
> >>> Hello.
> >>>
> >>> # Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also seem relevant to libvirt. #
> >>>
> >>> I have a CentOS 6.2 server and CentOS 6.2 client.
> >>>
> >>> [root@store ~]# cat /etc/exports
> >>> /dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure) (I have tried with non-tmpfs targets also)
> >>>
> >>>
> >>> [root@node001 ~]# cat /etc/fstab
> >>> store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0
> >>>
> >>>
> >>> I wrote a little for-loop one-liner that dd'd the CentOS netinstall image to a file called 'hello' and then checksummed that file. Each iteration uses a different block size.
> >>>
> >>> Without DIRECT_IO everything seems to work fine. With DIRECT_IO, the copies made with 512-byte, 1K and 2K block sizes get corrupted.
> >>
> >>
> >> That is expected behaviour. DIRECT_IO over RDMA needs to be page aligned
> >> so that it can use the more efficient RDMA READ and RDMA WRITE memory
> >> semantics (instead of the SEND/RECEIVE channel semantics).
> >
> > Shouldn't subpage requests fail then? O_DIRECT block requests fail for
> > subsector writes, instead of corrupting your data.
>
> But silent data corruption is so much fun!!

A couple of RDMA folks are looking into why this is happening. I'm
hoping they will get back to me soon.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-09-04 12:04:34

by Myklebust, Trond

Subject: Re: NFS over RDMA small block DIRECT_IO bug

On Tue, 2012-09-04 at 11:31 +0200, Andrew Holway wrote:
> Hello.
>
> # Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also seem relevant to libvirt. #
>
> I have a CentOS 6.2 server and CentOS 6.2 client.
>
> [root@store ~]# cat /etc/exports
> /dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure) (I have tried with non-tmpfs targets also)
>
>
> [root@node001 ~]# cat /etc/fstab
> store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0
>
>
> I wrote a little for-loop one-liner that dd'd the CentOS netinstall image to a file called 'hello' and then checksummed that file. Each iteration uses a different block size.
>
> Without DIRECT_IO everything seems to work fine. With DIRECT_IO, the copies made with 512-byte, 1K and 2K block sizes get corrupted.


That is expected behaviour. DIRECT_IO over RDMA needs to be page aligned
so that it can use the more efficient RDMA READ and RDMA WRITE memory
semantics (instead of the SEND/RECEIVE channel semantics).

> I want to run my KVM guests on top of NFS over RDMA. My guests cannot create filesystems.
>
> Thanks,
>
> Andrew.
>
> bug report: https://bugzilla.linux-nfs.org/show_bug.cgi?id=228
>
> [root@node001 mnt]# for f in 512 1024 2048 4096 8192 16384 32768 65536 131072; do dd bs="$f" if=CentOS-6.3-x86_64-netinstall.iso of=hello iflag=direct oflag=direct && md5sum hello && rm -f hello; done
>
> 409600+0 records in
> 409600+0 records out
> 209715200 bytes (210 MB) copied, 62.3649 s, 3.4 MB/s
> aadd0ffe3c9dfa35d8354e99ecac9276 hello -- 512 byte block
>
> 204800+0 records in
> 204800+0 records out
> 209715200 bytes (210 MB) copied, 41.3876 s, 5.1 MB/s
> 336f6da78f93dab591edc18da81f002e hello -- 1K block
>
> 102400+0 records in
> 102400+0 records out
> 209715200 bytes (210 MB) copied, 21.1712 s, 9.9 MB/s
> f4cefe0a05c9b47ba68effdb17dc95d6 hello -- 2k block
>
> 51200+0 records in
> 51200+0 records out
> 209715200 bytes (210 MB) copied, 10.9631 s, 19.1 MB/s
> 690138908de516b6e5d7d180d085c3f3 hello -- 4k block
>
> 25600+0 records in
> 25600+0 records out
> 209715200 bytes (210 MB) copied, 5.4136 s, 38.7 MB/s
> 690138908de516b6e5d7d180d085c3f3 hello
>
> 12800+0 records in
> 12800+0 records out
> 209715200 bytes (210 MB) copied, 3.1448 s, 66.7 MB/s
> 690138908de516b6e5d7d180d085c3f3 hello
>
> 6400+0 records in
> 6400+0 records out
> 209715200 bytes (210 MB) copied, 1.77304 s, 118 MB/s
> 690138908de516b6e5d7d180d085c3f3 hello
>
> 3200+0 records in
> 3200+0 records out
> 209715200 bytes (210 MB) copied, 1.4331 s, 146 MB/s
> 690138908de516b6e5d7d180d085c3f3 hello
>
> 1600+0 records in
> 1600+0 records out
> 209715200 bytes (210 MB) copied, 0.922167 s, 227 MB/s
> 690138908de516b6e5d7d180d085c3f3 hello
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-09-05 14:02:52

by Avi Kivity

Subject: Re: NFS over RDMA small block DIRECT_IO bug

On 09/04/2012 03:04 PM, Myklebust, Trond wrote:
> On Tue, 2012-09-04 at 11:31 +0200, Andrew Holway wrote:
>> Hello.
>>
>> # Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also seem relevant to libvirt. #
>>
>> I have a CentOS 6.2 server and CentOS 6.2 client.
>>
>> [root@store ~]# cat /etc/exports
>> /dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure) (I have tried with non-tmpfs targets also)
>>
>>
>> [root@node001 ~]# cat /etc/fstab
>> store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0
>>
>>
>> I wrote a little for-loop one-liner that dd'd the CentOS netinstall image to a file called 'hello' and then checksummed that file. Each iteration uses a different block size.
>>
>> Without DIRECT_IO everything seems to work fine. With DIRECT_IO, the copies made with 512-byte, 1K and 2K block sizes get corrupted.
>
>
> That is expected behaviour. DIRECT_IO over RDMA needs to be page aligned
> so that it can use the more efficient RDMA READ and RDMA WRITE memory
> semantics (instead of the SEND/RECEIVE channel semantics).

Shouldn't subpage requests fail then? O_DIRECT block requests fail for
subsector writes, instead of corrupting your data.

Hopefully this is documented somewhere.

--
error compiling committee.c: too many arguments to function
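
The distinction discussed above is between transfer sizes that are whole multiples of the page size and sub-page ones. A small sketch for classifying the block sizes used in the test follows. It only looks at the transfer size, not at buffer address or file offset alignment, and the function names (is_page_aligned, classify_sizes) are invented for illustration.

```shell
#!/bin/sh
# Sketch: classify transfer sizes relative to this machine's page size.

# is_page_aligned SIZE: succeed if SIZE (in bytes) is a non-zero multiple
# of the page size reported by getconf.
is_page_aligned() {
    page=$(getconf PAGESIZE)
    [ "$1" -gt 0 ] && [ $(( $1 % page )) -eq 0 ]
}

# classify_sizes SIZE...: print one line per size.
classify_sizes() {
    for size in "$@"; do
        if is_page_aligned "$size"; then
            echo "$size page-multiple"
        else
            echo "$size sub-page or unaligned"
        fi
    done
}
```

For the sizes in the report this would be run as: classify_sizes 512 1024 2048 4096 8192. On a machine with 4096-byte pages, the sub-page sizes would be the three that produced differing checksums.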

2012-09-19 15:50:21

by Steve Dickson

Subject: Re: NFS over RDMA small block DIRECT_IO bug



On 09/18/2012 10:03 AM, Andrew Holway wrote:
> Hi Steve,
>
> Do you think these patches will make their way into the Red Hat kernel sometime soon?
The process would start by opening a bz at bugzilla.redhat.com... If you like, you
can send me the pointer to the bz and I'll make sure it gets noticed...
>
> What is the state of support for NFS over RDMA at Red Hat?
In theory it's supported, but in reality that post is currently unmanned,
which seems to be the case upstream as well...

steved.

>
> Thanks,
>
> Andrew
>
>
> On Sep 11, 2012, at 7:03 PM, Steve Dickson wrote:
>
>>
>>
>> On 09/04/2012 05:31 AM, Andrew Holway wrote:
>>> Hello.
>>>
>>> # Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also seem relevant to libvirt. #
>>>
>>> I have a CentOS 6.2 server and CentOS 6.2 client.
>>>
>>> [root@store ~]# cat /etc/exports
>>> /dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure) (I have tried with non-tmpfs targets also)
>>>
>>>
>>> [root@node001 ~]# cat /etc/fstab
>>> store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0
>>>
>>>
>>> I wrote a little for-loop one-liner that dd'd the CentOS netinstall image to a file called 'hello' and then checksummed that file. Each iteration uses a different block size.
>>>
>>> Without DIRECT_IO everything seems to work fine. With DIRECT_IO, the copies made with 512-byte, 1K and 2K block sizes get corrupted.
>>>
>>> I want to run my KVM guests on top of NFS over RDMA. My guests cannot create filesystems.
>>>
>>> Thanks,
>>>
>>> Andrew.
>>>
>>> bug report: https://bugzilla.linux-nfs.org/show_bug.cgi?id=228
>> Well, it appears the RHEL6 kernels are lacking a couple of patches that might
>> help with this...
>>
>> 5c635e09 RPCRDMA: Fix FRMR registration/invalidate handling.
>> 9b78145c xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
>>
>> I can only imagine that CentOS 6.2 might be lacking these too... ;-)
>>
>> steved.

2012-09-06 10:14:51

by Andrew Holway

Subject: Re: NFS over RDMA small block DIRECT_IO bug


On Sep 5, 2012, at 4:02 PM, Avi Kivity wrote:

> On 09/04/2012 03:04 PM, Myklebust, Trond wrote:
>> On Tue, 2012-09-04 at 11:31 +0200, Andrew Holway wrote:
>>> Hello.
>>>
>>> # Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also seem relevant to libvirt. #
>>>
>>> I have a CentOS 6.2 server and CentOS 6.2 client.
>>>
>>> [root@store ~]# cat /etc/exports
>>> /dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure) (I have tried with non-tmpfs targets also)
>>>
>>>
>>> [root@node001 ~]# cat /etc/fstab
>>> store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0
>>>
>>>
>>> I wrote a little for-loop one-liner that dd'd the CentOS netinstall image to a file called 'hello' and then checksummed that file. Each iteration uses a different block size.
>>>
>>> Without DIRECT_IO everything seems to work fine. With DIRECT_IO, the copies made with 512-byte, 1K and 2K block sizes get corrupted.
>>
>>
>> That is expected behaviour. DIRECT_IO over RDMA needs to be page aligned
>> so that it can use the more efficient RDMA READ and RDMA WRITE memory
>> semantics (instead of the SEND/RECEIVE channel semantics).
>
> Shouldn't subpage requests fail then? O_DIRECT block requests fail for
> subsector writes, instead of corrupting your data.

But silent data corruption is so much fun!!

2012-09-04 12:52:43

by Andrew Holway

Subject: Re: NFS over RDMA small block DIRECT_IO bug

>
> That is expected behaviour. DIRECT_IO over RDMA needs to be page aligned
> so that it can use the more efficient RDMA READ and RDMA WRITE memory
> semantics (instead of the SEND/RECEIVE channel semantics).

Yes, I think I understand that now.

I need to find a way of getting around the libvirt issue.

http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg01570.html

Thanks,

Andrew


>
>> I want to run my KVM guests on top of NFS over RDMA. My guests cannot create filesystems.
>>
>> Thanks,
>>
>> Andrew.
>>
>> bug report: https://bugzilla.linux-nfs.org/show_bug.cgi?id=228
>>
>> [root@node001 mnt]# for f in 512 1024 2048 4096 8192 16384 32768 65536 131072; do dd bs="$f" if=CentOS-6.3-x86_64-netinstall.iso of=hello iflag=direct oflag=direct && md5sum hello && rm -f hello; done
>>
>> 409600+0 records in
>> 409600+0 records out
>> 209715200 bytes (210 MB) copied, 62.3649 s, 3.4 MB/s
>> aadd0ffe3c9dfa35d8354e99ecac9276 hello -- 512 byte block
>>
>> 204800+0 records in
>> 204800+0 records out
>> 209715200 bytes (210 MB) copied, 41.3876 s, 5.1 MB/s
>> 336f6da78f93dab591edc18da81f002e hello -- 1K block
>>
>> 102400+0 records in
>> 102400+0 records out
>> 209715200 bytes (210 MB) copied, 21.1712 s, 9.9 MB/s
>> f4cefe0a05c9b47ba68effdb17dc95d6 hello -- 2k block
>>
>> 51200+0 records in
>> 51200+0 records out
>> 209715200 bytes (210 MB) copied, 10.9631 s, 19.1 MB/s
>> 690138908de516b6e5d7d180d085c3f3 hello -- 4k block
>>
>> 25600+0 records in
>> 25600+0 records out
>> 209715200 bytes (210 MB) copied, 5.4136 s, 38.7 MB/s
>> 690138908de516b6e5d7d180d085c3f3 hello
>>
>> 12800+0 records in
>> 12800+0 records out
>> 209715200 bytes (210 MB) copied, 3.1448 s, 66.7 MB/s
>> 690138908de516b6e5d7d180d085c3f3 hello
>>
>> 6400+0 records in
>> 6400+0 records out
>> 209715200 bytes (210 MB) copied, 1.77304 s, 118 MB/s
>> 690138908de516b6e5d7d180d085c3f3 hello
>>
>> 3200+0 records in
>> 3200+0 records out
>> 209715200 bytes (210 MB) copied, 1.4331 s, 146 MB/s
>> 690138908de516b6e5d7d180d085c3f3 hello
>>
>> 1600+0 records in
>> 1600+0 records out
>> 209715200 bytes (210 MB) copied, 0.922167 s, 227 MB/s
>> 690138908de516b6e5d7d180d085c3f3 hello
>>
>>
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>