2012-05-27 05:33:41

by Peng Tao

[permalink] [raw]
Subject: [PATCH 0/3] NFS41: make block layout driver work w/ NFS DIO changes

This applies to Trond's devel branch.

Peng Tao (3):
NFS41: add pnfs_dio_begin/dio_end
pnfsblock: call block plug in bl_dio_begin/end
pnfsblock: bail out unaligned DIO

fs/nfs/blocklayout/blocklayout.c | 35 ++++++++++++++++++++++++++++++
fs/nfs/direct.c | 43 ++++++++++++++++++++++++++++++++++---
fs/nfs/pnfs.h | 38 +++++++++++++++++++++++++++++++++
3 files changed, 112 insertions(+), 4 deletions(-)



2012-05-28 17:14:48

by Peng, Tao

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

On Mon, May 28, 2012 at 6:51 PM, Boaz Harrosh <[email protected]> wrote:
> On 05/28/2012 01:44 PM, Boaz Harrosh wrote:
>
>> On 05/28/2012 07:13 AM, [email protected] wrote:
>>
>>>> -----Original Message-----
>>>> From: Myklebust, Trond [mailto:[email protected]]
>>>> Sent: Monday, May 28, 2012 11:43 AM
>>>> To: Peng, Tao
>>>> Cc: [email protected]; [email protected]
>>>> Subject: RE: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end
>>>>
>>>> On Sun, 2012-05-27 at 22:30 -0400, [email protected] wrote:
>>>>> I'm afraid there is. There is no way to pass struct blk_plug around pg_init/pg_doio, unless we put
>>>> it in struct nfs_pageio_descriptor, which I think is more intrusive and less efficient as it is only
>>>> useful for block layout driver in DIO case.
>>>>
>>>> Then add a 'void *pg_layout_private' field to nfs_pageio_descriptor and
>>>> allocate the struct blk_plug dynamically.
>>>>
>>
>>> OK. I thought data structure change is more intrusive because it
>>> affects all layout drivers and generic NFS as well. But since you
>>> think it is OK, I will change it as you suggested.
>>>
>>
>>
>> I want a pg_layout_private in nfs_pageio_descriptor as well. I even wrote
>> the ML about it. And everyone agreed. (Just never had time to finish)
>>
>> So yes please do add it. it will have more users.
>>
>
>
> I forgot to say. You might want/need to pass this pg_layout_private
> back to LD->paglist_write/read. (If you need to, I will)
>
> (In fact one optimization I wanted for a long time is to pass
> nfs_pageio_descriptor to paglist_write/read directly instead of
> duplicating all it's members (back and forth). In a new invented
> structures where the worse is that the common code of read and write
> can't be common because it is not the same types.
> But that's for another patch)
>
I also plan to add void *layout_private and unsigned char moreio to
nfs_pageio_header structure, in order to pass necessary information to
pagelist_read/write. Are there any objections against doing so?

The alternative would be to extent pnfs_try_to_read/write_data() and
read/write_pagelist() APIs to pass in the same information.

Which one do you prefer? Please share your thoughts. Thank you very much!

Thanks,
Tao

> Ciao
> Boaz
>
>>
>>> Thanks,
>>> Tao
>>>
>>
>>
>> Thanks
>> Boaz
>
>

2012-05-27 05:33:56

by Peng Tao

[permalink] [raw]
Subject: [PATCH 2/3] pnfsblock: call block plug in bl_dio_begin/end

Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/blocklayout/blocklayout.c | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 7ae8a60..53cb450 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -996,6 +996,19 @@ bl_clear_layoutdriver(struct nfs_server *server)
return 0;
}

+static bool bl_dio_begin(struct inode *inode, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos,
+ struct blk_plug *plug)
+{
+ blk_start_plug(plug);
+ return true;
+}
+
+void bl_dio_end(struct blk_plug *plug)
+{
+ blk_finish_plug(plug);
+}
+
static const struct nfs_pageio_ops bl_pg_read_ops = {
.pg_init = pnfs_generic_pg_init_read,
.pg_test = pnfs_generic_pg_test,
@@ -1013,6 +1026,8 @@ static struct pnfs_layoutdriver_type blocklayout_type = {
.name = "LAYOUT_BLOCK_VOLUME",
.read_pagelist = bl_read_pagelist,
.write_pagelist = bl_write_pagelist,
+ .dio_begin = bl_dio_begin,
+ .dio_end = bl_dio_end,
.alloc_layout_hdr = bl_alloc_layout_hdr,
.free_layout_hdr = bl_free_layout_hdr,
.alloc_lseg = bl_alloc_lseg,
--
1.7.1.262.g5ef3d


2012-05-28 17:07:26

by Peng, Tao

[permalink] [raw]
Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO

On Tue, May 29, 2012 at 12:33 AM, Myklebust, Trond
<[email protected]> wrote:
> On Mon, 2012-05-28 at 00:26 -0400, [email protected] wrote:
>> > -----Original Message-----
>> > From: Myklebust, Trond [mailto:[email protected]]
>> > Sent: Monday, May 28, 2012 11:45 AM
>> > To: Peng, Tao
>> > Cc: [email protected]; [email protected]
>> > Subject: RE: [PATCH 3/3] pnfsblock: bail out unaligned DIO
>> >
>> > On Sun, 2012-05-27 at 22:30 -0400, [email protected] wrote:
>> > > As explain in the other mail, it is necessary to have pnfs_dio_begin/end, so I prefer to do the test
>> > as early as possible, which is in pnfs_dio_begin.
>> >
>> > There is no pnfs_dio_begin/end.
>> OK. I will put DIO alignment tests inside pg_init.
>>
>> And any comments on for stable patch in the thread ([PATCH] pnfsblock: bail out page unaligned IO)? If you agree, I will base DIO changes on top of it to avoid conflicts.
>> There are reasons to test alignment at different place for buffer IO and DIO. For buffer IO, pg_init isn't the right place because we don't have nfs page there. For DIO, pg_test isn't the right place because we need to check cross page boundary.
>
> Since all pages in the struct nfs_pageio_descriptor are guaranteed to be
> contiguous, you really only need to check the first and last page in the
> series for alignment.
>
> pg_init() does take the first nfs_page request as its argument and so it
> should be possible to check the alignment of the first page in the
> series there.
> You can then check the alignment of the last page in your
> ->write_pagelist() and return PNFS_NOT_ATTEMPTED if appropriate.
>
Thanks. Will do it as you suggested.

> Note that all applications that use O_DIRECT are expected to use aligned
> memory, since that's what the open() manpage implies is the safe option
> for all filesystems. Unaligned memory is therefore not something that we
> need to optimise for.
>
do_blockdev_direct_IO() is checking memory alignment for direct IO. Is
there any difference between NFS and other FS here?

Cheers,
Tao

2012-05-28 04:13:47

by Peng, Tao

[permalink] [raw]
Subject: RE: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBNeWtsZWJ1c3QsIFRyb25kIFtt
YWlsdG86VHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb21dDQo+IFNlbnQ6IE1vbmRheSwgTWF5IDI4
LCAyMDEyIDExOjQzIEFNDQo+IFRvOiBQZW5nLCBUYW8NCj4gQ2M6IGJlcmd3b2xmQGdtYWlsLmNv
bTsgbGludXgtbmZzQHZnZXIua2VybmVsLm9yZw0KPiBTdWJqZWN0OiBSRTogW1BBVENIIDEvM10g
TkZTNDE6IGFkZCBwbmZzX2Rpb19iZWdpbi9kaW9fZW5kDQo+IA0KPiBPbiBTdW4sIDIwMTItMDUt
MjcgYXQgMjI6MzAgLTA0MDAsIHRhby5wZW5nQGVtYy5jb20gd3JvdGU6DQo+ID4gSSdtIGFmcmFp
ZCB0aGVyZSBpcy4gVGhlcmUgaXMgbm8gd2F5IHRvIHBhc3Mgc3RydWN0IGJsa19wbHVnIGFyb3Vu
ZCBwZ19pbml0L3BnX2RvaW8sIHVubGVzcyB3ZSBwdXQNCj4gaXQgaW4gc3RydWN0IG5mc19wYWdl
aW9fZGVzY3JpcHRvciwgd2hpY2ggSSB0aGluayBpcyBtb3JlIGludHJ1c2l2ZSBhbmQgbGVzcyBl
ZmZpY2llbnQgYXMgaXQgaXMgb25seQ0KPiB1c2VmdWwgZm9yIGJsb2NrIGxheW91dCBkcml2ZXIg
aW4gRElPIGNhc2UuDQo+IA0KPiBUaGVuIGFkZCBhICd2b2lkICpwZ19sYXlvdXRfcHJpdmF0ZScg
ZmllbGQgdG8gbmZzX3BhZ2Vpb19kZXNjcmlwdG9yIGFuZA0KPiBhbGxvY2F0ZSB0aGUgc3RydWN0
IGJsa19wbHVnIGR5bmFtaWNhbGx5Lg0KPiANCk9LLiBJIHRob3VnaHQgZGF0YSBzdHJ1Y3R1cmUg
Y2hhbmdlIGlzIG1vcmUgaW50cnVzaXZlIGJlY2F1c2UgaXQgYWZmZWN0cyBhbGwgbGF5b3V0IGRy
aXZlcnMgYW5kIGdlbmVyaWMgTkZTIGFzIHdlbGwuIEJ1dCBzaW5jZSB5b3UgdGhpbmsgaXQgaXMg
T0ssIEkgd2lsbCBjaGFuZ2UgaXQgYXMgeW91IHN1Z2dlc3RlZC4NCg0KVGhhbmtzLA0KVGFvDQoN
Cg==

2012-05-27 16:29:32

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

T24gU3VuLCAyMDEyLTA1LTI3IGF0IDEzOjMyICswODAwLCBQZW5nIFRhbyB3cm90ZToNCj4gVG8g
YWxsb3cgbGF5b3V0IGRyaXZlciBzcGVjaWZpYyBwcmVwYXJhdGlvbiBmb3IgRElPLg0KPiANCj4g
U2lnbmVkLW9mZi1ieTogUGVuZyBUYW8gPHRhby5wZW5nQGVtYy5jb20+DQo+IC0tLQ0KPiAgZnMv
bmZzL2RpcmVjdC5jIHwgICA0MyArKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr
KystLS0tDQo+ICBmcy9uZnMvcG5mcy5oICAgfCAgIDM4ICsrKysrKysrKysrKysrKysrKysrKysr
KysrKysrKysrKysrKysrDQo+ICAyIGZpbGVzIGNoYW5nZWQsIDc3IGluc2VydGlvbnMoKyksIDQg
ZGVsZXRpb25zKC0pDQo+IA0KPiBkaWZmIC0tZ2l0IGEvZnMvbmZzL2RpcmVjdC5jIGIvZnMvbmZz
L2RpcmVjdC5jDQo+IGluZGV4IGFkMjc3NWQuLjhlOWM4ZTEgMTAwNjQ0DQo+IC0tLSBhL2ZzL25m
cy9kaXJlY3QuYw0KPiArKysgYi9mcy9uZnMvZGlyZWN0LmMNCj4gQEAgLTM4Myw2ICszODMsMzcg
QEAgc3RhdGljIHNzaXplX3QgbmZzX2RpcmVjdF9yZWFkX3NjaGVkdWxlX3NlZ21lbnQoc3RydWN0
IG5mc19wYWdlaW9fZGVzY3JpcHRvciAqZGUNCj4gIAlyZXR1cm4gcmVzdWx0IDwgMCA/IChzc2l6
ZV90KSByZXN1bHQgOiAtRUZBVUxUOw0KPiAgfQ0KPiAgDQo+ICtzdGF0aWMgdm9pZCBuZnNfZGly
ZWN0X3JlYWRfaW5pdChzdHJ1Y3QgbmZzX3BhZ2Vpb19kZXNjcmlwdG9yICpwZ2lvLA0KPiArCQkJ
CSBzdHJ1Y3QgaW5vZGUgKmlub2RlLCBjb25zdCBzdHJ1Y3QgaW92ZWMgKmlvdiwNCj4gKwkJCQkg
dW5zaWduZWQgbG9uZyBucl9zZWdzLCBsb2ZmX3QgcG9zLA0KPiArCQkJCSBzdHJ1Y3QgYmxrX3Bs
dWcgKnBsdWcpDQo+ICt7DQo+ICsJaWYgKHBuZnNfZGlvX2JlZ2luKGlub2RlLCBpb3YsIG5yX3Nl
Z3MsIHBvcywgcGx1ZykpDQo+ICsJCXBuZnNfcGFnZWlvX2luaXRfcmVhZChwZ2lvLCBpbm9kZSwN
Cj4gKwkJCQkgICAgICAmbmZzX2RpcmVjdF9yZWFkX2NvbXBsZXRpb25fb3BzKTsNCj4gKwllbHNl
DQo+ICsJCW5mc19wYWdlaW9faW5pdF9yZWFkX21kcyhwZ2lvLCBpbm9kZSwNCj4gKwkJCQkJICZu
ZnNfZGlyZWN0X3JlYWRfY29tcGxldGlvbl9vcHMpOw0KPiArfQ0KPiArDQo+ICtzdGF0aWMgdm9p
ZCBuZnNfZGlyZWN0X3dyaXRlX2luaXQoc3RydWN0IG5mc19wYWdlaW9fZGVzY3JpcHRvciAqcGdp
bywNCj4gKwkJCQkgIHN0cnVjdCBpbm9kZSAqaW5vZGUsIGNvbnN0IHN0cnVjdCBpb3ZlYyAqaW92
LA0KPiArCQkJCSAgdW5zaWduZWQgbG9uZyBucl9zZWdzLCBsb2ZmX3QgcG9zLA0KPiArCQkJCSAg
c3RydWN0IGJsa19wbHVnICpwbHVnKQ0KPiArew0KPiArCWlmIChwbmZzX2Rpb19iZWdpbihpbm9k
ZSwgaW92LCBucl9zZWdzLCBwb3MsIHBsdWcpKQ0KPiArCQlwbmZzX3BhZ2Vpb19pbml0X3dyaXRl
KHBnaW8sIGlub2RlLCBGTFVTSF9DT05EX1NUQUJMRSwNCj4gKwkJCQkgICAgICAgJm5mc19kaXJl
Y3Rfd3JpdGVfY29tcGxldGlvbl9vcHMpOw0KPiArCWVsc2UNCj4gKwkJbmZzX3BhZ2Vpb19pbml0
X3dyaXRlX21kcyhwZ2lvLCBpbm9kZSwgRkxVU0hfQ09ORF9TVEFCTEUsDQo+ICsJCQkJCSAgJm5m
c19kaXJlY3Rfd3JpdGVfY29tcGxldGlvbl9vcHMpOw0KPiArfQ0KPiArDQo+ICtzdGF0aWMgdm9p
ZCBuZnNfZGlvX2VuZChzdHJ1Y3QgaW5vZGUgKmlub2RlLCBzdHJ1Y3QgYmxrX3BsdWcgKnBsdWcp
DQo+ICt7DQo+ICsJcG5mc19kaW9fZW5kKGlub2RlLCBwbHVnKTsNCj4gK30NCj4gKw0KPiAgc3Rh
dGljIHNzaXplX3QgbmZzX2RpcmVjdF9yZWFkX3NjaGVkdWxlX2lvdmVjKHN0cnVjdCBuZnNfZGly
ZWN0X3JlcSAqZHJlcSwNCj4gIAkJCQkJICAgICAgY29uc3Qgc3RydWN0IGlvdmVjICppb3YsDQo+
ICAJCQkJCSAgICAgIHVuc2lnbmVkIGxvbmcgbnJfc2VncywNCj4gQEAgLTM5Miw5ICs0MjMsMTAg
QEAgc3RhdGljIHNzaXplX3QgbmZzX2RpcmVjdF9yZWFkX3NjaGVkdWxlX2lvdmVjKHN0cnVjdCBu
ZnNfZGlyZWN0X3JlcSAqZHJlcSwNCj4gIAlzc2l6ZV90IHJlc3VsdCA9IC1FSU5WQUw7DQo+ICAJ
c2l6ZV90IHJlcXVlc3RlZF9ieXRlcyA9IDA7DQo+ICAJdW5zaWduZWQgbG9uZyBzZWc7DQo+ICsJ
c3RydWN0IGJsa19wbHVnIHBsdWc7DQo+ICsNCj4gKwluZnNfZGlyZWN0X3JlYWRfaW5pdCgmZGVz
YywgZHJlcS0+aW5vZGUsIGlvdiwgbnJfc2VncywgcG9zLCAmcGx1Zyk7DQo+ICANCj4gLQluZnNf
cGFnZWlvX2luaXRfcmVhZCgmZGVzYywgZHJlcS0+aW5vZGUsDQo+IC0JCQkgICAgICZuZnNfZGly
ZWN0X3JlYWRfY29tcGxldGlvbl9vcHMpOw0KPiAgCWdldF9kcmVxKGRyZXEpOw0KPiAgCWRlc2Mu
cGdfZHJlcSA9IGRyZXE7DQo+ICANCj4gQEAgLTQxMCw2ICs0NDIsNyBAQCBzdGF0aWMgc3NpemVf
dCBuZnNfZGlyZWN0X3JlYWRfc2NoZWR1bGVfaW92ZWMoc3RydWN0IG5mc19kaXJlY3RfcmVxICpk
cmVxLA0KPiAgCX0NCj4gIA0KPiAgCW5mc19wYWdlaW9fY29tcGxldGUoJmRlc2MpOw0KPiArCW5m
c19kaW9fZW5kKGRyZXEtPmlub2RlLCAmcGx1Zyk7DQoNCldoeSBjYW4ndCB5b3UganVzdCBwdXQg
dGhpcyBpbiBwZ19pbml0L3BnX2RvaW8/IFRoZXJlIGlzIGFscmVhZHkgYSBmbGFnDQppbiB0aGUg
bmZzX3BhZ2Vpb19kZXNjcmlwdG9yIHRoYXQgdGVsbHMgeW91IGlmIG1vcmUgaS9vIGlzIGR1ZSAo
c2VlDQpwZ19tb3JlaW8pLg0KVGhlcmUgc2hvdWxkIGJlIG5vIG5lZWQgdG8gYWRkIGV4dHJhIGNh
bGxiYWNrcyBpbiB0aGUgZ2VuZXJpYyBjb2RlIGZvcg0KYW55IG9mIHRoaXMuDQoNCi0tIA0KVHJv
bmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9u
ZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg0K

2012-05-28 02:30:57

by Peng, Tao

[permalink] [raw]
Subject: RE: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1uZnMtb3duZXJAdmdl
ci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtbmZzLW93bmVyQHZnZXIua2VybmVsLm9yZ10gT24g
QmVoYWxmIE9mIE15a2xlYnVzdCwNCj4gVHJvbmQNCj4gU2VudDogTW9uZGF5LCBNYXkgMjgsIDIw
MTIgMTI6MzAgQU0NCj4gVG86IFBlbmcgVGFvDQo+IENjOiBsaW51eC1uZnNAdmdlci5rZXJuZWwu
b3JnOyBQZW5nLCBUYW8NCj4gU3ViamVjdDogUmU6IFtQQVRDSCAxLzNdIE5GUzQxOiBhZGQgcG5m
c19kaW9fYmVnaW4vZGlvX2VuZA0KPiANCj4gT24gU3VuLCAyMDEyLTA1LTI3IGF0IDEzOjMyICsw
ODAwLCBQZW5nIFRhbyB3cm90ZToNCj4gPiBUbyBhbGxvdyBsYXlvdXQgZHJpdmVyIHNwZWNpZmlj
IHByZXBhcmF0aW9uIGZvciBESU8uDQo+ID4NCj4gPiBTaWduZWQtb2ZmLWJ5OiBQZW5nIFRhbyA8
dGFvLnBlbmdAZW1jLmNvbT4NCj4gPiAtLS0NCj4gPiAgZnMvbmZzL2RpcmVjdC5jIHwgICA0MyAr
KysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKystLS0tDQo+ID4gIGZzL25mcy9w
bmZzLmggICB8ICAgMzggKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysNCj4g
PiAgMiBmaWxlcyBjaGFuZ2VkLCA3NyBpbnNlcnRpb25zKCspLCA0IGRlbGV0aW9ucygtKQ0KPiA+
DQo+ID4gZGlmZiAtLWdpdCBhL2ZzL25mcy9kaXJlY3QuYyBiL2ZzL25mcy9kaXJlY3QuYw0KPiA+
IGluZGV4IGFkMjc3NWQuLjhlOWM4ZTEgMTAwNjQ0DQo+ID4gLS0tIGEvZnMvbmZzL2RpcmVjdC5j
DQo+ID4gKysrIGIvZnMvbmZzL2RpcmVjdC5jDQo+ID4gQEAgLTM4Myw2ICszODMsMzcgQEAgc3Rh
dGljIHNzaXplX3QgbmZzX2RpcmVjdF9yZWFkX3NjaGVkdWxlX3NlZ21lbnQoc3RydWN0IG5mc19w
YWdlaW9fZGVzY3JpcHRvcg0KPiAqZGUNCj4gPiAgCXJldHVybiByZXN1bHQgPCAwID8gKHNzaXpl
X3QpIHJlc3VsdCA6IC1FRkFVTFQ7DQo+ID4gIH0NCj4gPg0KPiA+ICtzdGF0aWMgdm9pZCBuZnNf
ZGlyZWN0X3JlYWRfaW5pdChzdHJ1Y3QgbmZzX3BhZ2Vpb19kZXNjcmlwdG9yICpwZ2lvLA0KPiA+
ICsJCQkJIHN0cnVjdCBpbm9kZSAqaW5vZGUsIGNvbnN0IHN0cnVjdCBpb3ZlYyAqaW92LA0KPiA+
ICsJCQkJIHVuc2lnbmVkIGxvbmcgbnJfc2VncywgbG9mZl90IHBvcywNCj4gPiArCQkJCSBzdHJ1
Y3QgYmxrX3BsdWcgKnBsdWcpDQo+ID4gK3sNCj4gPiArCWlmIChwbmZzX2Rpb19iZWdpbihpbm9k
ZSwgaW92LCBucl9zZWdzLCBwb3MsIHBsdWcpKQ0KPiA+ICsJCXBuZnNfcGFnZWlvX2luaXRfcmVh
ZChwZ2lvLCBpbm9kZSwNCj4gPiArCQkJCSAgICAgICZuZnNfZGlyZWN0X3JlYWRfY29tcGxldGlv
bl9vcHMpOw0KPiA+ICsJZWxzZQ0KPiA+ICsJCW5mc19wYWdlaW9faW5pdF9yZWFkX21kcyhwZ2lv
LCBpbm9kZSwNCj4gPiArCQkJCQkgJm5mc19kaXJlY3RfcmVhZF9jb21wbGV0aW9uX29wcyk7DQo+
ID4gK30NCj4gPiArDQo+ID4gK3N0YXRpYyB2b2lkIG5mc19kaXJlY3Rfd3JpdGVfaW5pdChzdHJ1
Y3QgbmZzX3BhZ2Vpb19kZXNjcmlwdG9yICpwZ2lvLA0KPiA+ICsJCQkJICBzdHJ1Y3QgaW5vZGUg
Kmlub2RlLCBjb25zdCBzdHJ1Y3QgaW92ZWMgKmlvdiwNCj4gPiArCQkJCSAgdW5zaWduZWQgbG9u
ZyBucl9zZWdzLCBsb2ZmX3QgcG9zLA0KPiA+ICsJCQkJICBzdHJ1Y3QgYmxrX3BsdWcgKnBsdWcp
DQo+ID4gK3sNCj4gPiArCWlmIChwbmZzX2Rpb19iZWdpbihpbm9kZSwgaW92LCBucl9zZWdzLCBw
b3MsIHBsdWcpKQ0KPiA+ICsJCXBuZnNfcGFnZWlvX2luaXRfd3JpdGUocGdpbywgaW5vZGUsIEZM
VVNIX0NPTkRfU1RBQkxFLA0KPiA+ICsJCQkJICAgICAgICZuZnNfZGlyZWN0X3dyaXRlX2NvbXBs
ZXRpb25fb3BzKTsNCj4gPiArCWVsc2UNCj4gPiArCQluZnNfcGFnZWlvX2luaXRfd3JpdGVfbWRz
KHBnaW8sIGlub2RlLCBGTFVTSF9DT05EX1NUQUJMRSwNCj4gPiArCQkJCQkgICZuZnNfZGlyZWN0
X3dyaXRlX2NvbXBsZXRpb25fb3BzKTsNCj4gPiArfQ0KPiA+ICsNCj4gPiArc3RhdGljIHZvaWQg
bmZzX2Rpb19lbmQoc3RydWN0IGlub2RlICppbm9kZSwgc3RydWN0IGJsa19wbHVnICpwbHVnKQ0K
PiA+ICt7DQo+ID4gKwlwbmZzX2Rpb19lbmQoaW5vZGUsIHBsdWcpOw0KPiA+ICt9DQo+ID4gKw0K
PiA+ICBzdGF0aWMgc3NpemVfdCBuZnNfZGlyZWN0X3JlYWRfc2NoZWR1bGVfaW92ZWMoc3RydWN0
IG5mc19kaXJlY3RfcmVxICpkcmVxLA0KPiA+ICAJCQkJCSAgICAgIGNvbnN0IHN0cnVjdCBpb3Zl
YyAqaW92LA0KPiA+ICAJCQkJCSAgICAgIHVuc2lnbmVkIGxvbmcgbnJfc2VncywNCj4gPiBAQCAt
MzkyLDkgKzQyMywxMCBAQCBzdGF0aWMgc3NpemVfdCBuZnNfZGlyZWN0X3JlYWRfc2NoZWR1bGVf
aW92ZWMoc3RydWN0IG5mc19kaXJlY3RfcmVxICpkcmVxLA0KPiA+ICAJc3NpemVfdCByZXN1bHQg
PSAtRUlOVkFMOw0KPiA+ICAJc2l6ZV90IHJlcXVlc3RlZF9ieXRlcyA9IDA7DQo+ID4gIAl1bnNp
Z25lZCBsb25nIHNlZzsNCj4gPiArCXN0cnVjdCBibGtfcGx1ZyBwbHVnOw0KPiA+ICsNCj4gPiAr
CW5mc19kaXJlY3RfcmVhZF9pbml0KCZkZXNjLCBkcmVxLT5pbm9kZSwgaW92LCBucl9zZWdzLCBw
b3MsICZwbHVnKTsNCj4gPg0KPiA+IC0JbmZzX3BhZ2Vpb19pbml0X3JlYWQoJmRlc2MsIGRyZXEt
Pmlub2RlLA0KPiA+IC0JCQkgICAgICZuZnNfZGlyZWN0X3JlYWRfY29tcGxldGlvbl9vcHMpOw0K
PiA+ICAJZ2V0X2RyZXEoZHJlcSk7DQo+ID4gIAlkZXNjLnBnX2RyZXEgPSBkcmVxOw0KPiA+DQo+
ID4gQEAgLTQxMCw2ICs0NDIsNyBAQCBzdGF0aWMgc3NpemVfdCBuZnNfZGlyZWN0X3JlYWRfc2No
ZWR1bGVfaW92ZWMoc3RydWN0IG5mc19kaXJlY3RfcmVxICpkcmVxLA0KPiA+ICAJfQ0KPiA+DQo+
ID4gIAluZnNfcGFnZWlvX2NvbXBsZXRlKCZkZXNjKTsNCj4gPiArCW5mc19kaW9fZW5kKGRyZXEt
Pmlub2RlLCAmcGx1Zyk7DQo+IA0KPiBXaHkgY2FuJ3QgeW91IGp1c3QgcHV0IHRoaXMgaW4gcGdf
aW5pdC9wZ19kb2lvPyBUaGVyZSBpcyBhbHJlYWR5IGEgZmxhZw0KPiBpbiB0aGUgbmZzX3BhZ2Vp
b19kZXNjcmlwdG9yIHRoYXQgdGVsbHMgeW91IGlmIG1vcmUgaS9vIGlzIGR1ZSAoc2VlDQo+IHBn
X21vcmVpbykuDQo+IFRoZXJlIHNob3VsZCBiZSBubyBuZWVkIHRvIGFkZCBleHRyYSBjYWxsYmFj
a3MgaW4gdGhlIGdlbmVyaWMgY29kZSBmb3INCj4gYW55IG9mIHRoaXMuDQo+IA0KSSdtIGFmcmFp
ZCB0aGVyZSBpcy4gVGhlcmUgaXMgbm8gd2F5IHRvIHBhc3Mgc3RydWN0IGJsa19wbHVnIGFyb3Vu
ZCBwZ19pbml0L3BnX2RvaW8sIHVubGVzcyB3ZSBwdXQgaXQgaW4gc3RydWN0IG5mc19wYWdlaW9f
ZGVzY3JpcHRvciwgd2hpY2ggSSB0aGluayBpcyBtb3JlIGludHJ1c2l2ZSBhbmQgbGVzcyBlZmZp
Y2llbnQgYXMgaXQgaXMgb25seSB1c2VmdWwgZm9yIGJsb2NrIGxheW91dCBkcml2ZXIgaW4gRElP
IGNhc2UuDQoNCg0KVGhhbmtzLA0KVGFvDQoNCg==

2012-05-28 16:51:10

by Peng, Tao

[permalink] [raw]
Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO

On Mon, May 28, 2012 at 7:26 PM, Boaz Harrosh <[email protected]> wrote:
> On 05/28/2012 05:30 AM, [email protected] wrote:
>
>>> -----Original Message-----
>>> From: Myklebust, Trond [mailto:[email protected]]
>
> <>
>
>>> Also, why do you consider it to be direct i/o specific? If the
>>> application is using byte range locking, and the locks aren't page/block
>>> aligned then you are in the same position of having to deal with partial
>>> page writes even in the read/write from page cache situation.
>
>
>> You are right about byte range locking + buffered IO, and it should
>> be fixed in pg_test with bellow patch and it could be a stable
>> candidate.
>
>
> What?? please explain. It sounds like you are saying that there is a
> very *serious* bug in current block-layout.
>
> From my experiment I know that lots and lots of IO is done none-paged
> aligned even in buffered IO. Actually NFS goes to great length not to do
> the usual read-modify-write per page, but keeps the byte range that
> was written per page and only RPCs the exact offset-length of the modification.
> Because by definition NFS is byte aligned IO, not "blocks" or "sectors".
>
> Please explain what happens now. Is it a data corruption? Or just
> performance slowness.
>
> I don't understand. Don't you do the proper read-copy-modify-write that's
> mandated by block layout RFC? byte aligned? And what are sectors and PAGES
> got to do with it? I thought all IO must be "block" aligned.
>
block layout code assumes all IO passed to LD is page aligned. For
aligned IO, if extent is initialized, we don't need to write in block
size. And if extent is not initialized, code takes care to zero extra
sectors and handle copy on write extents properly as well. The
statement "all IO must be block aligned" is only valid for
uninitialized extents and it is strictly followed.

> In objects-layout we have even worse alignment constrains with raid5
> (stripe_size alignment). It was needed to do a (very simple BTW)
> read-modify-write. Involving not just partial pages but also full pages read.
> BTW we read into the page-cache the surrounding pages, so not to read them multiple
> times.
>
>> It is different from DIO case because for DIO we have to
>> be sure each page is blocksize aligned. And it can't easily be done
>> in pg_test because in pg_test we only have one nfs_page to test
>> against.
>>
>
> <snip>
>
>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>> index 7ae8a60..a84a0da 100644
>> --- a/fs/nfs/blocklayout/blocklayout.c
>> +++ b/fs/nfs/blocklayout/blocklayout.c
>> @@ -925,6 +925,18 @@ nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh,
>>       return rv;
>>  }
>>
>> +static bool
>> +bl_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
>> +        struct nfs_page *req)
>> +{
>> +     /* Bail out page unligned IO */
>> +     if (req->wb_offset || req->wb_pgbase ||
>> +         req->wb_bytes != PAGE_CACHE_SIZE)
>> +             return false;
>> +
>
>
> This is very serious. Not many applications will currently pass
> this test. (And hence will not do direct IO)
>
> What happens today without this patch?
>
block layout IO path assumes IO is page aligned in many places.
Without the patch, it is likely to cause data corruption when
unaligned IO is passed in to LD. E.g., Page will be submitted to block
layer entirely even if only part of its data is valid. Therefore if
application does byte range locking + partial page write, garbage data
will get written and cause data corruption.

In most buffered write cases, nfs_write_begin takes case of starting
read-modify-write cycle for partial page writes. Please see
nfs_want_read_modify_write(). I guess that's the reason we pass most
test cases. But it is not always the case. And when it isn't, it may
cause data corruption.

I'm really sorry to leave such serious bug here.

Thanks,
Tao
>> +     return pnfs_generic_pg_test(pgio, prev, req);
>> +}
>> +
>
>
> <>
>
> Thanks
> Boaz

2012-05-28 17:25:00

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

On 05/28/2012 08:14 PM, Peng Tao wrote:

> On Mon, May 28, 2012 at 6:51 PM, Boaz Harrosh <[email protected]> wrote:
>> On 05/28/2012 01:44 PM, Boaz Harrosh wrote:
>>
>>> On 05/28/2012 07:13 AM, [email protected] wrote:
>>>
>>>>> -----Original Message-----
>>>>> From: Myklebust, Trond [mailto:[email protected]]
>>>>> Sent: Monday, May 28, 2012 11:43 AM
>>>>> To: Peng, Tao
>>>>> Cc: [email protected]; [email protected]
>>>>> Subject: RE: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end
>>>>>
>>>>> On Sun, 2012-05-27 at 22:30 -0400, [email protected] wrote:
>>>>>> I'm afraid there is. There is no way to pass struct blk_plug around pg_init/pg_doio, unless we put
>>>>> it in struct nfs_pageio_descriptor, which I think is more intrusive and less efficient as it is only
>>>>> useful for block layout driver in DIO case.
>>>>>
>>>>> Then add a 'void *pg_layout_private' field to nfs_pageio_descriptor and
>>>>> allocate the struct blk_plug dynamically.
>>>>>
>>>
>>>> OK. I thought data structure change is more intrusive because it
>>>> affects all layout drivers and generic NFS as well. But since you
>>>> think it is OK, I will change it as you suggested.
>>>>
>>>
>>>
>>> I want a pg_layout_private in nfs_pageio_descriptor as well. I even wrote
>>> the ML about it. And everyone agreed. (Just never had time to finish)
>>>
>>> So yes please do add it. it will have more users.
>>>
>>
>>
>> I forgot to say. You might want/need to pass this pg_layout_private
>> back to LD->paglist_write/read. (If you need to, I will)
>>
>> (In fact one optimization I wanted for a long time is to pass
>> nfs_pageio_descriptor to paglist_write/read directly instead of
>> duplicating all it's members (back and forth). In a new invented
>> structures where the worse is that the common code of read and write
>> can't be common because it is not the same types.
>> But that's for another patch)
>>
> I also plan to add void *layout_private and unsigned char moreio


you mean bool moreio. What is it?

> to
> nfs_pageio_header structure, in order to pass necessary information to
> pagelist_read/write. Are there any objections against doing so?
>


Yes please add it to the passed structure

> The alternative would be to extent pnfs_try_to_read/write_data() and
> read/write_pagelist() APIs to pass in the same information.
>


Don't add new parameters to the functions. Add new fields to the passed
structures.

> Which one do you prefer? Please share your thoughts. Thank you very much!
>
> Thanks,
> Tao
>
>> Ciao
>> Boaz
>>
>>>
>>>> Thanks,
>>>> Tao
>>>>
>>>
>>>
>>> Thanks
>>> Boaz
>>
>>



2012-05-28 03:45:21

by Myklebust, Trond

[permalink] [raw]
Subject: RE: [PATCH 3/3] pnfsblock: bail out unaligned DIO

T24gU3VuLCAyMDEyLTA1LTI3IGF0IDIyOjMwIC0wNDAwLCB0YW8ucGVuZ0BlbWMuY29tIHdyb3Rl
Og0KPiBBcyBleHBsYWluIGluIHRoZSBvdGhlciBtYWlsLCBpdCBpcyBuZWNlc3NhcnkgdG8gaGF2
ZSBwbmZzX2Rpb19iZWdpbi9lbmQsIHNvIEkgcHJlZmVyIHRvIGRvIHRoZSB0ZXN0IGFzIGVhcmx5
IGFzIHBvc3NpYmxlLCB3aGljaCBpcyBpbiBwbmZzX2Rpb19iZWdpbi4NCg0KVGhlcmUgaXMgbm8g
cG5mc19kaW9fYmVnaW4vZW5kLg0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNs
aWVudCBtYWludGFpbmVyDQoNCk5ldEFwcA0KVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20NCnd3
dy5uZXRhcHAuY29tDQoNCg==

2012-05-27 16:38:39

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO

T24gU3VuLCAyMDEyLTA1LTI3IGF0IDEzOjMzICswODAwLCBQZW5nIFRhbyB3cm90ZToNCj4gU2ln
bmVkLW9mZi1ieTogUGVuZyBUYW8gPHRhby5wZW5nQGVtYy5jb20+DQo+IC0tLQ0KPiAgZnMvbmZz
L2Jsb2NrbGF5b3V0L2Jsb2NrbGF5b3V0LmMgfCAgIDIwICsrKysrKysrKysrKysrKysrKysrDQo+
ICAxIGZpbGVzIGNoYW5nZWQsIDIwIGluc2VydGlvbnMoKyksIDAgZGVsZXRpb25zKC0pDQo+IA0K
PiBkaWZmIC0tZ2l0IGEvZnMvbmZzL2Jsb2NrbGF5b3V0L2Jsb2NrbGF5b3V0LmMgYi9mcy9uZnMv
YmxvY2tsYXlvdXQvYmxvY2tsYXlvdXQuYw0KPiBpbmRleCA1M2NiNDUwLi5jZGI4N2E5IDEwMDY0
NA0KPiAtLS0gYS9mcy9uZnMvYmxvY2tsYXlvdXQvYmxvY2tsYXlvdXQuYw0KPiArKysgYi9mcy9u
ZnMvYmxvY2tsYXlvdXQvYmxvY2tsYXlvdXQuYw0KPiBAQCAtMTAwMCw3ICsxMDAwLDI3IEBAIHN0
YXRpYyBib29sIGJsX2Rpb19iZWdpbihzdHJ1Y3QgaW5vZGUgKmlub2RlLCBjb25zdCBzdHJ1Y3Qg
aW92ZWMgKmlvdiwNCj4gIAkJCSB1bnNpZ25lZCBsb25nIG5yX3NlZ3MsIGxvZmZfdCBwb3MsDQo+
ICAJCQkgc3RydWN0IGJsa19wbHVnICpwbHVnKQ0KPiAgew0KPiArCXVuc2lnbmVkIGJsa21hc2sg
PSBORlNfU0VSVkVSKGlub2RlKS0+cG5mc19ibGtzaXplIC0gMTsNCj4gKwlzaXplX3QgY291bnQ7
DQo+ICsJaW50IHNlZzsNCj4gKwl1bnNpZ25lZCBsb25nIGFkZHI7DQo+ICsNCj4gIAlibGtfc3Rh
cnRfcGx1ZyhwbHVnKTsNCj4gKw0KPiArCS8qIE9ubHkgYWxsb3cgYmxrc2l6ZWQgRElPIGZvciBu
b3cuDQo+ICsJICogSW4gdGhlb3J5IHdlIGNhbiBoYW5kbGUgcGFnZSBhbGlnbmVkIERJTyBpbiBj
dXJyZW50IGJsb2NrIGxheW91dA0KPiArCSAqIHJlYWQvd3JpdGUgY29kZSwgYnV0IGl0IHdvdWxk
IHJlcXVpcmUgc2VyaWFsaXphdGlvbiBiZXR3ZWVuDQo+ICsJICogY29uY3VycmVudCB3cml0ZXJz
IGFuZCBpdCBpcyBmYXIgbGVzcyBlZmZlY2llbnQgdGhhbiBqdXN0IHNlbmQgSU8NCj4gKwkgKiB0
byBNRFMuDQo+ICsJICovDQo+ICsJaWYgKHBvcyAmIGJsa21hc2spDQo+ICsJCXJldHVybiBmYWxz
ZTsNCj4gKwlmb3IgKHNlZyA9IDA7IHNlZyA8IG5yX3NlZ3M7IHNlZysrKSB7DQo+ICsJCWFkZHIg
PSAodW5zaWduZWQgbG9uZylpb3Zbc2VnXS5pb3ZfYmFzZTsNCj4gKwkJY291bnQgPSBpb3Zbc2Vn
XS5pb3ZfbGVuOw0KPiArCQlpZiAodW5saWtlbHkoKGFkZHIgJiBibGttYXNrKSB8fCAoY291bnQg
JiBibGttYXNrKSkpDQo+ICsJCQlyZXR1cm4gZmFsc2U7DQo+ICsJfQ0KPiAgCXJldHVybiB0cnVl
Ow0KPiAgfQ0KDQpBZ2FpbiwgdGhpcyBjYW4gYW5kIHNob3VsZCBnbyBpbiB0aGUgZXhpc3Rpbmcg
bmZzX3BhZ2Vpb19vcHMgZWl0aGVyIGluDQp0aGUgcGdfaW5pdCBvciBpbiB0aGUgcGdfdGVzdC4N
Cg0KQWxzbywgd2h5IGRvIHlvdSBjb25zaWRlciBpdCB0byBiZSBkaXJlY3QgaS9vIHNwZWNpZmlj
PyBJZiB0aGUNCmFwcGxpY2F0aW9uIGlzIHVzaW5nIGJ5dGUgcmFuZ2UgbG9ja2luZywgYW5kIHRo
ZSBsb2NrcyBhcmVuJ3QgcGFnZS9ibG9jaw0KYWxpZ25lZCB0aGVuIHlvdSBhcmUgaW4gdGhlIHNh
bWUgcG9zaXRpb24gb2YgaGF2aW5nIHRvIGRlYWwgd2l0aCBwYXJ0aWFsDQpwYWdlIHdyaXRlcyBl
dmVuIGluIHRoZSByZWFkL3dyaXRlIGZyb20gcGFnZSBjYWNoZSBzaXR1YXRpb24uDQoNCi0tIA0K
VHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpU
cm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg0K

2012-05-28 11:26:41

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO

On 05/28/2012 05:30 AM, [email protected] wrote:

>> -----Original Message-----
>> From: Myklebust, Trond [mailto:[email protected]]

<>

>> Also, why do you consider it to be direct i/o specific? If the
>> application is using byte range locking, and the locks aren't page/block
>> aligned then you are in the same position of having to deal with partial
>> page writes even in the read/write from page cache situation.


> You are right about byte range locking + buffered IO, and it should
> be fixed in pg_test with bellow patch and it could be a stable
> candidate.


What?? please explain. It sounds like you are saying that there is a
very *serious* bug in current block-layout.

>From my experiment I know that lots and lots of IO is done none-paged
aligned even in buffered IO. Actually NFS goes to great length not to do
the usual read-modify-write per page, but keeps the byte range that
was written per page and only RPCs the exact offset-length of the modification.
Because by definition NFS is byte aligned IO, not "blocks" or "sectors".

Please explain what happens now. Is it a data corruption? Or just
performance slowness.

I don't understand. Don't you do the proper read-copy-modify-write that's
mandated by block layout RFC? byte aligned? And what are sectors and PAGES
got to do with it? I thought all IO must be "block" aligned.

In objects-layout we have even worse alignment constrains with raid5
(stripe_size alignment). It was needed to do a (very simple BTW)
read-modify-write. Involving not just partial pages but also full pages read.
BTW we read into the page-cache the surrounding pages, so not to read them multiple
times.

> It is different from DIO case because for DIO we have to
> be sure each page is blocksize aligned. And it can't easily be done
> in pg_test because in pg_test we only have one nfs_page to test
> against.
>

<snip>

> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 7ae8a60..a84a0da 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -925,6 +925,18 @@ nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh,
> return rv;
> }
>
> +static bool
> +bl_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
> + struct nfs_page *req)
> +{
> + /* Bail out page unligned IO */
> + if (req->wb_offset || req->wb_pgbase ||
> + req->wb_bytes != PAGE_CACHE_SIZE)
> + return false;
> +


This is very serious. Not many applications will currently pass
this test. (And hence will not do direct IO)

What happens today without this patch?

> + return pnfs_generic_pg_test(pgio, prev, req);
> +}
> +


<>

Thanks
Boaz

2012-05-28 03:42:58

by Myklebust, Trond

[permalink] [raw]
Subject: RE: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

T24gU3VuLCAyMDEyLTA1LTI3IGF0IDIyOjMwIC0wNDAwLCB0YW8ucGVuZ0BlbWMuY29tIHdyb3Rl
Og0KPiBJJ20gYWZyYWlkIHRoZXJlIGlzLiBUaGVyZSBpcyBubyB3YXkgdG8gcGFzcyBzdHJ1Y3Qg
YmxrX3BsdWcgYXJvdW5kIHBnX2luaXQvcGdfZG9pbywgdW5sZXNzIHdlIHB1dCBpdCBpbiBzdHJ1
Y3QgbmZzX3BhZ2Vpb19kZXNjcmlwdG9yLCB3aGljaCBJIHRoaW5rIGlzIG1vcmUgaW50cnVzaXZl
IGFuZCBsZXNzIGVmZmljaWVudCBhcyBpdCBpcyBvbmx5IHVzZWZ1bCBmb3IgYmxvY2sgbGF5b3V0
IGRyaXZlciBpbiBESU8gY2FzZS4NCg0KVGhlbiBhZGQgYSAndm9pZCAqcGdfbGF5b3V0X3ByaXZh
dGUnIGZpZWxkIHRvIG5mc19wYWdlaW9fZGVzY3JpcHRvciBhbmQNCmFsbG9jYXRlIHRoZSBzdHJ1
Y3QgYmxrX3BsdWcgZHluYW1pY2FsbHkuDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBO
RlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNv
bQ0Kd3d3Lm5ldGFwcC5jb20NCg0K

2012-05-28 10:52:26

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

On 05/28/2012 01:44 PM, Boaz Harrosh wrote:

> On 05/28/2012 07:13 AM, [email protected] wrote:
>
>>> -----Original Message-----
>>> From: Myklebust, Trond [mailto:[email protected]]
>>> Sent: Monday, May 28, 2012 11:43 AM
>>> To: Peng, Tao
>>> Cc: [email protected]; [email protected]
>>> Subject: RE: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end
>>>
>>> On Sun, 2012-05-27 at 22:30 -0400, [email protected] wrote:
>>>> I'm afraid there is. There is no way to pass struct blk_plug around pg_init/pg_doio, unless we put
>>> it in struct nfs_pageio_descriptor, which I think is more intrusive and less efficient as it is only
>>> useful for block layout driver in DIO case.
>>>
>>> Then add a 'void *pg_layout_private' field to nfs_pageio_descriptor and
>>> allocate the struct blk_plug dynamically.
>>>
>
>> OK. I thought data structure change is more intrusive because it
>> affects all layout drivers and generic NFS as well. But since you
>> think it is OK, I will change it as you suggested.
>>
>
>
> I want a pg_layout_private in nfs_pageio_descriptor as well. I even wrote
> the ML about it. And everyone agreed. (Just never had time to finish)
>
> So yes please do add it. it will have more users.
>


I forgot to say. You might want/need to pass this pg_layout_private
back to LD->paglist_write/read. (If you need to, I will)

(In fact one optimization I wanted for a long time is to pass
nfs_pageio_descriptor to paglist_write/read directly instead of
duplicating all it's members (back and forth). In a new invented
structures where the worse is that the common code of read and write
can't be common because it is not the same types.
But that's for another patch)

Ciao
Boaz

>
>> Thanks,
>> Tao
>>
>
>
> Thanks
> Boaz



2012-05-28 10:44:31

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

On 05/28/2012 07:13 AM, [email protected] wrote:

>> -----Original Message-----
>> From: Myklebust, Trond [mailto:[email protected]]
>> Sent: Monday, May 28, 2012 11:43 AM
>> To: Peng, Tao
>> Cc: [email protected]; [email protected]
>> Subject: RE: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end
>>
>> On Sun, 2012-05-27 at 22:30 -0400, [email protected] wrote:
>>> I'm afraid there is. There is no way to pass struct blk_plug around pg_init/pg_doio, unless we put
>> it in struct nfs_pageio_descriptor, which I think is more intrusive and less efficient as it is only
>> useful for block layout driver in DIO case.
>>
>> Then add a 'void *pg_layout_private' field to nfs_pageio_descriptor and
>> allocate the struct blk_plug dynamically.
>>

> OK. I thought data structure change is more intrusive because it
> affects all layout drivers and generic NFS as well. But since you
> think it is OK, I will change it as you suggested.
>


I want a pg_layout_private in nfs_pageio_descriptor as well. I even wrote
the ML about it. And everyone agreed. (Just never had time to finish)

So yes please do add it. it will have more users.


> Thanks,
> Tao
>


Thanks
Boaz

2012-05-28 17:37:11

by Peng, Tao

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

On Tue, May 29, 2012 at 1:24 AM, Boaz Harrosh <[email protected]> wrote:
> On 05/28/2012 08:14 PM, Peng Tao wrote:
>
>> On Mon, May 28, 2012 at 6:51 PM, Boaz Harrosh <[email protected]> wrote:
>>> On 05/28/2012 01:44 PM, Boaz Harrosh wrote:
>>>
>>>> On 05/28/2012 07:13 AM, [email protected] wrote:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Myklebust, Trond [mailto:[email protected]]
>>>>>> Sent: Monday, May 28, 2012 11:43 AM
>>>>>> To: Peng, Tao
>>>>>> Cc: [email protected]; [email protected]
>>>>>> Subject: RE: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end
>>>>>>
>>>>>> On Sun, 2012-05-27 at 22:30 -0400, [email protected] wrote:
>>>>>>> I'm afraid there is. There is no way to pass struct blk_plug around pg_init/pg_doio, unless we put
>>>>>> it in struct nfs_pageio_descriptor, which I think is more intrusive and less efficient as it is only
>>>>>> useful for block layout driver in DIO case.
>>>>>>
>>>>>> Then add a 'void *pg_layout_private' field to nfs_pageio_descriptor and
>>>>>> allocate the struct blk_plug dynamically.
>>>>>>
>>>>
>>>>> OK. I thought data structure change is more intrusive because it
>>>>> affects all layout drivers and generic NFS as well. But since you
>>>>> think it is OK, I will change it as you suggested.
>>>>>
>>>>
>>>>
>>>> I want a pg_layout_private in nfs_pageio_descriptor as well. I even wrote
>>>> the ML about it. And everyone agreed. (Just never had time to finish)
>>>>
>>>> So yes please do add it. it will have more users.
>>>>
>>>
>>>
>>> I forgot to say. You might want/need to pass this pg_layout_private
>>> back to LD->paglist_write/read. (If you need to, I will)
>>>
>>> (In fact one optimization I wanted for a long time is to pass
>>> nfs_pageio_descriptor to paglist_write/read directly instead of
>>> duplicating all it's members (back and forth). In a new invented
>>> structures where the worse is that the common code of read and write
>>> can't be common because it is not the same types.
>>> But that's for another patch)
>>>
>> I also plan to add void *layout_private and unsigned char moreio
>
>
> you mean bool moreio. What is it?
>
Yes. It will be assigned as pgio->pg_moreio before calling
LD->read/write_pagelist, to tell LD if more IO is coming. block layout
driver needs it to determine when to call blk_finish_plug().

>> to
>> nfs_pageio_header structure, in order to pass necessary information to
>> pagelist_read/write. Are there any objections against doing so?
>>
>
>
> Yes please add it to the passed structure
>
>> The alternative would be to extent pnfs_try_to_read/write_data() and
>> read/write_pagelist() APIs to pass in the same information.
>>
>
>
> Don't add new parameters to the functions. Add new fields to the passed
> structures.
>
Got it.

Thanks,
Tao

2012-05-27 05:33:52

by Peng Tao

[permalink] [raw]
Subject: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

To allow layout driver specific preparation for DIO.

Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/direct.c | 43 +++++++++++++++++++++++++++++++++++++++----
fs/nfs/pnfs.h | 38 ++++++++++++++++++++++++++++++++++++++
2 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index ad2775d..8e9c8e1 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -383,6 +383,37 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
return result < 0 ? (ssize_t) result : -EFAULT;
}

+static void nfs_direct_read_init(struct nfs_pageio_descriptor *pgio,
+ struct inode *inode, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos,
+ struct blk_plug *plug)
+{
+ if (pnfs_dio_begin(inode, iov, nr_segs, pos, plug))
+ pnfs_pageio_init_read(pgio, inode,
+ &nfs_direct_read_completion_ops);
+ else
+ nfs_pageio_init_read_mds(pgio, inode,
+ &nfs_direct_read_completion_ops);
+}
+
+static void nfs_direct_write_init(struct nfs_pageio_descriptor *pgio,
+ struct inode *inode, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos,
+ struct blk_plug *plug)
+{
+ if (pnfs_dio_begin(inode, iov, nr_segs, pos, plug))
+ pnfs_pageio_init_write(pgio, inode, FLUSH_COND_STABLE,
+ &nfs_direct_write_completion_ops);
+ else
+ nfs_pageio_init_write_mds(pgio, inode, FLUSH_COND_STABLE,
+ &nfs_direct_write_completion_ops);
+}
+
+static void nfs_dio_end(struct inode *inode, struct blk_plug *plug)
+{
+ pnfs_dio_end(inode, plug);
+}
+
static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
const struct iovec *iov,
unsigned long nr_segs,
@@ -392,9 +423,10 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
ssize_t result = -EINVAL;
size_t requested_bytes = 0;
unsigned long seg;
+ struct blk_plug plug;
+
+ nfs_direct_read_init(&desc, dreq->inode, iov, nr_segs, pos, &plug);

- nfs_pageio_init_read(&desc, dreq->inode,
- &nfs_direct_read_completion_ops);
get_dreq(dreq);
desc.pg_dreq = dreq;

@@ -410,6 +442,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
}

nfs_pageio_complete(&desc);
+ nfs_dio_end(dreq->inode, &plug);

/*
* If no bytes were started, return the error, and let the
@@ -776,9 +809,10 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
ssize_t result = 0;
size_t requested_bytes = 0;
unsigned long seg;
+ struct blk_plug plug;
+
+ nfs_direct_write_init(&desc, inode, iov, nr_segs, pos, &plug);

- nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE,
- &nfs_direct_write_completion_ops);
desc.pg_dreq = dreq;
get_dreq(dreq);
atomic_inc(&inode->i_dio_count);
@@ -794,6 +828,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
pos += vec->iov_len;
}
nfs_pageio_complete(&desc);
+ nfs_dio_end(inode, &plug);
NFS_I(dreq->inode)->write_io += desc.pg_bytes_written;

/*
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 29fd23c..b4c7139 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -32,6 +32,7 @@

#include <linux/nfs_fs.h>
#include <linux/nfs_page.h>
+#include <linux/blkdev.h>

enum {
NFS_LSEG_VALID = 0, /* cleared when lseg is recalled/returned */
@@ -110,6 +111,11 @@ struct pnfs_layoutdriver_type {
int how,
struct nfs_commit_info *cinfo);

+ bool (*dio_begin) (struct inode *inode, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos,
+ struct blk_plug *plug);
+ void (*dio_end) (struct blk_plug *plug);
+
/*
* Return PNFS_ATTEMPTED to indicate the layout code has attempted
* I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
@@ -369,6 +375,27 @@ pnfs_use_threshold(struct nfs4_threshold **dst, struct nfs4_threshold *src,
nfss->pnfs_curr_ld->id == src->l_type);
}

+static inline bool
+pnfs_dio_begin(struct inode *inode, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos, struct blk_plug *plug)
+{
+ struct pnfs_layoutdriver_type *ld = NFS_SERVER(inode)->pnfs_curr_ld;
+
+ if (ld == NULL)
+ return false;
+ if (ld->dio_begin == NULL)
+ return true;
+ return ld->dio_begin(inode, iov, nr_segs, pos, plug);
+}
+
+static inline void pnfs_dio_end(struct inode *inode, struct blk_plug *plug)
+{
+ struct pnfs_layoutdriver_type *ld = NFS_SERVER(inode)->pnfs_curr_ld;
+
+ if (ld != NULL && ld->dio_end != NULL)
+ ld->dio_end(plug);
+}
+
#ifdef NFS_DEBUG
void nfs4_print_deviceid(const struct nfs4_deviceid *dev_id);
#else
@@ -506,6 +533,17 @@ static inline struct nfs4_threshold *pnfs_mdsthreshold_alloc(void)
return NULL;
}

+static inline bool
+pnfs_dio_begin(struct inode *inode, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos, struct blk_plug *plug)
+{
+ return false;
+}
+
+static inline void pnfs_dio_end(struct inode *inode, struct blk_plug *plug)
+{
+}
+
#endif /* CONFIG_NFS_V4_1 */

#endif /* FS_NFS_PNFS_H */
--
1.7.1.262.g5ef3d


2012-05-28 17:10:39

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO

On 05/28/2012 07:50 PM, Peng Tao wrote:

>>> +static bool
>>> +bl_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
>>> + struct nfs_page *req)
>>> +{
>>> + /* Bail out page unligned IO */
>>> + if (req->wb_offset || req->wb_pgbase ||
>>> + req->wb_bytes != PAGE_CACHE_SIZE)
>>> + return false;
>>> +
>>


But are you sure these above carry the corrected offsets/sizes.

I know from objlayout that these happen all the time. Is your
nfs_want_read_modify_write() fixes the original offset as well
as >wb_bytes to include more then submitted IO?

I believe you, but just make sure.

Perhaps a big fat dprint, so we can understand: "why the hell my
IO goes through MDS"

What you are saying is that this should not show up at all in prints
unless these corner cases you haven't tested for.

And do you have a test that fails which this patch now fixes?

>>
>> This is very serious. Not many applications will currently pass
>> this test. (And hence will not do direct IO)
>>
>> What happens today without this patch?
>>
> block layout IO path assumes IO is page aligned in many places.
> Without the patch, it is likely to cause data corruption when
> unaligned IO is passed in to LD. E.g., Page will be submitted to block
> layer entirely even if only part of its data is valid. Therefore if
> application does byte range locking + partial page write, garbage data
> will get written and cause data corruption.
>
> In most buffered write cases, nfs_write_begin takes case of starting
> read-modify-write cycle for partial page writes. Please see
> nfs_want_read_modify_write(). I guess that's the reason we pass most
> test cases. But it is not always the case. And when it isn't, it may
> cause data corruption.
>
> I'm really sorry to leave such serious bug here.
>


Yes please make a small as possible fix to send to @stable.

> Thanks,
> Tao


Thanks
Boaz


2012-05-28 16:33:47

by Myklebust, Trond

[permalink] [raw]
Subject: RE: [PATCH 3/3] pnfsblock: bail out unaligned DIO

T24gTW9uLCAyMDEyLTA1LTI4IGF0IDAwOjI2IC0wNDAwLCB0YW8ucGVuZ0BlbWMuY29tIHdyb3Rl
Og0KPiA+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+ID4gRnJvbTogTXlrbGVidXN0LCBU
cm9uZCBbbWFpbHRvOlRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tXQ0KPiA+IFNlbnQ6IE1vbmRh
eSwgTWF5IDI4LCAyMDEyIDExOjQ1IEFNDQo+ID4gVG86IFBlbmcsIFRhbw0KPiA+IENjOiBiZXJn
d29sZkBnbWFpbC5jb207IGxpbnV4LW5mc0B2Z2VyLmtlcm5lbC5vcmcNCj4gPiBTdWJqZWN0OiBS
RTogW1BBVENIIDMvM10gcG5mc2Jsb2NrOiBiYWlsIG91dCB1bmFsaWduZWQgRElPDQo+ID4gDQo+
ID4gT24gU3VuLCAyMDEyLTA1LTI3IGF0IDIyOjMwIC0wNDAwLCB0YW8ucGVuZ0BlbWMuY29tIHdy
b3RlOg0KPiA+ID4gQXMgZXhwbGFpbiBpbiB0aGUgb3RoZXIgbWFpbCwgaXQgaXMgbmVjZXNzYXJ5
IHRvIGhhdmUgcG5mc19kaW9fYmVnaW4vZW5kLCBzbyBJIHByZWZlciB0byBkbyB0aGUgdGVzdA0K
PiA+IGFzIGVhcmx5IGFzIHBvc3NpYmxlLCB3aGljaCBpcyBpbiBwbmZzX2Rpb19iZWdpbi4NCj4g
PiANCj4gPiBUaGVyZSBpcyBubyBwbmZzX2Rpb19iZWdpbi9lbmQuDQo+IE9LLiBJIHdpbGwgcHV0
IERJTyBhbGlnbm1lbnQgdGVzdHMgaW5zaWRlIHBnX2luaXQuDQo+IA0KPiBBbmQgYW55IGNvbW1l
bnRzIG9uIGZvciBzdGFibGUgcGF0Y2ggaW4gdGhlIHRocmVhZCAoW1BBVENIXSBwbmZzYmxvY2s6
IGJhaWwgb3V0IHBhZ2UgdW5hbGlnbmVkIElPKT8gSWYgeW91IGFncmVlLCBJIHdpbGwgYmFzZSBE
SU8gY2hhbmdlcyBvbiB0b3Agb2YgaXQgdG8gYXZvaWQgY29uZmxpY3RzLg0KPiBUaGVyZSBhcmUg
cmVhc29ucyB0byB0ZXN0IGFsaWdubWVudCBhdCBkaWZmZXJlbnQgcGxhY2UgZm9yIGJ1ZmZlciBJ
TyBhbmQgRElPLiBGb3IgYnVmZmVyIElPLCBwZ19pbml0IGlzbid0IHRoZSByaWdodCBwbGFjZSBi
ZWNhdXNlIHdlIGRvbid0IGhhdmUgbmZzIHBhZ2UgdGhlcmUuIEZvciBESU8sIHBnX3Rlc3QgaXNu
J3QgdGhlIHJpZ2h0IHBsYWNlIGJlY2F1c2Ugd2UgbmVlZCB0byBjaGVjayBjcm9zcyBwYWdlIGJv
dW5kYXJ5Lg0KDQpTaW5jZSBhbGwgcGFnZXMgaW4gdGhlIHN0cnVjdCBuZnNfcGFnZWlvX2Rlc2Ny
aXB0b3IgYXJlIGd1YXJhbnRlZWQgdG8gYmUNCmNvbnRpZ3VvdXMsIHlvdSByZWFsbHkgb25seSBu
ZWVkIHRvIGNoZWNrIHRoZSBmaXJzdCBhbmQgbGFzdCBwYWdlIGluIHRoZQ0Kc2VyaWVzIGZvciBh
bGlnbm1lbnQuDQoNCnBnX2luaXQoKSBkb2VzIHRha2UgdGhlIGZpcnN0IG5mc19wYWdlIHJlcXVl
c3QgYXMgaXRzIGFyZ3VtZW50IGFuZCBzbyBpdA0Kc2hvdWxkIGJlIHBvc3NpYmxlIHRvIGNoZWNr
IHRoZSBhbGlnbm1lbnQgb2YgdGhlIGZpcnN0IHBhZ2UgaW4gdGhlDQpzZXJpZXMgdGhlcmUuDQpZ
b3UgY2FuIHRoZW4gY2hlY2sgdGhlIGFsaWdubWVudCBvZiB0aGUgbGFzdCBwYWdlIGluIHlvdXIN
Ci0+d3JpdGVfcGFnZWxpc3QoKSBhbmQgcmV0dXJuIFBORlNfTk9UX0FUVEVNUFRFRCBpZiBhcHBy
b3ByaWF0ZS4NCg0KTm90ZSB0aGF0IGFsbCBhcHBsaWNhdGlvbnMgdGhhdCB1c2UgT19ESVJFQ1Qg
YXJlIGV4cGVjdGVkIHRvIHVzZSBhbGlnbmVkDQptZW1vcnksIHNpbmNlIHRoYXQncyB3aGF0IHRo
ZSBvcGVuKCkgbWFucGFnZSBpbXBsaWVzIGlzIHRoZSBzYWZlIG9wdGlvbg0KZm9yIGFsbCBmaWxl
c3lzdGVtcy4gVW5hbGlnbmVkIG1lbW9yeSBpcyB0aGVyZWZvcmUgbm90IHNvbWV0aGluZyB0aGF0
IHdlDQpuZWVkIHRvIG9wdGltaXNlIGZvci4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4
IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAu
Y29tDQp3d3cubmV0YXBwLmNvbQ0KDQo=

2012-05-28 04:26:47

by Peng, Tao

[permalink] [raw]
Subject: RE: [PATCH 3/3] pnfsblock: bail out unaligned DIO

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBNeWtsZWJ1c3QsIFRyb25kIFtt
YWlsdG86VHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb21dDQo+IFNlbnQ6IE1vbmRheSwgTWF5IDI4
LCAyMDEyIDExOjQ1IEFNDQo+IFRvOiBQZW5nLCBUYW8NCj4gQ2M6IGJlcmd3b2xmQGdtYWlsLmNv
bTsgbGludXgtbmZzQHZnZXIua2VybmVsLm9yZw0KPiBTdWJqZWN0OiBSRTogW1BBVENIIDMvM10g
cG5mc2Jsb2NrOiBiYWlsIG91dCB1bmFsaWduZWQgRElPDQo+IA0KPiBPbiBTdW4sIDIwMTItMDUt
MjcgYXQgMjI6MzAgLTA0MDAsIHRhby5wZW5nQGVtYy5jb20gd3JvdGU6DQo+ID4gQXMgZXhwbGFp
biBpbiB0aGUgb3RoZXIgbWFpbCwgaXQgaXMgbmVjZXNzYXJ5IHRvIGhhdmUgcG5mc19kaW9fYmVn
aW4vZW5kLCBzbyBJIHByZWZlciB0byBkbyB0aGUgdGVzdA0KPiBhcyBlYXJseSBhcyBwb3NzaWJs
ZSwgd2hpY2ggaXMgaW4gcG5mc19kaW9fYmVnaW4uDQo+IA0KPiBUaGVyZSBpcyBubyBwbmZzX2Rp
b19iZWdpbi9lbmQuDQpPSy4gSSB3aWxsIHB1dCBESU8gYWxpZ25tZW50IHRlc3RzIGluc2lkZSBw
Z19pbml0Lg0KDQpBbmQgYW55IGNvbW1lbnRzIG9uIGZvciBzdGFibGUgcGF0Y2ggaW4gdGhlIHRo
cmVhZCAoW1BBVENIXSBwbmZzYmxvY2s6IGJhaWwgb3V0IHBhZ2UgdW5hbGlnbmVkIElPKT8gSWYg
eW91IGFncmVlLCBJIHdpbGwgYmFzZSBESU8gY2hhbmdlcyBvbiB0b3Agb2YgaXQgdG8gYXZvaWQg
Y29uZmxpY3RzLg0KVGhlcmUgYXJlIHJlYXNvbnMgdG8gdGVzdCBhbGlnbm1lbnQgYXQgZGlmZmVy
ZW50IHBsYWNlIGZvciBidWZmZXIgSU8gYW5kIERJTy4gRm9yIGJ1ZmZlciBJTywgcGdfaW5pdCBp
c24ndCB0aGUgcmlnaHQgcGxhY2UgYmVjYXVzZSB3ZSBkb24ndCBoYXZlIG5mcyBwYWdlIHRoZXJl
LiBGb3IgRElPLCBwZ190ZXN0IGlzbid0IHRoZSByaWdodCBwbGFjZSBiZWNhdXNlIHdlIG5lZWQg
dG8gY2hlY2sgY3Jvc3MgcGFnZSBib3VuZGFyeS4NCg0KVGhhbmtzLA0KVGFvDQoNCg==

2012-05-27 05:34:00

by Peng Tao

[permalink] [raw]
Subject: [PATCH 3/3] pnfsblock: bail out unaligned DIO

Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/blocklayout/blocklayout.c | 20 ++++++++++++++++++++
1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 53cb450..cdb87a9 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -1000,7 +1000,27 @@ static bool bl_dio_begin(struct inode *inode, const struct iovec *iov,
unsigned long nr_segs, loff_t pos,
struct blk_plug *plug)
{
+ unsigned blkmask = NFS_SERVER(inode)->pnfs_blksize - 1;
+ size_t count;
+ int seg;
+ unsigned long addr;
+
blk_start_plug(plug);
+
+ /* Only allow blksized DIO for now.
+ * In theory we can handle page aligned DIO in current block layout
+ * read/write code, but it would require serialization between
+ * concurrent writers and it is far less effecient than just send IO
+ * to MDS.
+ */
+ if (pos & blkmask)
+ return false;
+ for (seg = 0; seg < nr_segs; seg++) {
+ addr = (unsigned long)iov[seg].iov_base;
+ count = iov[seg].iov_len;
+ if (unlikely((addr & blkmask) || (count & blkmask)))
+ return false;
+ }
return true;
}

--
1.7.1.262.g5ef3d


2012-05-28 17:44:26

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS41: add pnfs_dio_begin/dio_end

T24gVHVlLCAyMDEyLTA1LTI5IGF0IDAxOjM2ICswODAwLCBQZW5nIFRhbyB3cm90ZToNCj4gT24g
VHVlLCBNYXkgMjksIDIwMTIgYXQgMToyNCBBTSwgQm9heiBIYXJyb3NoIDxiaGFycm9zaEBwYW5h
c2FzLmNvbT4gd3JvdGU6DQo+ID4gT24gMDUvMjgvMjAxMiAwODoxNCBQTSwgUGVuZyBUYW8gd3Jv
dGU6DQo+ID4gRG9uJ3QgYWRkIG5ldyBwYXJhbWV0ZXJzIHRvIHRoZSBmdW5jdGlvbnMuIEFkZCBu
ZXcgZmllbGRzIHRvIHRoZSBwYXNzZWQNCj4gPiBzdHJ1Y3R1cmVzLg0KPiA+DQo+IEdvdCBpdC4N
Cg0KVGhlIG90aGVyIHRoaW5nIGlzIHRoYXQgc2luY2Ugc3RydWN0IG5mc19wYWdlaW9fZGVzY3Jp
cHRvciBpcyBhbGxvY2F0ZWQNCm9uIHRoZSBzdGFjaywgbGV0J3MgcGxlYXNlIG5vdCBncm93IGl0
IHRvbyBtdWNoIGZ1cnRoZXIuIElmIHlvdSBuZWVkIHRvDQphZGQgc3RydWN0dXJlcyB0aGF0IGFy
ZSBwcml2YXRlIHRvIHlvdXIgbGF5b3V0IGRyaXZlciwgdGhlbiBwbGVhc2UNCmFsbG9jYXRlIHRo
b3NlIHN0cnVjdHVyZXMgZHluYW1pY2FsbHksIGFuZCB1c2UgYSB2b2lkICpwZ19sYXlvdXRfcHJp
dmF0ZQ0KdG8gcG9pbnQgdG8gdGhlbS4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5G
UyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29t
DQp3d3cubmV0YXBwLmNvbQ0KDQo=

2012-05-28 02:30:49

by Peng, Tao

[permalink] [raw]
Subject: RE: [PATCH 3/3] pnfsblock: bail out unaligned DIO

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBNeWtsZWJ1c3QsIFRyb25kIFtt
YWlsdG86VHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb21dDQo+IFNlbnQ6IE1vbmRheSwgTWF5IDI4
LCAyMDEyIDEyOjM5IEFNDQo+IFRvOiBQZW5nIFRhbw0KPiBDYzogbGludXgtbmZzQHZnZXIua2Vy
bmVsLm9yZzsgUGVuZywgVGFvDQo+IFN1YmplY3Q6IFJlOiBbUEFUQ0ggMy8zXSBwbmZzYmxvY2s6
IGJhaWwgb3V0IHVuYWxpZ25lZCBESU8NCj4gDQo+IE9uIFN1biwgMjAxMi0wNS0yNyBhdCAxMzoz
MyArMDgwMCwgUGVuZyBUYW8gd3JvdGU6DQo+ID4gU2lnbmVkLW9mZi1ieTogUGVuZyBUYW8gPHRh
by5wZW5nQGVtYy5jb20+DQo+ID4gLS0tDQo+ID4gIGZzL25mcy9ibG9ja2xheW91dC9ibG9ja2xh
eW91dC5jIHwgICAyMCArKysrKysrKysrKysrKysrKysrKw0KPiA+ICAxIGZpbGVzIGNoYW5nZWQs
IDIwIGluc2VydGlvbnMoKyksIDAgZGVsZXRpb25zKC0pDQo+ID4NCj4gPiBkaWZmIC0tZ2l0IGEv
ZnMvbmZzL2Jsb2NrbGF5b3V0L2Jsb2NrbGF5b3V0LmMgYi9mcy9uZnMvYmxvY2tsYXlvdXQvYmxv
Y2tsYXlvdXQuYw0KPiA+IGluZGV4IDUzY2I0NTAuLmNkYjg3YTkgMTAwNjQ0DQo+ID4gLS0tIGEv
ZnMvbmZzL2Jsb2NrbGF5b3V0L2Jsb2NrbGF5b3V0LmMNCj4gPiArKysgYi9mcy9uZnMvYmxvY2ts
YXlvdXQvYmxvY2tsYXlvdXQuYw0KPiA+IEBAIC0xMDAwLDcgKzEwMDAsMjcgQEAgc3RhdGljIGJv
b2wgYmxfZGlvX2JlZ2luKHN0cnVjdCBpbm9kZSAqaW5vZGUsIGNvbnN0IHN0cnVjdCBpb3ZlYyAq
aW92LA0KPiA+ICAJCQkgdW5zaWduZWQgbG9uZyBucl9zZWdzLCBsb2ZmX3QgcG9zLA0KPiA+ICAJ
CQkgc3RydWN0IGJsa19wbHVnICpwbHVnKQ0KPiA+ICB7DQo+ID4gKwl1bnNpZ25lZCBibGttYXNr
ID0gTkZTX1NFUlZFUihpbm9kZSktPnBuZnNfYmxrc2l6ZSAtIDE7DQo+ID4gKwlzaXplX3QgY291
bnQ7DQo+ID4gKwlpbnQgc2VnOw0KPiA+ICsJdW5zaWduZWQgbG9uZyBhZGRyOw0KPiA+ICsNCj4g
PiAgCWJsa19zdGFydF9wbHVnKHBsdWcpOw0KPiA+ICsNCj4gPiArCS8qIE9ubHkgYWxsb3cgYmxr
c2l6ZWQgRElPIGZvciBub3cuDQo+ID4gKwkgKiBJbiB0aGVvcnkgd2UgY2FuIGhhbmRsZSBwYWdl
IGFsaWduZWQgRElPIGluIGN1cnJlbnQgYmxvY2sgbGF5b3V0DQo+ID4gKwkgKiByZWFkL3dyaXRl
IGNvZGUsIGJ1dCBpdCB3b3VsZCByZXF1aXJlIHNlcmlhbGl6YXRpb24gYmV0d2Vlbg0KPiA+ICsJ
ICogY29uY3VycmVudCB3cml0ZXJzIGFuZCBpdCBpcyBmYXIgbGVzcyBlZmZlY2llbnQgdGhhbiBq
dXN0IHNlbmQgSU8NCj4gPiArCSAqIHRvIE1EUy4NCj4gPiArCSAqLw0KPiA+ICsJaWYgKHBvcyAm
IGJsa21hc2spDQo+ID4gKwkJcmV0dXJuIGZhbHNlOw0KPiA+ICsJZm9yIChzZWcgPSAwOyBzZWcg
PCBucl9zZWdzOyBzZWcrKykgew0KPiA+ICsJCWFkZHIgPSAodW5zaWduZWQgbG9uZylpb3Zbc2Vn
XS5pb3ZfYmFzZTsNCj4gPiArCQljb3VudCA9IGlvdltzZWddLmlvdl9sZW47DQo+ID4gKwkJaWYg
KHVubGlrZWx5KChhZGRyICYgYmxrbWFzaykgfHwgKGNvdW50ICYgYmxrbWFzaykpKQ0KPiA+ICsJ
CQlyZXR1cm4gZmFsc2U7DQo+ID4gKwl9DQo+ID4gIAlyZXR1cm4gdHJ1ZTsNCj4gPiAgfQ0KPiAN
Cj4gQWdhaW4sIHRoaXMgY2FuIGFuZCBzaG91bGQgZ28gaW4gdGhlIGV4aXN0aW5nIG5mc19wYWdl
aW9fb3BzIGVpdGhlciBpbg0KPiB0aGUgcGdfaW5pdCBvciBpbiB0aGUgcGdfdGVzdC4NCj4gDQpB
cyBleHBsYWluIGluIHRoZSBvdGhlciBtYWlsLCBpdCBpcyBuZWNlc3NhcnkgdG8gaGF2ZSBwbmZz
X2Rpb19iZWdpbi9lbmQsIHNvIEkgcHJlZmVyIHRvIGRvIHRoZSB0ZXN0IGFzIGVhcmx5IGFzIHBv
c3NpYmxlLCB3aGljaCBpcyBpbiBwbmZzX2Rpb19iZWdpbi4NCg0KPiBBbHNvLCB3aHkgZG8geW91
IGNvbnNpZGVyIGl0IHRvIGJlIGRpcmVjdCBpL28gc3BlY2lmaWM/IElmIHRoZQ0KPiBhcHBsaWNh
dGlvbiBpcyB1c2luZyBieXRlIHJhbmdlIGxvY2tpbmcsIGFuZCB0aGUgbG9ja3MgYXJlbid0IHBh
Z2UvYmxvY2sNCj4gYWxpZ25lZCB0aGVuIHlvdSBhcmUgaW4gdGhlIHNhbWUgcG9zaXRpb24gb2Yg
aGF2aW5nIHRvIGRlYWwgd2l0aCBwYXJ0aWFsDQo+IHBhZ2Ugd3JpdGVzIGV2ZW4gaW4gdGhlIHJl
YWQvd3JpdGUgZnJvbSBwYWdlIGNhY2hlIHNpdHVhdGlvbi4NCllvdSBhcmUgcmlnaHQgYWJvdXQg
Ynl0ZSByYW5nZSBsb2NraW5nICsgYnVmZmVyZWQgSU8sIGFuZCBpdCBzaG91bGQgYmUgZml4ZWQg
aW4gcGdfdGVzdCB3aXRoIGJlbGxvdyBwYXRjaCBhbmQgaXQgY291bGQgYmUgYSBzdGFibGUgY2Fu
ZGlkYXRlLiBJdCBpcyBkaWZmZXJlbnQgZnJvbSBESU8gY2FzZSBiZWNhdXNlIGZvciBESU8gd2Ug
aGF2ZSB0byBiZSBzdXJlIGVhY2ggcGFnZSBpcyBibG9ja3NpemUgYWxpZ25lZC4gQW5kIGl0IGNh
bid0IGVhc2lseSBiZSBkb25lIGluIHBnX3Rlc3QgYmVjYXVzZSBpbiBwZ190ZXN0IHdlIG9ubHkg
aGF2ZSBvbmUgbmZzX3BhZ2UgdG8gdGVzdCBhZ2FpbnN0Lg0KDQpUaGFua3MgZm9yIHJldmlld2lu
Zy4NCg0KQ2hlZXJzLA0KVGFvDQoNCkZyb20gMTRhNGQ1ODNmYWU4ZWUwMWMzNGZkMTdkOTIzMDAx
YWFkYTFkNDI2ZCBNb24gU2VwIDE3IDAwOjAwOjAwIDIwMDENCkZyb206IFBlbmcgVGFvIDxiZXJn
d29sZkBnbWFpbC5jb20+DQpEYXRlOiBGcmksIDI1IE1heSAyMDEyIDIxOjI1OjA3ICswODAwDQpT
dWJqZWN0OiBbUEFUQ0hdIHBuZnNibG9jazogYmFpbCBvdXQgcGFnZSB1bmFsaWduZWQgSU8NCg0K
YmxvY2sgbGF5b3V0IGRyaXZlciBjYW5ub3QgaGFuZGxlIHRoZXNlIElPLg0KDQpDYzogc3RhYmxl
QHZnZXIua2VybmVsLm9yZw0KU2lnbmVkLW9mZi1ieTogUGVuZyBUYW8gPGJlcmd3b2xmQGdtYWls
LmNvbT4NCi0tLQ0KIGZzL25mcy9ibG9ja2xheW91dC9ibG9ja2xheW91dC5jIHwgICAxNiArKysr
KysrKysrKysrKy0tDQogMSBmaWxlcyBjaGFuZ2VkLCAxNCBpbnNlcnRpb25zKCspLCAyIGRlbGV0
aW9ucygtKQ0KDQpkaWZmIC0tZ2l0IGEvZnMvbmZzL2Jsb2NrbGF5b3V0L2Jsb2NrbGF5b3V0LmMg
Yi9mcy9uZnMvYmxvY2tsYXlvdXQvYmxvY2tsYXlvdXQuYw0KaW5kZXggN2FlOGE2MC4uYTg0YTBk
YSAxMDA2NDQNCi0tLSBhL2ZzL25mcy9ibG9ja2xheW91dC9ibG9ja2xheW91dC5jDQorKysgYi9m
cy9uZnMvYmxvY2tsYXlvdXQvYmxvY2tsYXlvdXQuYw0KQEAgLTkyNSw2ICs5MjUsMTggQEAgbmZz
NF9ibGtfZ2V0X2RldmljZWluZm8oc3RydWN0IG5mc19zZXJ2ZXIgKnNlcnZlciwgY29uc3Qgc3Ry
dWN0IG5mc19maCAqZmgsDQogCXJldHVybiBydjsNCiB9DQogDQorc3RhdGljIGJvb2wNCitibF9w
Z190ZXN0KHN0cnVjdCBuZnNfcGFnZWlvX2Rlc2NyaXB0b3IgKnBnaW8sIHN0cnVjdCBuZnNfcGFn
ZSAqcHJldiwNCisJICAgc3RydWN0IG5mc19wYWdlICpyZXEpDQorew0KKwkvKiBCYWlsIG91dCBw
YWdlIHVubGlnbmVkIElPICovDQorCWlmIChyZXEtPndiX29mZnNldCB8fCByZXEtPndiX3BnYmFz
ZSB8fA0KKwkgICAgcmVxLT53Yl9ieXRlcyAhPSBQQUdFX0NBQ0hFX1NJWkUpDQorCQlyZXR1cm4g
ZmFsc2U7DQorDQorCXJldHVybiBwbmZzX2dlbmVyaWNfcGdfdGVzdChwZ2lvLCBwcmV2LCByZXEp
Ow0KK30NCisNCiBzdGF0aWMgaW50DQogYmxfc2V0X2xheW91dGRyaXZlcihzdHJ1Y3QgbmZzX3Nl
cnZlciAqc2VydmVyLCBjb25zdCBzdHJ1Y3QgbmZzX2ZoICpmaCkNCiB7DQpAQCAtOTk4LDEzICsx
MDEwLDEzIEBAIGJsX2NsZWFyX2xheW91dGRyaXZlcihzdHJ1Y3QgbmZzX3NlcnZlciAqc2VydmVy
KQ0KIA0KIHN0YXRpYyBjb25zdCBzdHJ1Y3QgbmZzX3BhZ2Vpb19vcHMgYmxfcGdfcmVhZF9vcHMg
PSB7DQogCS5wZ19pbml0ID0gcG5mc19nZW5lcmljX3BnX2luaXRfcmVhZCwNCi0JLnBnX3Rlc3Qg
PSBwbmZzX2dlbmVyaWNfcGdfdGVzdCwNCisJLnBnX3Rlc3QgPSBibF9wZ190ZXN0LA0KIAkucGdf
ZG9pbyA9IHBuZnNfZ2VuZXJpY19wZ19yZWFkcGFnZXMsDQogfTsNCiANCiBzdGF0aWMgY29uc3Qg
c3RydWN0IG5mc19wYWdlaW9fb3BzIGJsX3BnX3dyaXRlX29wcyA9IHsNCiAJLnBnX2luaXQgPSBw
bmZzX2dlbmVyaWNfcGdfaW5pdF93cml0ZSwNCi0JLnBnX3Rlc3QgPSBwbmZzX2dlbmVyaWNfcGdf
dGVzdCwNCisJLnBnX3Rlc3QgPSBibF9wZ190ZXN0LA0KIAkucGdfZG9pbyA9IHBuZnNfZ2VuZXJp
Y19wZ193cml0ZXBhZ2VzLA0KIH07DQogDQotLSANCjEuNy43LjYNCg==