2018-09-11 17:28:51

by Kinglong Mee

[permalink] [raw]
Subject: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK

The latest ganesha/gluster supports seek according to,

https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-15.11

From the given sa_offset, find the next data_content4 of type sa_what
in the file. If the server can not find a corresponding sa_what,
then the status will still be NFS4_OK, but sr_eof would be TRUE. If
the server can find the sa_what, then the sr_offset is the start of
that content. If the sa_offset is beyond the end of the file, then
SEEK MUST return NFS4ERR_NXIO.

For a file's filemap as,

Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000

SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1, sr_offset:0x70000) from ganesha/gluster;
SEEK(0x700000, SEEK_HOLE) gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster.

If an application depends the lseek result for data searching, it may enter infinite loop.

while (1) {
next_pos = lseek(fd, cur_pos, seek_type);
if (seek_type == SEEK_DATA) {
seek_type = SEEK_HOLE;
} else {
seek_type = SEEK_DATA;
}

if (next_pos == -1) {
return ;

cur_pos = next_pos;
}

The lseek syscall always gets 0x70000 from nfs client for those two cases,
but, if underlying filesystem is ext4/f2fs, or the nfs server is knfsd,
the lseek(0x700000, SEEK_DATA) gets ENXIO.

I wanna to know,
should I fix the ganesha/gluster as knfsd return ENXIO for the first case?
or should I fix the nfs client to return ENXIO for the first case?

thanks,
Kinglong Mee


2018-09-11 17:56:21

by Trond Myklebust

[permalink] [raw]
Subject: Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK

T24gVHVlLCAyMDE4LTA5LTExIGF0IDIwOjI5ICswODAwLCBLaW5nbG9uZyBNZWUgd3JvdGU6DQo+
IFRoZSBsYXRlc3QgZ2FuZXNoYS9nbHVzdGVyIHN1cHBvcnRzIHNlZWsgYWNjb3JkaW5nIHRvLA0K
PiANCj4gDQpodHRwczovL3Rvb2xzLmlldGYub3JnL2h0bWwvZHJhZnQtaWV0Zi1uZnN2NC1taW5v
cnZlcnNpb24yLTQxI3NlY3Rpb24tMTUuMTENCj4gDQo+ICAgIEZyb20gdGhlIGdpdmVuIHNhX29m
ZnNldCwgZmluZCB0aGUgbmV4dCBkYXRhX2NvbnRlbnQ0IG9mIHR5cGUNCj4gc2Ffd2hhdA0KPiAg
ICBpbiB0aGUgZmlsZS4gIElmIHRoZSBzZXJ2ZXIgY2FuIG5vdCBmaW5kIGEgY29ycmVzcG9uZGlu
ZyBzYV93aGF0LA0KPiAgICB0aGVuIHRoZSBzdGF0dXMgd2lsbCBzdGlsbCBiZSBORlM0X09LLCBi
dXQgc3JfZW9mIHdvdWxkIGJlDQo+IFRSVUUuICBJZg0KPiAgICB0aGUgc2VydmVyIGNhbiBmaW5k
IHRoZSBzYV93aGF0LCB0aGVuIHRoZSBzcl9vZmZzZXQgaXMgdGhlIHN0YXJ0DQo+IG9mDQo+ICAg
IHRoYXQgY29udGVudC4gIElmIHRoZSBzYV9vZmZzZXQgaXMgYmV5b25kIHRoZSBlbmQgb2YgdGhl
IGZpbGUsDQo+IHRoZW4NCj4gICAgU0VFSyBNVVNUIHJldHVybiBORlM0RVJSX05YSU8uDQo+IA0K
PiBGb3IgYSBmaWxlJ3MgZmlsZW1hcCBhcywNCj4gDQo+IFBhcnQgICAgMTogSE9MRSAweDAwMDAw
MDAwMDAwMDAwMDAgLS0tPiAweDAwMDAwMDAwMDA2MDAwMDANCj4gUGFydCAgICAyOiBEQVRBIDB4
MDAwMDAwMDAwMDYwMDAwMCAtLS0+IDB4MDAwMDAwMDAwMDcwMDAwMA0KPiBQYXJ0ICAgIDM6IEhP
TEUgMHgwMDAwMDAwMDAwNzAwMDAwIC0tLT4gMHgwMDAwMDAwMDAxMDAwMDAwDQo+IA0KPiBTRUVL
KDB4NzAwMDAwLCBTRUVLX0RBVEEpIGdldHMgcmVzdWx0IChzcl9lb2Y6MSwgc3Jfb2Zmc2V0OjB4
NzAwMDApDQo+IGZyb20gZ2FuZXNoYS9nbHVzdGVyOw0KPiBTRUVLKDB4NzAwMDAwLCBTRUVLX0hP
TEUpIGdldHMgcmVzdWx0IChzcl9lb2Y6MCwgc3Jfb2Zmc2V0OjB4NzAwMDApDQo+IGZyb20gZ2Fu
ZXNoYS9nbHVzdGVyLg0KPiANCj4gSWYgYW4gYXBwbGljYXRpb24gZGVwZW5kcyB0aGUgbHNlZWsg
cmVzdWx0IGZvciBkYXRhIHNlYXJjaGluZywgaXQgbWF5DQo+IGVudGVyIGluZmluaXRlIGxvb3Au
DQo+IA0KPiAgICAgICAgIHdoaWxlICgxKSB7DQo+ICAgICAgICAgICAgICAgICBuZXh0X3BvcyA9
IGxzZWVrKGZkLCBjdXJfcG9zLCBzZWVrX3R5cGUpOw0KPiAgICAgICAgICAgICAgICAgaWYgKHNl
ZWtfdHlwZSA9PSBTRUVLX0RBVEEpIHsNCj4gICAgICAgICAgICAgICAgICAgICAgICAgc2Vla190
eXBlID0gU0VFS19IT0xFOw0KPiAgICAgICAgICAgICAgICAgfSBlbHNlIHsNCj4gICAgICAgICAg
ICAgICAgICAgICAgICAgc2Vla190eXBlID0gU0VFS19EQVRBOw0KPiAgICAgICAgICAgICAgICAg
fQ0KPiANCj4gICAgICAgICAgICAgICAgIGlmIChuZXh0X3BvcyA9PSAtMSkgew0KPiAgICAgICAg
ICAgICAgICAgICAgICAgICByZXR1cm4gOw0KPiANCj4gICAgICAgICAgICAgICAgIGN1cl9wb3Mg
PSBuZXh0X3BvczsNCj4gICAgICAgICB9DQo+IA0KPiBUaGUgbHNlZWsgc3lzY2FsbCBhbHdheXMg
Z2V0cyAweDcwMDAwIGZyb20gbmZzIGNsaWVudCBmb3IgdGhvc2UgdHdvDQo+IGNhc2VzLCANCj4g
YnV0LCBpZiB1bmRlcmx5aW5nIGZpbGVzeXN0ZW0gaXMgZXh0NC9mMmZzLCBvciB0aGUgbmZzIHNl
cnZlciBpcw0KPiBrbmZzZCwNCj4gdGhlIGxzZWVrKDB4NzAwMDAwLCBTRUVLX0RBVEEpIGdldHMg
RU5YSU8uDQo+IA0KPiBJIHdhbm5hIHRvIGtub3csDQo+IHNob3VsZCBJIGZpeCB0aGUgZ2FuZXNo
YS9nbHVzdGVyIGFzIGtuZnNkIHJldHVybiBFTlhJTyBmb3IgdGhlIGZpcnN0DQo+IGNhc2U/DQo+
IG9yIHNob3VsZCBJIGZpeCB0aGUgbmZzIGNsaWVudCB0byByZXR1cm4gRU5YSU8gZm9yIHRoZSBm
aXJzdCBjYXNlPw0KPiANCg0KSXQgdGhhdCB0ZXN0IGNvcnJlY3Q/IFRoZSBmYWxsYmFjayBpbXBs
ZW1lbnRhdGlvbiBvZiBTRUVLX0RBVEEgYXNzdW1lcw0KdGhhdCB0aGUgZW50aXJlIGZpbGUgaXMg
ZGF0YSwgc28gbHNlZWsoU0VFS19EQVRBKSBvbiBhbnkgb2Zmc2V0IHRoYXQgaXMNCjw9IGVvZiB3
aWxsIGJlIGEgbm8tb3AuIFRoZSBmYWxsYmFjayBpbXBsZW1lbnRhdGlvbiBvZiBTRUVLX0hPTEUN
CmFzc3VtZXMgdGhhdCB0aGUgZmlyc3QgaG9sZSBpcyBhdCBlb2YuDQoNCklPVzogdW5sZXNzIHRo
ZSBpbml0aWFsIHZhbHVlIGZvciBjdXJfcG9zIGlzID4gZW9mLCBpdCBsb29rcyB0byBtZSBhcw0K
aWYgdGhlIGFib3ZlIHRlc3Qgd2lsbCBsb29wIGluZmluaXRlbHkgZ2l2ZW4gYW55IGZpbGVzeXN0
ZW0gdGhhdA0KZG9lc24ndCBpbXBsZW1lbnQgbmF0aXZlIHN1cHBvcnQgZm9yIFNFRUtfREFUQS9T
RUVLX0hPTEUuDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50
YWluZXIsIEhhbW1lcnNwYWNlDQp0cm9uZC5teWtsZWJ1c3RAaGFtbWVyc3BhY2UuY29tDQoNCg0K

2018-09-11 19:47:06

by Kinglong Mee

[permalink] [raw]
Subject: Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK


On 2018/9/11 20:57, Trond Myklebust wrote:
> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
>> The latest ganesha/gluster supports seek according to,
>>
>>
> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-15.11
>>
>> From the given sa_offset, find the next data_content4 of type
>> sa_what
>> in the file. If the server can not find a corresponding sa_what,
>> then the status will still be NFS4_OK, but sr_eof would be
>> TRUE. If
>> the server can find the sa_what, then the sr_offset is the start
>> of
>> that content. If the sa_offset is beyond the end of the file,
>> then
>> SEEK MUST return NFS4ERR_NXIO.
>>
>> For a file's filemap as,
>>
>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1, sr_offset:0x70000)
>> from ganesha/gluster;
>> SEEK(0x700000, SEEK_HOLE) gets result (sr_eof:0, sr_offset:0x70000)
>> from ganesha/gluster.
>>
>> If an application depends the lseek result for data searching, it may
>> enter infinite loop.
>>
>> while (1) {
>> next_pos = lseek(fd, cur_pos, seek_type);
>> if (seek_type == SEEK_DATA) {
>> seek_type = SEEK_HOLE;
>> } else {
>> seek_type = SEEK_DATA;
>> }
>>
>> if (next_pos == -1) {
>> return ;
>>
>> cur_pos = next_pos;
>> }
>>
>> The lseek syscall always gets 0x70000 from nfs client for those two
>> cases,
>> but, if underlying filesystem is ext4/f2fs, or the nfs server is
>> knfsd,
>> the lseek(0x700000, SEEK_DATA) gets ENXIO.
>>
>> I wanna to know,
>> should I fix the ganesha/gluster as knfsd return ENXIO for the first
>> case?
>> or should I fix the nfs client to return ENXIO for the first case?
>>
>
> It that test correct? The fallback implementation of SEEK_DATA assumes
> that the entire file is data, so lseek(SEEK_DATA) on any offset that is
> <= eof will be a no-op. The fallback implementation of SEEK_HOLE
> assumes that the first hole is at eof.

I think that's non-nfsv4.2's logical.

>
> IOW: unless the initial value for cur_pos is > eof, it looks to me as
> if the above test will loop infinitely given any filesystem that
> doesn't implement native support for SEEK_DATA/SEEK_HOLE.
>

No, if underlying filesystem is ext4/f2fs, or the nfs server is knfsd,
the last lseek syscall always return ENXIO no matter the cur_pos is > eof or not.

A file ends with a hole as,
Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff

# stat testfile
File: testfile
Size: 209715200 Blocks: 22640 IO Block: 4096 regular file
Device: 807h/2055d Inode: 843122 Links: 2
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2018-09-11 20:01:41.881227061 +0800
Modify: 2018-09-11 20:01:41.976478311 +0800
Change: 2018-09-11 20:01:41.976478311 +0800
Birth: -

# strace filemap testfile
... ...
lseek(3, 111149056, SEEK_HOLE) = 112197632
lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or address)

Right now, when knfsd gets the ENXIO error, it returns the error to client directly,
and return to syscall.
But, ganesha set the sr_eof to true and return NFS4_OK to client as RFC description,
nfs client skips the sr_eof and return a bad offset to syscall.

thanks,
Kinglong Mee

2018-09-11 20:43:06

by Anna Schumaker

[permalink] [raw]
Subject: Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK

On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
> On 2018/9/11 20:57, Trond Myklebust wrote:
> > On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
> > > The latest ganesha/gluster supports seek according to,
> > >
> > >
> >
> > https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-15.11
> > >
> > > From the given sa_offset, find the next data_content4 of type
> > > sa_what
> > > in the file. If the server can not find a corresponding sa_what,
> > > then the status will still be NFS4_OK, but sr_eof would be
> > > TRUE. If
> > > the server can find the sa_what, then the sr_offset is the start
> > > of
> > > that content. If the sa_offset is beyond the end of the file,
> > > then
> > > SEEK MUST return NFS4ERR_NXIO.
> > >
> > > For a file's filemap as,
> > >
> > > Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
> > > Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
> > > Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
> > > SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1, sr_offset:0x70000)
> > > from ganesha/gluster;
> > > SEEK(0x700000, SEEK_HOLE) gets result (sr_eof:0, sr_offset:0x70000)
> > > from ganesha/gluster.
> > >
> > > If an application depends the lseek result for data searching, it may
> > > enter infinite loop.
> > >
> > > while (1) {
> > > next_pos = lseek(fd, cur_pos, seek_type);
> > > if (seek_type == SEEK_DATA) {
> > > seek_type = SEEK_HOLE;
> > > } else {
> > > seek_type = SEEK_DATA;
> > > }
> > >
> > > if (next_pos == -1) {
> > > return ;
> > >
> > > cur_pos = next_pos;
> > > }
> > >
> > > The lseek syscall always gets 0x70000 from nfs client for those two
> > > cases,
> > > but, if underlying filesystem is ext4/f2fs, or the nfs server is
> > > knfsd,
> > > the lseek(0x700000, SEEK_DATA) gets ENXIO.
> > >
> > > I wanna to know,
> > > should I fix the ganesha/gluster as knfsd return ENXIO for the first
> > > case?
> > > or should I fix the nfs client to return ENXIO for the first case?
> > >
> >
> > It that test correct? The fallback implementation of SEEK_DATA assumes
> > that the entire file is data, so lseek(SEEK_DATA) on any offset that is
> > <= eof will be a no-op. The fallback implementation of SEEK_HOLE
> > assumes that the first hole is at eof.
>
> I think that's non-nfsv4.2's logical.
>
> >
> > IOW: unless the initial value for cur_pos is > eof, it looks to me as
> > if the above test will loop infinitely given any filesystem that
> > doesn't implement native support for SEEK_DATA/SEEK_HOLE.
> >
>
> No, if underlying filesystem is ext4/f2fs, or the nfs server is knfsd,
> the last lseek syscall always return ENXIO no matter the cur_pos is > eof or
> not.
>
> A file ends with a hole as,
> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
>
> # stat testfile
> File: testfile
> Size: 209715200 Blocks: 22640 IO Block: 4096 regular file
> Device: 807h/2055d Inode: 843122 Links: 2
> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
> Access: 2018-09-11 20:01:41.881227061 +0800
> Modify: 2018-09-11 20:01:41.976478311 +0800
> Change: 2018-09-11 20:01:41.976478311 +0800
> Birth: -
>
> # strace filemap testfile
> ... ...
> lseek(3, 111149056, SEEK_HOLE) = 112197632
> lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or address)
>
> Right now, when knfsd gets the ENXIO error, it returns the error to client
> directly,
> and return to syscall.
> But, ganesha set the sr_eof to true and return NFS4_OK to client as RFC
> description,
> nfs client skips the sr_eof and return a bad offset to syscall.

Would it make more sense to change Knfsd instead of the client? I think I was
trying to keep things simple when I wrote the code, so I just passed the result
of the lseek system call back to the client.

Anna
>
> thanks,
> Kinglong Mee

2018-09-12 06:33:36

by Kinglong Mee

[permalink] [raw]
Subject: Re: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK

On 2018/9/12 7:20, Frank Filz wrote:
>> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
>>> On 2018/9/11 20:57, Trond Myklebust wrote:
>>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
>>>>> The latest ganesha/gluster supports seek according to,
>>>>>
>>>>>
>>>>
>>>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#sectio
>>>> n-15.11
>>>>>
>>>>> From the given sa_offset, find the next data_content4 of type
>>>>> sa_what
>>>>> in the file. If the server can not find a corresponding sa_what,
>>>>> then the status will still be NFS4_OK, but sr_eof would be
>>>>> TRUE. If
>>>>> the server can find the sa_what, then the sr_offset is the
>>>>> start of
>>>>> that content. If the sa_offset is beyond the end of the file,
>>>>> then
>>>>> SEEK MUST return NFS4ERR_NXIO.
>>>>>
>>>>> For a file's filemap as,
>>>>>
>>>>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
>>>>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
>>>>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
>>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1,
>>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, SEEK_HOLE)
>>>>> gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster.
>>>>>
>>>>> If an application depends the lseek result for data searching, it
>>>>> may enter infinite loop.
>>>>>
>>>>> while (1) {
>>>>> next_pos = lseek(fd, cur_pos, seek_type);
>>>>> if (seek_type == SEEK_DATA) {
>>>>> seek_type = SEEK_HOLE;
>>>>> } else {
>>>>> seek_type = SEEK_DATA;
>>>>> }
>>>>>
>>>>> if (next_pos == -1) {
>>>>> return ;
>>>>>
>>>>> cur_pos = next_pos;
>>>>> }
>>>>>
>>>>> The lseek syscall always gets 0x70000 from nfs client for those
>>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the nfs
>>>>> server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO.
>>>>>
>>>>> I wanna to know,
>>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the
>>>>> first case?
>>>>> or should I fix the nfs client to return ENXIO for the first case?
>>>>>
>>>>
>>>> It that test correct? The fallback implementation of SEEK_DATA
>>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any
>>>> offset that is <= eof will be a no-op. The fallback implementation
>>>> of SEEK_HOLE assumes that the first hole is at eof.
>>>
>>> I think that's non-nfsv4.2's logical.
>>>
>>>>
>>>> IOW: unless the initial value for cur_pos is > eof, it looks to me
>>>> as if the above test will loop infinitely given any filesystem that
>>>> doesn't implement native support for SEEK_DATA/SEEK_HOLE.
>>>>
>>>
>>> No, if underlying filesystem is ext4/f2fs, or the nfs server is knfsd,
>>> the last lseek syscall always return ENXIO no matter the cur_pos is >
>>> eof or not.
>>>
>>> A file ends with a hole as,
>>> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
>>> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
>>>
>>> # stat testfile
>>> File: testfile
>>> Size: 209715200 Blocks: 22640 IO Block: 4096 regular file
>>> Device: 807h/2055d Inode: 843122 Links: 2
>>> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
>>> Access: 2018-09-11 20:01:41.881227061 +0800
>>> Modify: 2018-09-11 20:01:41.976478311 +0800
>>> Change: 2018-09-11 20:01:41.976478311 +0800
>>> Birth: -
>>>
>>> # strace filemap testfile
>>> ... ...
>>> lseek(3, 111149056, SEEK_HOLE) = 112197632
>>> lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or address)
>>>
>>> Right now, when knfsd gets the ENXIO error, it returns the error to
>>> client directly, and return to syscall.
>>> But, ganesha set the sr_eof to true and return NFS4_OK to client as
>>> RFC description, nfs client skips the sr_eof and return a bad offset
>>> to syscall.
>>
>> Would it make more sense to change Knfsd instead of the client? I think I was
>> trying to keep things simple when I wrote the code, so I just passed the result of
>> the lseek system call back to the client.
>
> Looking at the lseek(2) man page, it's not clear to me what should be returned if as in this example, there is a HOLE at the end of the file (i.e. filesize is larger than the range of the last DATA in the file). It sounds like ext4 returns ENXIO if a SEEK_DATA is done past the last data in the file.
>
> SEEK_DATA
> Adjust the file offset to the next location in the file greater than or equal to offset containing data. If offset points to data, then the file offset is set
> to offset.
>
> SEEK_HOLE
> Adjust the file offset to the next hole in the file greater than or equal to offset. If offset points into the middle of a hole, then the file offset is set to
> offset. If there is no hole past offset, then the file offset is adjusted to the end of the file (i.e., there is an implicit hole at the end of any file).
>
> In both of the above cases, lseek() fails if offset points past the end of the file.
>
> These operations allow applications to map holes in a sparsely allocated file. This can be useful for applications such as file backup tools, which can save space when
> creating backups and preserve holes, if they have a mechanism for discovering holes.
>
> For the purposes of these operations, a hole is a sequence of zeros that (normally) has not been allocated in the underlying file storage. However, a filesystem is not
> obliged to report holes, so these operations are not a guaranteed mechanism for mapping the storage space actually allocated to a file. (Furthermore, a sequence of
> zeros that actually has been written to the underlying storage may not be reported as a hole.) In the simplest implementation, a filesystem can support the operations
> by making SEEK_HOLE always return the offset of the end of the file, and making SEEK_DATA always return offset (i.e., even if the location referred to by offset is a
> hole, it can be considered to consist of data that is a sequence of zeros).
>
> The RFC text is pretty clear:
>
> SEEK is an operation that allows a client to determine the location
> of the next data_content4 in a file. It allows an implementation of
> the emerging extension to lseek(2) to allow clients to determine the
> next hole whilst in data or the next data whilst in a hole.
>
> From the given sa_offset, find the next data_content4 of type sa_what
> in the file. If the server can not find a corresponding sa_what,
> then the status will still be NFS4_OK, but sr_eof would be TRUE. If
> the server can find the sa_what, then the sr_offset is the start of
> that content. If the sa_offset is beyond the end of the file, then
> SEEK MUST return NFS4ERR_NXIO.
>
> All files MUST have a virtual hole at the end of the file. I.e., if
> a filesystem does not support sparse files, then a compound with
> {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
> where 'X' was the size of the file.
>
> Sa_offset is not past the end of the file, but there is no more DATA, so a seek DATA from 0x70000 (original file) should return sr_eof TRUE.
>
> In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000.
>
> It certainly makes sense that you should be able to have a hole at the end of a file (pre-allocated disk blocks but no data written yet), and is in fact what fallocate(2) will do.
>
> An NFS server could check the filesize and if sa_offset is < filesize and a SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and set sr_eof to TRUE.
>
> The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any ENXIO result to NFS4_OK with sr_eof TRUE. It would be better for it to do the simple thing knfsd does of always passing along the ENXIO (this may be best if it is not possible to safely verify sa_offset really is < filesize).

Do you mean modifying ganesha/gluster as knfsd does?

seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence);
if (seek->seek_pos < 0)
status = nfserrno(seek->seek_pos);
else if (seek->seek_pos >= i_size_read(file_inode(file)))
seek->seek_eof = true;

It is a working implementation, but not according to RFC description,

If the server can not find a corresponding sa_what,
then the status will still be NFS4_OK, but sr_eof would be TRUE.

As in this example, there is a HOLE at the end of the file,
SEEK(in hole, SEEK_DATA) should return NFS4_OK and sr_eof is TRUE,
but knfsd return NFS4ERR_NXIO.

thanks,
Kinglong Mee

2018-09-12 17:02:45

by Frank Filz

[permalink] [raw]
Subject: RE: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK

> On 2018/9/12 7:20, Frank Filz wrote:
> >> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
> >>> On 2018/9/11 20:57, Trond Myklebust wrote:
> >>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
> >>>>> The latest ganesha/gluster supports seek according to,
> >>>>>
> >>>>>
> >>>>
> >>>> =
https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#secti
> >>>> o
> >>>> n-15.11
> >>>>>
> >>>>> From the given sa_offset, find the next data_content4 of type
> >>>>> sa_what
> >>>>> in the file. If the server can not find a corresponding =
sa_what,
> >>>>> then the status will still be NFS4_OK, but sr_eof would be
> >>>>> TRUE. If
> >>>>> the server can find the sa_what, then the sr_offset is the
> >>>>> start of
> >>>>> that content. If the sa_offset is beyond the end of the =
file,
> >>>>> then
> >>>>> SEEK MUST return NFS4ERR_NXIO.
> >>>>>
> >>>>> For a file's filemap as,
> >>>>>
> >>>>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
> >>>>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
> >>>>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
> >>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1,
> >>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, =
SEEK_HOLE)
> >>>>> gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster.
> >>>>>
> >>>>> If an application depends the lseek result for data searching, =
it
> >>>>> may enter infinite loop.
> >>>>>
> >>>>> while (1) {
> >>>>> next_pos =3D lseek(fd, cur_pos, seek_type);
> >>>>> if (seek_type =3D=3D SEEK_DATA) {
> >>>>> seek_type =3D SEEK_HOLE;
> >>>>> } else {
> >>>>> seek_type =3D SEEK_DATA;
> >>>>> }
> >>>>>
> >>>>> if (next_pos =3D=3D -1) {
> >>>>> return ;
> >>>>>
> >>>>> cur_pos =3D next_pos;
> >>>>> }
> >>>>>
> >>>>> The lseek syscall always gets 0x70000 from nfs client for those
> >>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the =
nfs
> >>>>> server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO.
> >>>>>
> >>>>> I wanna to know,
> >>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the
> >>>>> first case?
> >>>>> or should I fix the nfs client to return ENXIO for the first =
case?
> >>>>>
> >>>>
> >>>> It that test correct? The fallback implementation of SEEK_DATA
> >>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any
> >>>> offset that is <=3D eof will be a no-op. The fallback =
implementation
> >>>> of SEEK_HOLE assumes that the first hole is at eof.
> >>>
> >>> I think that's non-nfsv4.2's logical.
> >>>
> >>>>
> >>>> IOW: unless the initial value for cur_pos is > eof, it looks to =
me
> >>>> as if the above test will loop infinitely given any filesystem =
that
> >>>> doesn't implement native support for SEEK_DATA/SEEK_HOLE.
> >>>>
> >>>
> >>> No, if underlying filesystem is ext4/f2fs, or the nfs server is
> >>> knfsd, the last lseek syscall always return ENXIO no matter the
> >>> cur_pos is > eof or not.
> >>>
> >>> A file ends with a hole as,
> >>> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
> >>> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
> >>>
> >>> # stat testfile
> >>> File: testfile
> >>> Size: 209715200 Blocks: 22640 IO Block: 4096 =
regular file
> >>> Device: 807h/2055d Inode: 843122 Links: 2
> >>> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ =
root)
> >>> Access: 2018-09-11 20:01:41.881227061 +0800
> >>> Modify: 2018-09-11 20:01:41.976478311 +0800
> >>> Change: 2018-09-11 20:01:41.976478311 +0800
> >>> Birth: -
> >>>
> >>> # strace filemap testfile
> >>> ... ...
> >>> lseek(3, 111149056, SEEK_HOLE) =3D 112197632
> >>> lseek(3, 112197632, SEEK_DATA) =3D -1 ENXIO (No such =
device or
> address)
> >>>
> >>> Right now, when knfsd gets the ENXIO error, it returns the error =
to
> >>> client directly, and return to syscall.
> >>> But, ganesha set the sr_eof to true and return NFS4_OK to client =
as
> >>> RFC description, nfs client skips the sr_eof and return a bad =
offset
> >>> to syscall.
> >>
> >> Would it make more sense to change Knfsd instead of the client? I
> >> think I was trying to keep things simple when I wrote the code, so =
I
> >> just passed the result of the lseek system call back to the client.
> >
> > Looking at the lseek(2) man page, it's not clear to me what should =
be returned
> if as in this example, there is a HOLE at the end of the file (i.e. =
filesize is larger
> than the range of the last DATA in the file). It sounds like ext4 =
returns ENXIO if a
> SEEK_DATA is done past the last data in the file.
> >
> > SEEK_DATA
> > Adjust the file offset to the next location in the =
file greater than or
> equal to offset containing data. If offset points to data, then the =
file offset is
> set
> > to offset.
> >
> > SEEK_HOLE
> > Adjust the file offset to the next hole in the file =
greater than or equal
> to offset. If offset points into the middle of a hole, then the file =
offset is set to
> > offset. If there is no hole past offset, then the =
file offset is adjusted to
> the end of the file (i.e., there is an implicit hole at the end of any =
file).
> >
> > In both of the above cases, lseek() fails if offset points =
past the end of the
> file.
> >
> > These operations allow applications to map holes in a =
sparsely allocated
> file. This can be useful for applications such as file backup tools, =
which can save
> space when
> > creating backups and preserve holes, if they have a mechanism =
for
> discovering holes.
> >
> > For the purposes of these operations, a hole is a sequence of =
zeros that
> (normally) has not been allocated in the underlying file storage. =
However, a
> filesystem is not
> > obliged to report holes, so these operations are not a =
guaranteed
> mechanism for mapping the storage space actually allocated to a file.
> (Furthermore, a sequence of
> > zeros that actually has been written to the underlying =
storage may not be
> reported as a hole.) In the simplest implementation, a filesystem can =
support
> the operations
> > by making SEEK_HOLE always return the offset of the end of =
the file, and
> making SEEK_DATA always return offset (i.e., even if the location =
referred to by
> offset is a
> > hole, it can be considered to consist of data that is a =
sequence of zeros).
> >
> > The RFC text is pretty clear:
> >
> > SEEK is an operation that allows a client to determine the =
location
> > of the next data_content4 in a file. It allows an implementation =
of
> > the emerging extension to lseek(2) to allow clients to determine =
the
> > next hole whilst in data or the next data whilst in a hole.
> >
> > From the given sa_offset, find the next data_content4 of type =
sa_what
> > in the file. If the server can not find a corresponding sa_what,
> > then the status will still be NFS4_OK, but sr_eof would be TRUE. =
If
> > the server can find the sa_what, then the sr_offset is the start =
of
> > that content. If the sa_offset is beyond the end of the file, =
then
> > SEEK MUST return NFS4ERR_NXIO.
> >
> > All files MUST have a virtual hole at the end of the file. I.e., =
if
> > a filesystem does not support sparse files, then a compound with
> > {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
> > where 'X' was the size of the file.
> >
> > Sa_offset is not past the end of the file, but there is no more =
DATA, so a seek
> DATA from 0x70000 (original file) should return sr_eof TRUE.
> >
> > In either RFC or lseek(2), a seek HOLE for 0x70000 will return =
0x70000.
> >
> > It certainly makes sense that you should be able to have a hole at =
the end of a
> file (pre-allocated disk blocks but no data written yet), and is in =
fact what
> fallocate(2) will do.
> >
> > An NFS server could check the filesize and if sa_offset is < =
filesize and a
> SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and set =
sr_eof to
> TRUE.
> >
> > The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any =
ENXIO
> result to NFS4_OK with sr_eof TRUE. It would be better for it to do =
the simple
> thing knfsd does of always passing along the ENXIO (this may be best =
if it is not
> possible to safely verify sa_offset really is < filesize).
>=20
> Do you mean modifying ganesha/gluster as knfsd does?
>=20
> seek->seek_pos =3D vfs_llseek(file, seek->seek_offset, =
whence);
> if (seek->seek_pos < 0)
> status =3D nfserrno(seek->seek_pos);
> else if (seek->seek_pos >=3D i_size_read(file_inode(file)))
> seek->seek_eof =3D true;
>=20
> It is a working implementation, but not according to RFC description,
>=20
> If the server can not find a corresponding sa_what,
> then the status will still be NFS4_OK, but sr_eof would be TRUE.
>=20
> As in this example, there is a HOLE at the end of the file, SEEK(in =
hole,
> SEEK_DATA) should return NFS4_OK and sr_eof is TRUE, but knfsd return
> NFS4ERR_NXIO.

FSAL_GLUSTER always translates lseek return of ENXIO to NFS4_Ok with =
sr_eod TRUE.

It should at least ONLY do that if sa_offset is < filesize (which would =
then be correct per RFC).

Knfsd, to my understanding, looks like it always just returns ENXIO =
(which isn't exactly per RFC, but at least doesn't confuse the client =
and application as badly).

Frank

2018-09-12 21:35:54

by Frank Filz

[permalink] [raw]
Subject: RE: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK

> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
> > On 2018/9/11 20:57, Trond Myklebust wrote:
> > > On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
> > > > The latest ganesha/gluster supports seek according to,
> > > >
> > > >
> > >
> > > =
https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#sectio
> > > n-15.11
> > > >
> > > > From the given sa_offset, find the next data_content4 of type
> > > > sa_what
> > > > in the file. If the server can not find a corresponding =
sa_what,
> > > > then the status will still be NFS4_OK, but sr_eof would be
> > > > TRUE. If
> > > > the server can find the sa_what, then the sr_offset is the
> > > > start of
> > > > that content. If the sa_offset is beyond the end of the =
file,
> > > > then
> > > > SEEK MUST return NFS4ERR_NXIO.
> > > >
> > > > For a file's filemap as,
> > > >
> > > > Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
> > > > Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
> > > > Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
> > > > SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1,
> > > > sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, =
SEEK_HOLE)
> > > > gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster.
> > > >
> > > > If an application depends the lseek result for data searching, =
it
> > > > may enter infinite loop.
> > > >
> > > > while (1) {
> > > > next_pos =3D lseek(fd, cur_pos, seek_type);
> > > > if (seek_type =3D=3D SEEK_DATA) {
> > > > seek_type =3D SEEK_HOLE;
> > > > } else {
> > > > seek_type =3D SEEK_DATA;
> > > > }
> > > >
> > > > if (next_pos =3D=3D -1) {
> > > > return ;
> > > >
> > > > cur_pos =3D next_pos;
> > > > }
> > > >
> > > > The lseek syscall always gets 0x70000 from nfs client for those
> > > > two cases, but, if underlying filesystem is ext4/f2fs, or the =
nfs
> > > > server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO.
> > > >
> > > > I wanna to know,
> > > > should I fix the ganesha/gluster as knfsd return ENXIO for the
> > > > first case?
> > > > or should I fix the nfs client to return ENXIO for the first =
case?
> > > >
> > >
> > > It that test correct? The fallback implementation of SEEK_DATA
> > > assumes that the entire file is data, so lseek(SEEK_DATA) on any
> > > offset that is <=3D eof will be a no-op. The fallback =
implementation
> > > of SEEK_HOLE assumes that the first hole is at eof.
> >
> > I think that's non-nfsv4.2's logical.
> >
> > >
> > > IOW: unless the initial value for cur_pos is > eof, it looks to me
> > > as if the above test will loop infinitely given any filesystem =
that
> > > doesn't implement native support for SEEK_DATA/SEEK_HOLE.
> > >
> >
> > No, if underlying filesystem is ext4/f2fs, or the nfs server is =
knfsd,
> > the last lseek syscall always return ENXIO no matter the cur_pos is =
>
> > eof or not.
> >
> > A file ends with a hole as,
> > Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
> > Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
> >
> > # stat testfile
> > File: testfile
> > Size: 209715200 Blocks: 22640 IO Block: 4096 regular =
file
> > Device: 807h/2055d Inode: 843122 Links: 2
> > Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ =
root)
> > Access: 2018-09-11 20:01:41.881227061 +0800
> > Modify: 2018-09-11 20:01:41.976478311 +0800
> > Change: 2018-09-11 20:01:41.976478311 +0800
> > Birth: -
> >
> > # strace filemap testfile
> > ... ...
> > lseek(3, 111149056, SEEK_HOLE) =3D 112197632
> > lseek(3, 112197632, SEEK_DATA) =3D -1 ENXIO (No such device =
or address)
> >
> > Right now, when knfsd gets the ENXIO error, it returns the error to
> > client directly, and return to syscall.
> > But, ganesha set the sr_eof to true and return NFS4_OK to client as
> > RFC description, nfs client skips the sr_eof and return a bad offset
> > to syscall.
>=20
> Would it make more sense to change Knfsd instead of the client? I =
think I was
> trying to keep things simple when I wrote the code, so I just passed =
the result of
> the lseek system call back to the client.

Looking at the lseek(2) man page, it's not clear to me what should be =
returned if as in this example, there is a HOLE at the end of the file =
(i.e. filesize is larger than the range of the last DATA in the file). =
It sounds like ext4 returns ENXIO if a SEEK_DATA is done past the last =
data in the file.

SEEK_DATA
Adjust the file offset to the next location in the file =
greater than or equal to offset containing data. If offset points to =
data, then the file offset is set
to offset.

SEEK_HOLE
Adjust the file offset to the next hole in the file =
greater than or equal to offset. If offset points into the middle of a =
hole, then the file offset is set to
offset. If there is no hole past offset, then the file =
offset is adjusted to the end of the file (i.e., there is an implicit =
hole at the end of any file).

In both of the above cases, lseek() fails if offset points past =
the end of the file.

These operations allow applications to map holes in a sparsely =
allocated file. This can be useful for applications such as file backup =
tools, which can save space when
creating backups and preserve holes, if they have a mechanism for =
discovering holes.

For the purposes of these operations, a hole is a sequence of =
zeros that (normally) has not been allocated in the underlying file =
storage. However, a filesystem is not
obliged to report holes, so these operations are not a guaranteed =
mechanism for mapping the storage space actually allocated to a file. =
(Furthermore, a sequence of
zeros that actually has been written to the underlying storage =
may not be reported as a hole.) In the simplest implementation, a =
filesystem can support the operations
by making SEEK_HOLE always return the offset of the end of the =
file, and making SEEK_DATA always return offset (i.e., even if the =
location referred to by offset is a
hole, it can be considered to consist of data that is a sequence =
of zeros).

The RFC text is pretty clear:

SEEK is an operation that allows a client to determine the location
of the next data_content4 in a file. It allows an implementation of
the emerging extension to lseek(2) to allow clients to determine the
next hole whilst in data or the next data whilst in a hole.

From the given sa_offset, find the next data_content4 of type sa_what
in the file. If the server can not find a corresponding sa_what,
then the status will still be NFS4_OK, but sr_eof would be TRUE. If
the server can find the sa_what, then the sr_offset is the start of
that content. If the sa_offset is beyond the end of the file, then
SEEK MUST return NFS4ERR_NXIO.

All files MUST have a virtual hole at the end of the file. I.e., if
a filesystem does not support sparse files, then a compound with
{SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
where 'X' was the size of the file.

Sa_offset is not past the end of the file, but there is no more DATA, so =
a seek DATA from 0x70000 (original file) should return sr_eof TRUE.

In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000.

It certainly makes sense that you should be able to have a hole at the =
end of a file (pre-allocated disk blocks but no data written yet), and =
is in fact what fallocate(2) will do.

An NFS server could check the filesize and if sa_offset is < filesize =
and a SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and =
set sr_eof to TRUE.

The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any =
ENXIO result to NFS4_OK with sr_eof TRUE. It would be better for it to =
do the simple thing knfsd does of always passing along the ENXIO (this =
may be best if it is not possible to safely verify sa_offset really is < =
filesize).

So Anna, I think your knfsd implementation is ok.

Frank

2018-09-13 05:10:58

by Kinglong Mee

[permalink] [raw]
Subject: Re: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK

On 2018/9/12 19:58, Frank Filz wrote:
>> On 2018/9/12 7:20, Frank Filz wrote:
>>>> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
>>>>> On 2018/9/11 20:57, Trond Myklebust wrote:
>>>>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
>>>>>>> The latest ganesha/gluster supports seek according to,
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#secti
>>>>>> o
>>>>>> n-15.11
>>>>>>>
>>>>>>> From the given sa_offset, find the next data_content4 of type
>>>>>>> sa_what
>>>>>>> in the file. If the server can not find a corresponding sa_what,
>>>>>>> then the status will still be NFS4_OK, but sr_eof would be
>>>>>>> TRUE. If
>>>>>>> the server can find the sa_what, then the sr_offset is the
>>>>>>> start of
>>>>>>> that content. If the sa_offset is beyond the end of the file,
>>>>>>> then
>>>>>>> SEEK MUST return NFS4ERR_NXIO.
>>>>>>>
>>>>>>> For a file's filemap as,
>>>>>>>
>>>>>>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
>>>>>>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
>>>>>>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
>>>>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1,
>>>>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, SEEK_HOLE)
>>>>>>> gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster.
>>>>>>>
>>>>>>> If an application depends the lseek result for data searching, it
>>>>>>> may enter infinite loop.
>>>>>>>
>>>>>>> while (1) {
>>>>>>> next_pos = lseek(fd, cur_pos, seek_type);
>>>>>>> if (seek_type == SEEK_DATA) {
>>>>>>> seek_type = SEEK_HOLE;
>>>>>>> } else {
>>>>>>> seek_type = SEEK_DATA;
>>>>>>> }
>>>>>>>
>>>>>>> if (next_pos == -1) {
>>>>>>> return ;
>>>>>>>
>>>>>>> cur_pos = next_pos;
>>>>>>> }
>>>>>>>
>>>>>>> The lseek syscall always gets 0x70000 from nfs client for those
>>>>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the nfs
>>>>>>> server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO.
>>>>>>>
>>>>>>> I wanna to know,
>>>>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the
>>>>>>> first case?
>>>>>>> or should I fix the nfs client to return ENXIO for the first case?
>>>>>>>
>>>>>>
>>>>>> It that test correct? The fallback implementation of SEEK_DATA
>>>>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any
>>>>>> offset that is <= eof will be a no-op. The fallback implementation
>>>>>> of SEEK_HOLE assumes that the first hole is at eof.
>>>>>
>>>>> I think that's non-nfsv4.2's logical.
>>>>>
>>>>>>
>>>>>> IOW: unless the initial value for cur_pos is > eof, it looks to me
>>>>>> as if the above test will loop infinitely given any filesystem that
>>>>>> doesn't implement native support for SEEK_DATA/SEEK_HOLE.
>>>>>>
>>>>>
>>>>> No, if underlying filesystem is ext4/f2fs, or the nfs server is
>>>>> knfsd, the last lseek syscall always return ENXIO no matter the
>>>>> cur_pos is > eof or not.
>>>>>
>>>>> A file ends with a hole as,
>>>>> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
>>>>> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
>>>>>
>>>>> # stat testfile
>>>>> File: testfile
>>>>> Size: 209715200 Blocks: 22640 IO Block: 4096 regular file
>>>>> Device: 807h/2055d Inode: 843122 Links: 2
>>>>> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
>>>>> Access: 2018-09-11 20:01:41.881227061 +0800
>>>>> Modify: 2018-09-11 20:01:41.976478311 +0800
>>>>> Change: 2018-09-11 20:01:41.976478311 +0800
>>>>> Birth: -
>>>>>
>>>>> # strace filemap testfile
>>>>> ... ...
>>>>> lseek(3, 111149056, SEEK_HOLE) = 112197632
>>>>> lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or
>> address)
>>>>>
>>>>> Right now, when knfsd gets the ENXIO error, it returns the error to
>>>>> client directly, and return to syscall.
>>>>> But, ganesha set the sr_eof to true and return NFS4_OK to client as
>>>>> RFC description, nfs client skips the sr_eof and return a bad offset
>>>>> to syscall.
>>>>
>>>> Would it make more sense to change Knfsd instead of the client? I
>>>> think I was trying to keep things simple when I wrote the code, so I
>>>> just passed the result of the lseek system call back to the client.
>>>
>>> Looking at the lseek(2) man page, it's not clear to me what should be returned
>> if as in this example, there is a HOLE at the end of the file (i.e. filesize is larger
>> than the range of the last DATA in the file). It sounds like ext4 returns ENXIO if a
>> SEEK_DATA is done past the last data in the file.
>>>
>>> SEEK_DATA
>>> Adjust the file offset to the next location in the file greater than or
>> equal to offset containing data. If offset points to data, then the file offset is
>> set
>>> to offset.
>>>
>>> SEEK_HOLE
>>> Adjust the file offset to the next hole in the file greater than or equal
>> to offset. If offset points into the middle of a hole, then the file offset is set to
>>> offset. If there is no hole past offset, then the file offset is adjusted to
>> the end of the file (i.e., there is an implicit hole at the end of any file).
>>>
>>> In both of the above cases, lseek() fails if offset points past the end of the
>> file.
>>>
>>> These operations allow applications to map holes in a sparsely allocated
>> file. This can be useful for applications such as file backup tools, which can save
>> space when
>>> creating backups and preserve holes, if they have a mechanism for
>> discovering holes.
>>>
>>> For the purposes of these operations, a hole is a sequence of zeros that
>> (normally) has not been allocated in the underlying file storage. However, a
>> filesystem is not
>>> obliged to report holes, so these operations are not a guaranteed
>> mechanism for mapping the storage space actually allocated to a file.
>> (Furthermore, a sequence of
>>> zeros that actually has been written to the underlying storage may not be
>> reported as a hole.) In the simplest implementation, a filesystem can support
>> the operations
>>> by making SEEK_HOLE always return the offset of the end of the file, and
>> making SEEK_DATA always return offset (i.e., even if the location referred to by
>> offset is a
>>> hole, it can be considered to consist of data that is a sequence of zeros).
>>>
>>> The RFC text is pretty clear:
>>>
>>> SEEK is an operation that allows a client to determine the location
>>> of the next data_content4 in a file. It allows an implementation of
>>> the emerging extension to lseek(2) to allow clients to determine the
>>> next hole whilst in data or the next data whilst in a hole.
>>>
>>> From the given sa_offset, find the next data_content4 of type sa_what
>>> in the file. If the server can not find a corresponding sa_what,
>>> then the status will still be NFS4_OK, but sr_eof would be TRUE. If
>>> the server can find the sa_what, then the sr_offset is the start of
>>> that content. If the sa_offset is beyond the end of the file, then
>>> SEEK MUST return NFS4ERR_NXIO.
>>>
>>> All files MUST have a virtual hole at the end of the file. I.e., if
>>> a filesystem does not support sparse files, then a compound with
>>> {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
>>> where 'X' was the size of the file.
>>>
>>> Sa_offset is not past the end of the file, but there is no more DATA, so a seek
>> DATA from 0x70000 (original file) should return sr_eof TRUE.
>>>
>>> In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000.
>>>
>>> It certainly makes sense that you should be able to have a hole at the end of a
>> file (pre-allocated disk blocks but no data written yet), and is in fact what
>> fallocate(2) will do.
>>>
>>> An NFS server could check the filesize and if sa_offset is < filesize and a
>> SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and set sr_eof to
>> TRUE.
>>>
>>> The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any ENXIO
>> result to NFS4_OK with sr_eof TRUE. It would be better for it to do the simple
>> thing knfsd does of always passing along the ENXIO (this may be best if it is not
>> possible to safely verify sa_offset really is < filesize).
>>
>> Do you mean modifying ganesha/gluster as knfsd does?
>>
>> seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence);
>> if (seek->seek_pos < 0)
>> status = nfserrno(seek->seek_pos);
>> else if (seek->seek_pos >= i_size_read(file_inode(file)))
>> seek->seek_eof = true;
>>
>> It is a working implementation, but not according to RFC description,
>>
>> If the server can not find a corresponding sa_what,
>> then the status will still be NFS4_OK, but sr_eof would be TRUE.
>>
>> As in this example, there is a HOLE at the end of the file, SEEK(in hole,
>> SEEK_DATA) should return NFS4_OK and sr_eof is TRUE, but knfsd return
>> NFS4ERR_NXIO.
>
> FSAL_GLUSTER always translates lseek return of ENXIO to NFS4_Ok with sr_eod TRUE.
>
> It should at least ONLY do that if sa_offset is < filesize (which would then be correct per RFC).
>
> Knfsd, to my understanding, looks like it always just returns ENXIO (which isn't exactly per RFC, but at least doesn't confuse the client and application as badly).
>

Copy that.
I will push a fix as knfsd.

thanks,
Kinglong Mee

2020-09-14 19:30:16

by Frank Filz

[permalink] [raw]
Subject: RE: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK

Has something changed in the RFC, the knfsd code, or the client with respect to seeking a hole at the end of file, particularly a legacy file that has no actual holes?

Thanks

Frank

> -----Original Message-----
> From: Kinglong Mee [mailto:[email protected]]
> Sent: Wednesday, September 12, 2018 5:04 PM
> To: Frank Filz <[email protected]>; 'Anna Schumaker'
> <[email protected]>; 'Trond Myklebust'
> <[email protected]>; [email protected]; linux-
> [email protected]
> Subject: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with
> ganesha/gluster which supports SEEK
>
> On 2018/9/12 19:58, Frank Filz wrote:
> >> On 2018/9/12 7:20, Frank Filz wrote:
> >>>> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
> >>>>> On 2018/9/11 20:57, Trond Myklebust wrote:
> >>>>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
> >>>>>>> The latest ganesha/gluster supports seek according to,
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#sec
> >>>>>> ti
> >>>>>> o
> >>>>>> n-15.11
> >>>>>>>
> >>>>>>> From the given sa_offset, find the next data_content4 of type
> >>>>>>> sa_what
> >>>>>>> in the file. If the server can not find a corresponding sa_what,
> >>>>>>> then the status will still be NFS4_OK, but sr_eof would be
> >>>>>>> TRUE. If
> >>>>>>> the server can find the sa_what, then the sr_offset is the
> >>>>>>> start of
> >>>>>>> that content. If the sa_offset is beyond the end of the
> >>>>>>> file, then
> >>>>>>> SEEK MUST return NFS4ERR_NXIO.
> >>>>>>>
> >>>>>>> For a file's filemap as,
> >>>>>>>
> >>>>>>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
> >>>>>>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
> >>>>>>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
> >>>>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1,
> >>>>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000,
> >>>>>>> SEEK_HOLE) gets result (sr_eof:0, sr_offset:0x70000) from
> ganesha/gluster.
> >>>>>>>
> >>>>>>> If an application depends the lseek result for data searching,
> >>>>>>> it may enter infinite loop.
> >>>>>>>
> >>>>>>> while (1) {
> >>>>>>> next_pos = lseek(fd, cur_pos, seek_type);
> >>>>>>> if (seek_type == SEEK_DATA) {
> >>>>>>> seek_type = SEEK_HOLE;
> >>>>>>> } else {
> >>>>>>> seek_type = SEEK_DATA;
> >>>>>>> }
> >>>>>>>
> >>>>>>> if (next_pos == -1) {
> >>>>>>> return ;
> >>>>>>>
> >>>>>>> cur_pos = next_pos;
> >>>>>>> }
> >>>>>>>
> >>>>>>> The lseek syscall always gets 0x70000 from nfs client for those
> >>>>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the
> >>>>>>> nfs server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO.
> >>>>>>>
> >>>>>>> I wanna to know,
> >>>>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the
> >>>>>>> first case?
> >>>>>>> or should I fix the nfs client to return ENXIO for the first case?
> >>>>>>>
> >>>>>>
> >>>>>> It that test correct? The fallback implementation of SEEK_DATA
> >>>>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any
> >>>>>> offset that is <= eof will be a no-op. The fallback
> >>>>>> implementation of SEEK_HOLE assumes that the first hole is at eof.
> >>>>>
> >>>>> I think that's non-nfsv4.2's logical.
> >>>>>
> >>>>>>
> >>>>>> IOW: unless the initial value for cur_pos is > eof, it looks to
> >>>>>> me as if the above test will loop infinitely given any filesystem
> >>>>>> that doesn't implement native support for SEEK_DATA/SEEK_HOLE.
> >>>>>>
> >>>>>
> >>>>> No, if underlying filesystem is ext4/f2fs, or the nfs server is
> >>>>> knfsd, the last lseek syscall always return ENXIO no matter the
> >>>>> cur_pos is > eof or not.
> >>>>>
> >>>>> A file ends with a hole as,
> >>>>> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
> >>>>> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
> >>>>>
> >>>>> # stat testfile
> >>>>> File: testfile
> >>>>> Size: 209715200 Blocks: 22640 IO Block: 4096 regular file
> >>>>> Device: 807h/2055d Inode: 843122 Links: 2
> >>>>> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
> >>>>> Access: 2018-09-11 20:01:41.881227061 +0800
> >>>>> Modify: 2018-09-11 20:01:41.976478311 +0800
> >>>>> Change: 2018-09-11 20:01:41.976478311 +0800
> >>>>> Birth: -
> >>>>>
> >>>>> # strace filemap testfile
> >>>>> ... ...
> >>>>> lseek(3, 111149056, SEEK_HOLE) = 112197632
> >>>>> lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or
> >> address)
> >>>>>
> >>>>> Right now, when knfsd gets the ENXIO error, it returns the error
> >>>>> to client directly, and return to syscall.
> >>>>> But, ganesha set the sr_eof to true and return NFS4_OK to client
> >>>>> as RFC description, nfs client skips the sr_eof and return a bad
> >>>>> offset to syscall.
> >>>>
> >>>> Would it make more sense to change Knfsd instead of the client? I
> >>>> think I was trying to keep things simple when I wrote the code, so
> >>>> I just passed the result of the lseek system call back to the client.
> >>>
> >>> Looking at the lseek(2) man page, it's not clear to me what should
> >>> be returned
> >> if as in this example, there is a HOLE at the end of the file (i.e.
> >> filesize is larger than the range of the last DATA in the file). It
> >> sounds like ext4 returns ENXIO if a SEEK_DATA is done past the last data in
> the file.
> >>>
> >>> SEEK_DATA
> >>> Adjust the file offset to the next location in the
> >>> file greater than or
> >> equal to offset containing data. If offset points to data, then the
> >> file offset is set
> >>> to offset.
> >>>
> >>> SEEK_HOLE
> >>> Adjust the file offset to the next hole in the file
> >>> greater than or equal
> >> to offset. If offset points into the middle of a hole, then the file
> >> offset is set to
> >>> offset. If there is no hole past offset, then the
> >>> file offset is adjusted to
> >> the end of the file (i.e., there is an implicit hole at the end of any file).
> >>>
> >>> In both of the above cases, lseek() fails if offset points
> >>> past the end of the
> >> file.
> >>>
> >>> These operations allow applications to map holes in a
> >>> sparsely allocated
> >> file. This can be useful for applications such as file backup tools,
> >> which can save space when
> >>> creating backups and preserve holes, if they have a mechanism
> >>> for
> >> discovering holes.
> >>>
> >>> For the purposes of these operations, a hole is a sequence of
> >>> zeros that
> >> (normally) has not been allocated in the underlying file storage.
> >> However, a filesystem is not
> >>> obliged to report holes, so these operations are not a
> >>> guaranteed
> >> mechanism for mapping the storage space actually allocated to a file.
> >> (Furthermore, a sequence of
> >>> zeros that actually has been written to the underlying
> >>> storage may not be
> >> reported as a hole.) In the simplest implementation, a filesystem
> >> can support the operations
> >>> by making SEEK_HOLE always return the offset of the end of
> >>> the file, and
> >> making SEEK_DATA always return offset (i.e., even if the location
> >> referred to by offset is a
> >>> hole, it can be considered to consist of data that is a sequence of zeros).
> >>>
> >>> The RFC text is pretty clear:
> >>>
> >>> SEEK is an operation that allows a client to determine the location
> >>> of the next data_content4 in a file. It allows an implementation of
> >>> the emerging extension to lseek(2) to allow clients to determine the
> >>> next hole whilst in data or the next data whilst in a hole.
> >>>
> >>> From the given sa_offset, find the next data_content4 of type sa_what
> >>> in the file. If the server can not find a corresponding sa_what,
> >>> then the status will still be NFS4_OK, but sr_eof would be TRUE. If
> >>> the server can find the sa_what, then the sr_offset is the start of
> >>> that content. If the sa_offset is beyond the end of the file, then
> >>> SEEK MUST return NFS4ERR_NXIO.
> >>>
> >>> All files MUST have a virtual hole at the end of the file. I.e., if
> >>> a filesystem does not support sparse files, then a compound with
> >>> {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
> >>> where 'X' was the size of the file.
> >>>
> >>> Sa_offset is not past the end of the file, but there is no more
> >>> DATA, so a seek
> >> DATA from 0x70000 (original file) should return sr_eof TRUE.
> >>>
> >>> In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000.
> >>>
> >>> It certainly makes sense that you should be able to have a hole at
> >>> the end of a
> >> file (pre-allocated disk blocks but no data written yet), and is in
> >> fact what
> >> fallocate(2) will do.
> >>>
> >>> An NFS server could check the filesize and if sa_offset is <
> >>> filesize and a
> >> SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and set
> >> sr_eof to TRUE.
> >>>
> >>> The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any
> >>> ENXIO
> >> result to NFS4_OK with sr_eof TRUE. It would be better for it to do
> >> the simple thing knfsd does of always passing along the ENXIO (this
> >> may be best if it is not possible to safely verify sa_offset really is < filesize).
> >>
> >> Do you mean modifying ganesha/gluster as knfsd does?
> >>
> >> seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence);
> >> if (seek->seek_pos < 0)
> >> status = nfserrno(seek->seek_pos);
> >> else if (seek->seek_pos >= i_size_read(file_inode(file)))
> >> seek->seek_eof = true;
> >>
> >> It is a working implementation, but not according to RFC description,
> >>
> >> If the server can not find a corresponding sa_what,
> >> then the status will still be NFS4_OK, but sr_eof would be TRUE.
> >>
> >> As in this example, there is a HOLE at the end of the file, SEEK(in
> >> hole,
> >> SEEK_DATA) should return NFS4_OK and sr_eof is TRUE, but knfsd return
> >> NFS4ERR_NXIO.
> >
> > FSAL_GLUSTER always translates lseek return of ENXIO to NFS4_Ok with
> sr_eod TRUE.
> >
> > It should at least ONLY do that if sa_offset is < filesize (which would then be
> correct per RFC).
> >
> > Knfsd, to my understanding, looks like it always just returns ENXIO (which isn't
> exactly per RFC, but at least doesn't confuse the client and application as badly).
> >
>
> Copy that.
> I will push a fix as knfsd.
>
> thanks,
> Kinglong Mee
> _______________________________________________
> Devel mailing list -- [email protected] To unsubscribe send an email to
> [email protected]