Return-Path: Received: from mail-pf1-f179.google.com ([209.85.210.179]:37452 "EHLO mail-pf1-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726261AbeIMFK6 (ORCPT ); Thu, 13 Sep 2018 01:10:58 -0400 Received: by mail-pf1-f179.google.com with SMTP id h69-v6so1790519pfd.4 for ; Wed, 12 Sep 2018 17:04:05 -0700 (PDT) Subject: Re: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK To: Frank Filz , 'Anna Schumaker' , 'Trond Myklebust' , devel@lists.nfs-ganesha.org, linux-nfs@vger.kernel.org References: <182fecd3-66c3-0cbd-fbc0-cf1a88b7f165@gmail.com> <182a9c6a29690fc00789768bdfe03affbac0db9e.camel@hammerspace.com> <53f12a27-e1f5-936f-23e9-58d3cbf7a00f@gmail.com> <6f906850076c15aedff11d721fe91d3279935224.camel@gmail.com> <00e301d44a26$12cd63d0$38682b70$@mindspring.com> <012901d44a8f$ed7c6bc0$c8754340$@mindspring.com> From: Kinglong Mee Message-ID: Date: Thu, 13 Sep 2018 08:03:51 +0800 MIME-Version: 1.0 In-Reply-To: <012901d44a8f$ed7c6bc0$c8754340$@mindspring.com> Content-Type: text/plain; charset=utf-8 Sender: linux-nfs-owner@vger.kernel.org List-ID: On 2018/9/12 19:58, Frank Filz wrote: >> On 2018/9/12 7:20, Frank Filz wrote: >>>> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote: >>>>> On 2018/9/11 20:57, Trond Myklebust wrote: >>>>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote: >>>>>>> The latest ganesha/gluster supports seek according to, >>>>>>> >>>>>>> >>>>>> >>>>>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#secti >>>>>> o >>>>>> n-15.11 >>>>>>> >>>>>>> From the given sa_offset, find the next data_content4 of type >>>>>>> sa_what >>>>>>> in the file. If the server can not find a corresponding sa_what, >>>>>>> then the status will still be NFS4_OK, but sr_eof would be >>>>>>> TRUE. If >>>>>>> the server can find the sa_what, then the sr_offset is the >>>>>>> start of >>>>>>> that content. If the sa_offset is beyond the end of the file, >>>>>>> then >>>>>>> SEEK MUST return NFS4ERR_NXIO. >>>>>>> >>>>>>> For a file's filemap as, >>>>>>> >>>>>>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000 >>>>>>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000 >>>>>>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>> >>>>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1, >>>>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, SEEK_HOLE) >>>>>>> gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster. >>>>>>> >>>>>>> If an application depends the lseek result for data searching, it >>>>>>> may enter infinite loop. >>>>>>> >>>>>>> while (1) { >>>>>>> next_pos = lseek(fd, cur_pos, seek_type); >>>>>>> if (seek_type == SEEK_DATA) { >>>>>>> seek_type = SEEK_HOLE; >>>>>>> } else { >>>>>>> seek_type = SEEK_DATA; >>>>>>> } >>>>>>> >>>>>>> if (next_pos == -1) { >>>>>>> return ; >>>>>>> >>>>>>> cur_pos = next_pos; >>>>>>> } >>>>>>> >>>>>>> The lseek syscall always gets 0x70000 from nfs client for those >>>>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the nfs >>>>>>> server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO. >>>>>>> >>>>>>> I wanna to know, >>>>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the >>>>>>> first case? >>>>>>> or should I fix the nfs client to return ENXIO for the first case? >>>>>>> >>>>>> >>>>>> It that test correct? The fallback implementation of SEEK_DATA >>>>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any >>>>>> offset that is <= eof will be a no-op. The fallback implementation >>>>>> of SEEK_HOLE assumes that the first hole is at eof. >>>>> >>>>> I think that's non-nfsv4.2's logical. >>>>> >>>>>> >>>>>> IOW: unless the initial value for cur_pos is > eof, it looks to me >>>>>> as if the above test will loop infinitely given any filesystem that >>>>>> doesn't implement native support for SEEK_DATA/SEEK_HOLE. >>>>>> >>>>> >>>>> No, if underlying filesystem is ext4/f2fs, or the nfs server is >>>>> knfsd, the last lseek syscall always return ENXIO no matter the >>>>> cur_pos is > eof or not. >>>>> >>>>> A file ends with a hole as, >>>>> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff >>>>> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff >>>>> >>>>> # stat testfile >>>>> File: testfile >>>>> Size: 209715200 Blocks: 22640 IO Block: 4096 regular file >>>>> Device: 807h/2055d Inode: 843122 Links: 2 >>>>> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root) >>>>> Access: 2018-09-11 20:01:41.881227061 +0800 >>>>> Modify: 2018-09-11 20:01:41.976478311 +0800 >>>>> Change: 2018-09-11 20:01:41.976478311 +0800 >>>>> Birth: - >>>>> >>>>> # strace filemap testfile >>>>> ... ... >>>>> lseek(3, 111149056, SEEK_HOLE) = 112197632 >>>>> lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or >> address) >>>>> >>>>> Right now, when knfsd gets the ENXIO error, it returns the error to >>>>> client directly, and return to syscall. >>>>> But, ganesha set the sr_eof to true and return NFS4_OK to client as >>>>> RFC description, nfs client skips the sr_eof and return a bad offset >>>>> to syscall. >>>> >>>> Would it make more sense to change Knfsd instead of the client? I >>>> think I was trying to keep things simple when I wrote the code, so I >>>> just passed the result of the lseek system call back to the client. >>> >>> Looking at the lseek(2) man page, it's not clear to me what should be returned >> if as in this example, there is a HOLE at the end of the file (i.e. filesize is larger >> than the range of the last DATA in the file). It sounds like ext4 returns ENXIO if a >> SEEK_DATA is done past the last data in the file. >>> >>> SEEK_DATA >>> Adjust the file offset to the next location in the file greater than or >> equal to offset containing data. If offset points to data, then the file offset is >> set >>> to offset. >>> >>> SEEK_HOLE >>> Adjust the file offset to the next hole in the file greater than or equal >> to offset. If offset points into the middle of a hole, then the file offset is set to >>> offset. If there is no hole past offset, then the file offset is adjusted to >> the end of the file (i.e., there is an implicit hole at the end of any file). >>> >>> In both of the above cases, lseek() fails if offset points past the end of the >> file. >>> >>> These operations allow applications to map holes in a sparsely allocated >> file. This can be useful for applications such as file backup tools, which can save >> space when >>> creating backups and preserve holes, if they have a mechanism for >> discovering holes. >>> >>> For the purposes of these operations, a hole is a sequence of zeros that >> (normally) has not been allocated in the underlying file storage. However, a >> filesystem is not >>> obliged to report holes, so these operations are not a guaranteed >> mechanism for mapping the storage space actually allocated to a file. >> (Furthermore, a sequence of >>> zeros that actually has been written to the underlying storage may not be >> reported as a hole.) In the simplest implementation, a filesystem can support >> the operations >>> by making SEEK_HOLE always return the offset of the end of the file, and >> making SEEK_DATA always return offset (i.e., even if the location referred to by >> offset is a >>> hole, it can be considered to consist of data that is a sequence of zeros). >>> >>> The RFC text is pretty clear: >>> >>> SEEK is an operation that allows a client to determine the location >>> of the next data_content4 in a file. It allows an implementation of >>> the emerging extension to lseek(2) to allow clients to determine the >>> next hole whilst in data or the next data whilst in a hole. >>> >>> From the given sa_offset, find the next data_content4 of type sa_what >>> in the file. If the server can not find a corresponding sa_what, >>> then the status will still be NFS4_OK, but sr_eof would be TRUE. If >>> the server can find the sa_what, then the sr_offset is the start of >>> that content. If the sa_offset is beyond the end of the file, then >>> SEEK MUST return NFS4ERR_NXIO. >>> >>> All files MUST have a virtual hole at the end of the file. I.e., if >>> a filesystem does not support sparse files, then a compound with >>> {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;} >>> where 'X' was the size of the file. >>> >>> Sa_offset is not past the end of the file, but there is no more DATA, so a seek >> DATA from 0x70000 (original file) should return sr_eof TRUE. >>> >>> In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000. >>> >>> It certainly makes sense that you should be able to have a hole at the end of a >> file (pre-allocated disk blocks but no data written yet), and is in fact what >> fallocate(2) will do. >>> >>> An NFS server could check the filesize and if sa_offset is < filesize and a >> SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and set sr_eof to >> TRUE. >>> >>> The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any ENXIO >> result to NFS4_OK with sr_eof TRUE. It would be better for it to do the simple >> thing knfsd does of always passing along the ENXIO (this may be best if it is not >> possible to safely verify sa_offset really is < filesize). >> >> Do you mean modifying ganesha/gluster as knfsd does? >> >> seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence); >> if (seek->seek_pos < 0) >> status = nfserrno(seek->seek_pos); >> else if (seek->seek_pos >= i_size_read(file_inode(file))) >> seek->seek_eof = true; >> >> It is a working implementation, but not according to RFC description, >> >> If the server can not find a corresponding sa_what, >> then the status will still be NFS4_OK, but sr_eof would be TRUE. >> >> As in this example, there is a HOLE at the end of the file, SEEK(in hole, >> SEEK_DATA) should return NFS4_OK and sr_eof is TRUE, but knfsd return >> NFS4ERR_NXIO. > > FSAL_GLUSTER always translates lseek return of ENXIO to NFS4_Ok with sr_eod TRUE. > > It should at least ONLY do that if sa_offset is < filesize (which would then be correct per RFC). > > Knfsd, to my understanding, looks like it always just returns ENXIO (which isn't exactly per RFC, but at least doesn't confuse the client and application as badly). > Copy that. I will push a fix as knfsd. thanks, Kinglong Mee