Return-Path: Received: from mail-pl1-f174.google.com ([209.85.214.174]:38394 "EHLO mail-pl1-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726686AbeILGdg (ORCPT ); Wed, 12 Sep 2018 02:33:36 -0400 Received: by mail-pl1-f174.google.com with SMTP id u11-v6so141468plq.5 for ; Tue, 11 Sep 2018 18:31:35 -0700 (PDT) From: Kinglong Mee Subject: Re: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK To: Frank Filz , 'Anna Schumaker' , 'Trond Myklebust' , devel@lists.nfs-ganesha.org, linux-nfs@vger.kernel.org, kinglongmee@gmail.com References: <182fecd3-66c3-0cbd-fbc0-cf1a88b7f165@gmail.com> <182a9c6a29690fc00789768bdfe03affbac0db9e.camel@hammerspace.com> <53f12a27-e1f5-936f-23e9-58d3cbf7a00f@gmail.com> <6f906850076c15aedff11d721fe91d3279935224.camel@gmail.com> <00e301d44a26$12cd63d0$38682b70$@mindspring.com> Message-ID: Date: Wed, 12 Sep 2018 09:31:22 +0800 MIME-Version: 1.0 In-Reply-To: <00e301d44a26$12cd63d0$38682b70$@mindspring.com> Content-Type: text/plain; charset=utf-8 Sender: linux-nfs-owner@vger.kernel.org List-ID: On 2018/9/12 7:20, Frank Filz wrote: >> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote: >>> On 2018/9/11 20:57, Trond Myklebust wrote: >>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote: >>>>> The latest ganesha/gluster supports seek according to, >>>>> >>>>> >>>> >>>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#sectio >>>> n-15.11 >>>>> >>>>> From the given sa_offset, find the next data_content4 of type >>>>> sa_what >>>>> in the file. If the server can not find a corresponding sa_what, >>>>> then the status will still be NFS4_OK, but sr_eof would be >>>>> TRUE. If >>>>> the server can find the sa_what, then the sr_offset is the >>>>> start of >>>>> that content. If the sa_offset is beyond the end of the file, >>>>> then >>>>> SEEK MUST return NFS4ERR_NXIO. >>>>> >>>>> For a file's filemap as, >>>>> >>>>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000 >>>>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000 >>>>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>> >>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1, >>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, SEEK_HOLE) >>>>> gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster. >>>>> >>>>> If an application depends the lseek result for data searching, it >>>>> may enter infinite loop. >>>>> >>>>> while (1) { >>>>> next_pos = lseek(fd, cur_pos, seek_type); >>>>> if (seek_type == SEEK_DATA) { >>>>> seek_type = SEEK_HOLE; >>>>> } else { >>>>> seek_type = SEEK_DATA; >>>>> } >>>>> >>>>> if (next_pos == -1) { >>>>> return ; >>>>> >>>>> cur_pos = next_pos; >>>>> } >>>>> >>>>> The lseek syscall always gets 0x70000 from nfs client for those >>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the nfs >>>>> server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO. >>>>> >>>>> I wanna to know, >>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the >>>>> first case? >>>>> or should I fix the nfs client to return ENXIO for the first case? >>>>> >>>> >>>> It that test correct? The fallback implementation of SEEK_DATA >>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any >>>> offset that is <= eof will be a no-op. The fallback implementation >>>> of SEEK_HOLE assumes that the first hole is at eof. >>> >>> I think that's non-nfsv4.2's logical. >>> >>>> >>>> IOW: unless the initial value for cur_pos is > eof, it looks to me >>>> as if the above test will loop infinitely given any filesystem that >>>> doesn't implement native support for SEEK_DATA/SEEK_HOLE. >>>> >>> >>> No, if underlying filesystem is ext4/f2fs, or the nfs server is knfsd, >>> the last lseek syscall always return ENXIO no matter the cur_pos is > >>> eof or not. >>> >>> A file ends with a hole as, >>> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff >>> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff >>> >>> # stat testfile >>> File: testfile >>> Size: 209715200 Blocks: 22640 IO Block: 4096 regular file >>> Device: 807h/2055d Inode: 843122 Links: 2 >>> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root) >>> Access: 2018-09-11 20:01:41.881227061 +0800 >>> Modify: 2018-09-11 20:01:41.976478311 +0800 >>> Change: 2018-09-11 20:01:41.976478311 +0800 >>> Birth: - >>> >>> # strace filemap testfile >>> ... ... >>> lseek(3, 111149056, SEEK_HOLE) = 112197632 >>> lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or address) >>> >>> Right now, when knfsd gets the ENXIO error, it returns the error to >>> client directly, and return to syscall. >>> But, ganesha set the sr_eof to true and return NFS4_OK to client as >>> RFC description, nfs client skips the sr_eof and return a bad offset >>> to syscall. >> >> Would it make more sense to change Knfsd instead of the client? I think I was >> trying to keep things simple when I wrote the code, so I just passed the result of >> the lseek system call back to the client. > > Looking at the lseek(2) man page, it's not clear to me what should be returned if as in this example, there is a HOLE at the end of the file (i.e. filesize is larger than the range of the last DATA in the file). It sounds like ext4 returns ENXIO if a SEEK_DATA is done past the last data in the file. > > SEEK_DATA > Adjust the file offset to the next location in the file greater than or equal to offset containing data. If offset points to data, then the file offset is set > to offset. > > SEEK_HOLE > Adjust the file offset to the next hole in the file greater than or equal to offset. If offset points into the middle of a hole, then the file offset is set to > offset. If there is no hole past offset, then the file offset is adjusted to the end of the file (i.e., there is an implicit hole at the end of any file). > > In both of the above cases, lseek() fails if offset points past the end of the file. > > These operations allow applications to map holes in a sparsely allocated file. This can be useful for applications such as file backup tools, which can save space when > creating backups and preserve holes, if they have a mechanism for discovering holes. > > For the purposes of these operations, a hole is a sequence of zeros that (normally) has not been allocated in the underlying file storage. However, a filesystem is not > obliged to report holes, so these operations are not a guaranteed mechanism for mapping the storage space actually allocated to a file. (Furthermore, a sequence of > zeros that actually has been written to the underlying storage may not be reported as a hole.) In the simplest implementation, a filesystem can support the operations > by making SEEK_HOLE always return the offset of the end of the file, and making SEEK_DATA always return offset (i.e., even if the location referred to by offset is a > hole, it can be considered to consist of data that is a sequence of zeros). > > The RFC text is pretty clear: > > SEEK is an operation that allows a client to determine the location > of the next data_content4 in a file. It allows an implementation of > the emerging extension to lseek(2) to allow clients to determine the > next hole whilst in data or the next data whilst in a hole. > > From the given sa_offset, find the next data_content4 of type sa_what > in the file. If the server can not find a corresponding sa_what, > then the status will still be NFS4_OK, but sr_eof would be TRUE. If > the server can find the sa_what, then the sr_offset is the start of > that content. If the sa_offset is beyond the end of the file, then > SEEK MUST return NFS4ERR_NXIO. > > All files MUST have a virtual hole at the end of the file. I.e., if > a filesystem does not support sparse files, then a compound with > {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;} > where 'X' was the size of the file. > > Sa_offset is not past the end of the file, but there is no more DATA, so a seek DATA from 0x70000 (original file) should return sr_eof TRUE. > > In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000. > > It certainly makes sense that you should be able to have a hole at the end of a file (pre-allocated disk blocks but no data written yet), and is in fact what fallocate(2) will do. > > An NFS server could check the filesize and if sa_offset is < filesize and a SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and set sr_eof to TRUE. > > The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any ENXIO result to NFS4_OK with sr_eof TRUE. It would be better for it to do the simple thing knfsd does of always passing along the ENXIO (this may be best if it is not possible to safely verify sa_offset really is < filesize). Do you mean modifying ganesha/gluster as knfsd does? seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence); if (seek->seek_pos < 0) status = nfserrno(seek->seek_pos); else if (seek->seek_pos >= i_size_read(file_inode(file))) seek->seek_eof = true; It is a working implementation, but not according to RFC description, If the server can not find a corresponding sa_what, then the status will still be NFS4_OK, but sr_eof would be TRUE. As in this example, there is a HOLE at the end of the file, SEEK(in hole, SEEK_DATA) should return NFS4_OK and sr_eof is TRUE, but knfsd return NFS4ERR_NXIO. thanks, Kinglong Mee