From: Bernd Schubert
Subject: Re: infinite getdents64 loop
Date: Tue, 31 May 2011 19:43:58 +0200
Message-ID: <4DE528DE.5020908@itwm.fraunhofer.de>
References: <201105281502.32719.sweet_f_a@gmx.de> <201105301137.02061.sweet_f_a@gmx.de> <1306767521.5971.2.camel@lade.trondhjem.org> <201105311147.24939.sweet_f_a@gmx.de> <4DE4C063.9060100@itwm.fraunhofer.de> <20110531123518.GB4215@thunk.org>
Cc: "Ted Ts'o", linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, "linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List", Fan Yong
To: Andreas Dilger
List-Id: linux-ext4.vger.kernel.org

On 05/31/2011 07:26 PM, Andreas Dilger wrote:
> On 2011-05-31, at 6:35 AM, Ted Ts'o wrote:
>> On Tue, May 31, 2011 at 12:18:11PM +0200, Bernd Schubert wrote:
>>>
>>> Out of interest, did anyone ever benchmark if dirindex provides any
>>> advantages to readdir? And did those benchmarks include the
>>> disadvantages of the present implementation (non-linear inode
>>> numbers from readdir, so disk seeks on stat() (e.g. from 'ls -l') or
>>> 'rm -fr $dir')?
>>
>> The problem is that seekdir/telldir is terminally broken (and so is
>> NFSv2 for using such a tiny cookie) in that it fundamentally assumes
>> a linear data structure. If you're going to use any kind of
>> tree-based data structure, a 32-bit "offset" for seekdir/telldir just
>> doesn't cut it. We actually play games where we memoize the low
>> 32 bits of the hash and keep track of which cookies we hand out via
>> seekdir/telldir so that things mostly work --- except for NFSv2, where
>> with the 32-bit cookie, you're just hosed.
>>
>> The reason why we have to iterate over the directory in hash tree
>> order is that if we have a leaf node split, half the directory
>> entries get copied to another directory block. The promises made
>> by seekdir() and telldir() about directory entries appearing exactly
>> once during a readdir() stream, even if you hold the fd open for weeks
>> or days, mean that you really have to iterate over things in hash
>> order.
>>
>> I'd have to look, since it's been too many years, but as I recall the
>> problem was that there is a common path for NFSv2 and NFSv3/v4, so we
>> don't know whether we can hand back a 32-bit cookie or a 64-bit
>> cookie, so we're always handing the NFS server a 32-bit "offset", even
>> though we could do better. Actually, if we had an interface where we
>> could give you a 128-bit "offset" into the directory, we could
>> probably eliminate the duplicate cookie problem entirely. We just
>> send 64 bits worth of hash, plus the first two bytes of the file
>> name.
>
> If it's of interest, we've implemented a 64-bit hash mode for ext4 to
> solve just this problem for Lustre. The llseek() code will return a
> 64-bit hash value on 64-bit systems, unless it is running for some
> process that needs a 32-bit hash value (only NFSv2, AFAIK).
>
> The attached patch can at least form the basis for being able to return
> 64-bit hash values for userspace/NFSv3/v4 when usable. The patch
> is NOT usable as it stands now, since I've had to modify it from the
> version that we are currently using for Lustre (this version hasn't
> actually been compiled), but it at least shows the outline of what needs
> to be done to get this working. None of the NFS side is implemented.

Thanks Andreas! I haven't tested it yet, but the general idea looks good.
I guess the lower part of the patch (netfilter stuff) got in accidentally?

Cheers,
Bernd
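
For readers following the thread, here is a minimal sketch of why a 32-bit directory cookie can collide where a 64-bit one does not. It is an illustration only, not the code from Andreas's patch or from ext4. The struct `dir_pos` and the helpers `cookie32()`, `cookie64()` and `pos_from_cookie64()` are hypothetical names, and the layout (a 32-bit major hash plus a 32-bit minor hash per entry) is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical position of an entry in a hash-ordered directory.
 * Assumption for this sketch: each name hashes to a 32-bit major
 * value plus a 32-bit minor value that breaks ties. */
struct dir_pos {
    uint32_t major_hash;
    uint32_t minor_hash;
};

/* 32-bit cookie: only the major hash fits, so the minor hash is lost.
 * Two entries with the same major hash get the same cookie. */
static uint32_t cookie32(struct dir_pos p)
{
    return p.major_hash;
}

/* 64-bit cookie: major hash in the high word, minor hash in the low
 * word, so numeric cookie order matches (major, minor) hash order. */
static uint64_t cookie64(struct dir_pos p)
{
    return ((uint64_t)p.major_hash << 32) | p.minor_hash;
}

/* Recover the full position from a 64-bit cookie, e.g. for seekdir(). */
static struct dir_pos pos_from_cookie64(uint64_t c)
{
    struct dir_pos p = { (uint32_t)(c >> 32), (uint32_t)c };
    return p;
}
```

With two entries that share a major hash but differ in the minor hash, `cookie32()` returns the same value for both, so a client resuming from that cookie cannot tell them apart. `cookie64()` keeps them distinct and round-trips through `pos_from_cookie64()`.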