Trond:
Fiddling with the Cryptographic File System the other day, I managed to
tickle a mysterious bug. When some directories grew large enough,
suddenly a chunk of files would half "disappear". "find" would list
them fine, but "ls" and "echo *" wouldn't.
After a bit of troubleshooting, I discovered that the CFS daemon
(which presents itself to the system as an NFS daemon) was using
small, big-endian cookies in its directory entries. These became
large positive and negative little-endian "d_off" values in the dirent
structs.
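To illustrate how a small big-endian cookie can turn into a large positive or negative offset, here is a minimal sketch. It assumes the cookie is a 32-bit value whose bytes are reinterpreted in the opposite order on a little-endian client; the helper names swap32 and cookie_as_le_offset are hypothetical and do not come from CFS, glibc, or the kernel.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

/* Hypothetical helper: the signed 32-bit offset a little-endian client
   would see if `cookie` had been stored big-endian in an opaque field.
   The unsigned-to-signed conversion is implementation-defined in C;
   common compilers such as GCC wrap modulo 2^32. */
static int32_t cookie_as_le_offset(uint32_t cookie)
{
    return (int32_t) swap32(cookie);
}
```

Under these assumptions a cookie of 1 becomes 0x01000000 (large and positive), while a cookie of 128 becomes 0x80000000, which is negative as a signed 32-bit value.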
The C library (in glibc-2.1.3/sysdeps/unix/sysv/linux/getdents.c) does
some fancy, double-buffering footwork in getdents(2) to try to guess
how many bytes of kernel_dirents it needs to read into a temporary
buffer to fill the user-supplied buffer with user dirents (which have
an extra "d_type" field). When its heuristic screws up, it does an
lseek on the directory so the next getdents(2) will start with the
right directory entry:
    if ((char *) dp + new_reclen > buf + nbytes)
      {
        /* Our heuristic failed.  We read too many entries.  Reset
           the stream.  `last_offset' contains the last known
           position.  If it is zero this is the first record we are
           reading.  In this case do a relative search.  */
        if (last_offset == 0)
          __lseek (fd, -retval, SEEK_CUR);
        else
          __lseek (fd, last_offset, SEEK_SET);
        break;
      }
In my case, for "ls" and "bash", the "last_offset" happened to be a
negative little-endian cookie. The kernel's "default_lseek" returned
EINVAL, the error was ignored, and "ls" and "bash" were blissfully
unaware that a bunch of directory entries had been read into the
temporary buffer and forever lost. Since "find" used a different
buffer size, it happened to have a positive little-endian cookie for
"last_offset" and didn't exhibit the problem.
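The entries were lost silently because the result of the seek was never examined. The following is only a sketch of what checking that result could look like; seek_set_checked is a hypothetical helper name, not a glibc function, and it says nothing about how the real getdents code should recover once the seek has failed.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reposition fd to an absolute offset.
   Returns 0 on success, or the errno value reported by lseek
   (for example EINVAL for a negative offset) on failure. */
static int seek_set_checked(int fd, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
        return errno;
    return 0;
}
```

A caller that sees a non-zero result at least knows that the entries already read into the temporary buffer cannot be re-read from the intended position.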
A fix was easy---after modifying CFS to convert its cookies to small,
little-endian numbers, everything worked fine.
However, who's to blame here? It can't be CFS---any four-byte cookie
should be valid, right?
Is the kernel NFS client code to blame? If it's going to be using
cookies as offsets, shouldn't we have an nfs_lseek that special-cases
directory lseeks (at least those using SEEK_SET) to take negative
offsets, so utilities and libraries don't need to be bigfile-aware
just to read directories? And what in the world can we do about bogus
code like the:
    __lseek (fd, -retval, SEEK_CUR);
that appears above? Shouldn't any non-SEEK_SET lseek on an NFS
directory fail with an error?
Any thoughts?
Thanks.
Kevin <[email protected]>
>>>>> " " == Kevin Buhr <[email protected]> writes:
> However, who's to blame here? It can't be CFS---any four-byte
> cookie should be valid, right?
> Is the kernel NFS client code to blame? If it's going to be
> using cookies as offsets, shouldn't we have an nfs_lseek that
> special-cases directory lseeks (at least those using SEEK_SET)
> to take negative offsets, so utilities and libraries don't need
> to be bigfile-aware just to read directories? And what in the
> world can we do about bogus code like the:
The problem here is that in NFS we have to match cookies in lieu of
using true directory 'offsets'. I did try to work around this by using
offsets into the page cache and the like; however, this sort of scheme
is almost impossible to implement sanely because an offset into the
page cache changes all the time. This was why I returned to Olaf's
scheme in which we use the cookie as the return value for lseek &
friends.
The problem then arises that lseek tries to cram both a returned
offset and an error value into the return values. When NFS returns an
opaque type, this causes a problem; one that won't be fixed by adding
an nfs_lseek. Furthermore, lseek is 32 bits wide; for NFSv3 and higher,
the cookie is 64 bits...
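The clash between opaque cookies and lseek's return convention can be shown with a small model. This sketch assumes a 32-bit signed return type in which -1 is the error sentinel, as in the userspace lseek interface; looks_like_error and SEEK_ERROR are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Model of a 32-bit lseek return value: -1 is reserved for "error". */
#define SEEK_ERROR ((int32_t) -1)

/* Hypothetical check: would returning this opaque 32-bit cookie as
   the new "offset" be indistinguishable from a failed seek? */
static int looks_like_error(uint32_t cookie)
{
    return (int32_t) cookie == SEEK_ERROR;
}
```

In this model the all-ones cookie 0xFFFFFFFF cannot be told apart from a failure, which is why a special-case nfs_lseek alone would not remove the ambiguity.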
I know of no scheme that can fix all problems with lseek.
For example, concerning SEEK_CUR: forget about it. NFS is not POSIX and
never will be. You simply cannot give meaningful semantics to SEEK_CUR
as long as the client knows nothing about the organization of dirents
on the server.
We can return offsets that are based on the internal caching of
dirents, but the problem then is that you need to find some permanent
'index' that doesn't change when we invalidate the cache and read it
in anew. Making stuff like 'rm -rf *' work (when the directory size &
organization keeps changing) is quite a challenge...
One possibility would be to make a pointer into a table of cookies be
our 'offset'. That could work if we can ensure that cookies can't move
around...
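The table-of-cookies idea could look roughly like the sketch below. This is an illustration under stated assumptions, not the kernel's implementation: the struct, the fixed capacity COOKIE_TABLE_MAX, and both function names are hypothetical, and the sketch ignores the hard part mentioned above, namely keeping indices stable when the cache is invalidated.

```c
#include <stddef.h>
#include <stdint.h>

#define COOKIE_TABLE_MAX 1024   /* hypothetical fixed capacity */

struct cookie_table {
    uint64_t cookies[COOKIE_TABLE_MAX];
    size_t   count;
};

/* Return the small index standing in for `cookie`, appending the
   cookie if it has not been seen before.  Returns -1 if the table
   is full. */
static long cookie_to_index(struct cookie_table *t, uint64_t cookie)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->cookies[i] == cookie)
            return (long) i;
    if (t->count == COOKIE_TABLE_MAX)
        return -1;
    t->cookies[t->count] = cookie;
    return (long) t->count++;
}

/* Translate an index handed to lseek back into the server cookie.
   Returns 0 on success, -1 if the index is out of range. */
static int index_to_cookie(const struct cookie_table *t, long index,
                           uint64_t *out)
{
    if (index < 0 || (size_t) index >= t->count)
        return -1;
    *out = t->cookies[index];
    return 0;
}
```

Because the indices are small and non-negative, they fit in a 32-bit offset and never collide with an error return, whatever bit pattern the server chose for the cookie itself.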
Cheers,
Trond
Trond Myklebust <[email protected]> writes:
>
> The problem then arises that lseek tries to cram both a returned
> offset and an error value into the return values.
Oops. You're right; I didn't think of this.
So, I guess the best short-term solution is to fix the C library so it
always uses llseek for directories and never tries something stupid
like a SEEK_CUR. Then, at least it'll always work for NFSv2. I'll
file a bug report.
At the same time, a patch for CFS to use "small" (from a little-endian
perspective) cookies couldn't hurt, so I'll do that, too.
Thanks for the help.
Kevin <[email protected]>
Hi!
> Fiddling with the Cryptographic File System the other day, I managed to
> tickle a mysterious bug. When some directories grew large enough,
> suddenly a chunk of files would half "disappear". "find" would list
> them fine, but "ls" and "echo *" wouldn't.
>
> After a bit of troubleshooting, I discovered that the CFS daemon
> (which presents itself to the system as an NFS daemon) was using
Do you run the CFS daemon and client on the same machine? Where can I
find the documentation and a download for CFS?
Pavel
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]