2019-11-21 17:56:28

by Theodore Ts'o

Subject: Re: [musl] getdents64 lost direntries with SMB/NFS and buffer size < unknown threshold

On Wed, Nov 20, 2019 at 03:59:13PM -0500, Rich Felker wrote:
>
> POSIX only allows both behaviors (showing or not showing) the entry
> that was deleted. It does not allow deletion of one entry to cause
> other entries not to be seen.

Agreed, but POSIX requires this of *readdir*. POSIX says nothing
about getdents64(2), which is Linux's internal implementation which is
exposed to a libc.
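For reference, here is a minimal Linux-only sketch of what a libc does underneath readdir. It calls getdents64(2) directly with a caller-chosen buffer and walks the variable-length records that come back. The struct layout follows the getdents64(2) man page and is declared locally because older glibc versions do not expose it. The function name `count_entries_getdents64` is hypothetical. Each call returns as many records as fit in the buffer, so a smaller buffer means more calls, each resuming from the d_off cookie of the last record returned. That is presumably why the buffer size matters in the report named in this thread's subject.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2), per the Linux man page. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;    /* opaque cookie for resuming after this entry */
    unsigned short d_reclen; /* total length of this record */
    unsigned char  d_type;
    char           d_name[]; /* NUL-terminated name */
};

/* Hypothetical helper: count entries (including "." and "..") by calling
 * getdents64 directly with a buffer of `bufsize` bytes. Returns -1 on error. */
long count_entries_getdents64(const char *path, size_t bufsize)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    char *buf = malloc(bufsize);
    if (!buf) {
        close(fd);
        return -1;
    }

    long count = 0;
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, bufsize);
        if (n < 0) {          /* e.g. EINVAL if one record does not fit */
            count = -1;
            break;
        }
        if (n == 0)           /* end of directory */
            break;
        /* Walk the records packed into this chunk of the buffer. */
        for (long pos = 0; pos < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            count++;
            pos += d->d_reclen;
        }
    }

    free(buf);
    close(fd);
    return count;
}
```

On an unchanging directory, the count should not depend on `bufsize`; the thread is about cases where it apparently does.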

So we would need to see what is exactly going on at the interfaces
between the VFS and libc, the nfs client code and the VFS, the nfs
client code and the nfs server, and possibly the behavior of the nfs
server.

First of all.... you can't reproduce this on anything other than with
NFS, correct? That is, does it show up if you are using ext4, xfs,
btrfs, etc.?

Secondly, have you tried this on more than one NFS server
implementation?

Finally, can you capture strace logs and tcpdump logs of the
communication between the NFS client and server code?

> > But many file systems simply do not provide the necessary on-disk data
> > structures which are needed to ensure stable iteration in the face of
> > modification of the directory. There are hacks, of course, such as
> > compacting the on-disk directory only on file creation, which solves
> > the file removal case.
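A toy, in-memory sketch of the compaction hack quoted above. It assumes a directory stored as a linear array of slots whose index doubles as the readdir cookie. All names (`toy_dir`, `toy_create`, `toy_remove`, `toy_readdir`) are hypothetical and do not correspond to any real file system. Removal only leaves a hole, so an iteration in progress never skips a surviving entry. Compaction is deferred to creation, which is why the hack covers the removal case and nothing else.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SLOTS 64
#define NAME_LEN  32

/* Hypothetical toy directory: a linear array of slots; cookie = slot index. */
struct toy_dir {
    char name[MAX_SLOTS][NAME_LEN];
    bool used[MAX_SLOTS];
    int  nslots;              /* slots that are live or holes */
};

/* Removal only clears the slot, so every other entry keeps its index and
 * any outstanding cookie stays valid. */
bool toy_remove(struct toy_dir *d, const char *name)
{
    for (int i = 0; i < d->nslots; i++) {
        if (d->used[i] && strcmp(d->name[i], name) == 0) {
            d->used[i] = false;
            return true;
        }
    }
    return false;
}

/* Compaction happens only here, on creation; it may move entries. */
bool toy_create(struct toy_dir *d, const char *name)
{
    int w = 0;
    for (int r = 0; r < d->nslots; r++) {     /* squeeze out holes */
        if (d->used[r]) {
            if (w != r) {
                memcpy(d->name[w], d->name[r], NAME_LEN);
                d->used[w] = true;
                d->used[r] = false;
            }
            w++;
        }
    }
    d->nslots = w;
    if (w == MAX_SLOTS || strlen(name) >= NAME_LEN)
        return false;
    strcpy(d->name[w], name);
    d->used[w] = true;
    d->nslots = w + 1;
    return true;
}

/* Return the next live entry at or after *cookie and advance the cookie;
 * NULL at end of directory. */
const char *toy_readdir(const struct toy_dir *d, int *cookie)
{
    while (*cookie < d->nslots) {
        int i = (*cookie)++;
        if (d->used[i])
            return d->name[i];
    }
    return NULL;
}
```

Because `toy_create` can still move entries, a create during an iteration can make entries be skipped or seen twice, which matches the limitation stated in the quote.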

Oh, that's not the worst of it. You have to do a lot more if the file
system needs to support telldir/seekdir, and if you want to export the
file system over NFS. If you are using anything other than a linear
linked list implementation for your directory, you have to really turn
somersaults to make sure things work (and work efficiently) in the
face of, say, node splits if you are using some kind of tree structure
for your directory.
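The contract a file system has to honor here is visible from the application side. This sketch (the name `seekdir_roundtrip` is hypothetical) saves a position with telldir(), reads an entry, seeks back with seekdir(), and checks that the same entry comes back. The value returned by telldir() is an opaque cookie rather than an index, so a tree-structured directory has to keep that cookie meaningful even if nodes split between the two calls.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Skip `skip` entries, remember the position, read one entry, seek back and
 * read again. Returns 1 if both reads give the same name, 0 if they differ,
 * -1 on error or if the directory is too short. */
int seekdir_roundtrip(const char *path, int skip)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    for (int i = 0; i < skip; i++) {
        if (!readdir(dir)) {
            closedir(dir);
            return -1;
        }
    }

    long cookie = telldir(dir);           /* opaque position, not an index */
    if (cookie == -1) {
        closedir(dir);
        return -1;
    }
    struct dirent *first = readdir(dir);
    if (!first) {
        closedir(dir);
        return -1;
    }
    char *saved = strdup(first->d_name);  /* readdir may reuse its buffer */
    if (!saved) {
        closedir(dir);
        return -1;
    }

    seekdir(dir, cookie);                 /* resume from the saved cookie */
    struct dirent *again = readdir(dir);
    int same = again != NULL && strcmp(saved, again->d_name) == 0;

    free(saved);
    closedir(dir);
    return same;
}
```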

Most file systems do get this right, at least those that hope to be
safely exportable via NFS, or via CIFS using Samba.

- Ted


2019-12-25 19:40:28

by Florian Weimer

Subject: Re: [musl] getdents64 lost direntries with SMB/NFS and buffer size < unknown threshold

* Theodore Y. Ts'o:

> On Wed, Nov 20, 2019 at 03:59:13PM -0500, Rich Felker wrote:
>>
>> POSIX only allows both behaviors (showing or not showing) the entry
>> that was deleted. It does not allow deletion of one entry to cause
>> other entries not to be seen.
>
> Agreed, but POSIX requires this of *readdir*. POSIX says nothing
> about getdents64(2), which is Linux's internal implementation which is
> exposed to a libc.

Sure, but Linux had better provide a reasonable foundation for a libc.

I mean, sure, we can read the entire directory into RAM on the first
readdir, and get a fully conforming implementation this way (and as
Rich noted, glibc's 32 KiB buffer tends to approximate that in
practice). But that doesn't strike me as particularly useful.
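A sketch of the read-everything-first approach described above, using only POSIX opendir/readdir. Every name is copied into memory before the caller acts on any of them, so later changes to the directory cannot cause entries to be skipped during this pass. The names `name_list`, `snapshot_dir` and `free_name_list` are hypothetical. The cost is memory proportional to the number of entries, which is the drawback being pointed out.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a directory snapshot. */
struct name_list {
    char  **names;
    size_t  count;
};

void free_name_list(struct name_list *l)
{
    for (size_t i = 0; i < l->count; i++)
        free(l->names[i]);
    free(l->names);
    l->names = NULL;
    l->count = 0;
}

/* Copy every entry name (including "." and "..") into memory in one pass.
 * Returns 0 on success, -1 on error; `out` is left empty on failure. */
int snapshot_dir(const char *path, struct name_list *out)
{
    out->names = NULL;
    out->count = 0;

    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    size_t cap = 0;
    for (;;) {
        errno = 0;
        struct dirent *d = readdir(dir);
        if (!d) {
            if (errno != 0)
                goto fail;            /* read error, not end of directory */
            break;
        }
        if (out->count == cap) {      /* grow the array geometrically */
            size_t newcap = cap ? cap * 2 : 16;
            char **tmp = realloc(out->names, newcap * sizeof *tmp);
            if (!tmp)
                goto fail;
            out->names = tmp;
            cap = newcap;
        }
        char *copy = strdup(d->d_name);
        if (!copy)
            goto fail;
        out->names[out->count++] = copy;
    }
    closedir(dir);
    return 0;

fail:
    closedir(dir);
    free_name_list(out);
    return -1;
}
```

A caller would then walk `names[0]` through `names[count - 1]` and process or delete entries freely, since the directory stream is already closed.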

The POSIX requirement is really unfortunate because it leads to
incorrect implementations of rm -rf which would work on a compliant
system but fail in practice.
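One defensive pattern (a common workaround, not something proposed in this thread) is to repeat the deletion pass with rewinddir() until a full pass removes nothing. Entries skipped because the directory changed underneath the iteration are then caught on a later pass; POSIX specifies that rewinddir() makes the stream refer to the directory's current state. This sketch is deliberately non-recursive and only unlinks non-directories; `unlink_all_files` is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <string.h>
#include <unistd.h>

/* Remove every non-directory entry in `path`, without recursing.
 * Repeats full passes until one pass removes nothing.
 * Returns the total number of entries removed, or -1 on error. */
long unlink_all_files(const char *path)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;
    int fd = dirfd(dir);

    long total = 0;
    long removed;
    do {
        removed = 0;
        rewinddir(dir);               /* restart from the current directory state */
        struct dirent *d;
        while ((d = readdir(dir)) != NULL) {
            if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
                continue;
            if (unlinkat(fd, d->d_name, 0) == 0)
                removed++;
            /* failures (e.g. EISDIR for subdirectories) are ignored here */
        }
        total += removed;
    } while (removed > 0);

    closedir(dir);
    return total;
}
```

The extra pass costs one more scan of an already-emptied directory, which is cheap compared to silently leaving entries behind.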

> So we would need to see what is exactly going on at the interfaces
> between the VFS and libc, the nfs client code and the VFS, the nfs
> client code and the nfs server, and possibly the behavior of the nfs
> server.
>
> First of all.... you can't reproduce this on anything other than with
> NFS, correct? That is, does it show up if you are using ext4, xfs,
> btrfs, etc.?

I'm sure it shows up with certain directory contents on any Linux file
system except for those that happen to have a separate B-tree (or
equivalent) for telldir/seekdir support. And even those will have
broken corner cases in the case of billions of directory operations.

32 bits are simply not enough storage space for the cookie. Hashing
just masks the presence of these bugs, but does not eliminate them
completely.
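To put a rough number on the 32-bit concern: if the cookie is derived from a hash of the file name, two names with the same hash yield the same cookie, and seekdir() cannot tell them apart. Assuming uniformly distributed hashes, the expected number of colliding pairs among n names is n(n-1)/2 divided by 2^b for a b-bit cookie. This is a standard birthday-bound estimate, not a measurement of any particular file system; `expected_collisions` is a hypothetical helper.

```c
/* Expected number of name pairs sharing the same b-bit cookie, assuming
 * independent, uniformly distributed hashes (birthday bound):
 *   E = n(n-1)/2 / 2^b                                             */
double expected_collisions(double n, int bits)
{
    double space = 1.0;
    for (int i = 0; i < bits; i++)
        space *= 2.0;                 /* 2^bits possible cookie values */
    return n * (n - 1.0) / 2.0 / space;
}
```

With b = 32 the expected count passes 0.5 at about 65,536 names (2^16) and is already over 100 at one million names, while with b = 64 it stays around 0.03 even at a billion names.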