2004-10-25 01:22:28

by Timo Sirainen

Subject: readdir loses renamed files

I'd have thought this had already been asked many times before, but
google didn't show me anything.

My problem is that mails in a large maildir get temporarily lost. This
happens because readdir() never returns a file which was just rename()d
by another process. Either new or the old name would have been fine,
but it's not returned at all.

Is there a chance this could get fixed? Every OS/filesystem I've tested
so far has had the same problem, so I'll have to implement some extra
locking anyway (so much for maildir being lockless), but it would be
nice to have at least one OS where it works without the extra locking
overhead.

I have a test program if someone wants to try it:
http://dovecot.org/tmp/readdir.c

(and please Cc replies)



2004-10-25 08:30:13

by Chris Wedgwood

Subject: Re: readdir loses renamed files

On Mon, Oct 25, 2004 at 04:21:57AM +0300, Timo Sirainen wrote:

> My problem is that mails in a large maildir get temporarily
> lost. This happens because readdir() never returns a file which was
> just rename()d by another process. Either new or the old name would
> have been fine, but it's not returned at all.

i don't think there are well defined semantics for this, it's
intrinsically hard to make it work the way you want for a number of
reasons (and what they are depends on the underlying fs)

> Is there a chance this could get fixed? Every OS/filesystem I've
> tested so far has had the same problem

i'll argue it's an application bug

> so I'll have to implement some extra locking anyway (so much for
> maildir being lockless), but it would be nice to have at least one
> OS where it works without the extra locking overhead.

why do you need extra locking? the next time the maildir is scanned
the message(s) will appear surely?


2004-10-25 12:36:09

by Timo Sirainen

Subject: Re: readdir loses renamed files

On 25.10.2004, at 11:29, Chris Wedgwood wrote:

> On Mon, Oct 25, 2004 at 04:21:57AM +0300, Timo Sirainen wrote:
>
>> My problem is that mails in a large maildir get temporarily
>> lost. This happens because readdir() never returns a file which was
>> just rename()d by another process. Either new or the old name would
>> have been fine, but it's not returned at all.
>
> i don't think there are well defined semantics for this, it's
> intrinsically hard to make it work the way you want for a number of
> reasons (and what they are depends on the underlying fs)

Thought so. Maybe reiser4 would work?

>> so I'll have to implement some extra locking anyway (so much for
>> maildir being lockless), but it would be nice to have at least one
>> OS where it works without the extra locking overhead.
>
> why do you need extra locking? the next time the maildir is scanned
> the message(s) will appear surely?

So if I lose a file, how many times should I scan the directory again
to know if it's really gone? And is it really worth the extra overhead
when it's hardly ever needed? I'd rather not knowingly build software
that works only in optimal conditions.

The test program that I had showed that the next scan didn't
necessarily return it either. The file was sometimes lost for as long
as 5 scans.

Of course I could just accept that messages go away and come back
again, but it's not very nice for an IMAP server to do. Some clients
attach metadata to messages based on their IMAP UID, so that would be
lost.
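The "lost for a few scans" problem can be softened in the application by not trusting a single scan. What follows is only a minimal sketch, not code from this thread or from any real IMAP server: it assumes the usual Maildir naming where the part before the first ':' is the message's unique name, and the helper names `maildir_key`, `merge_scans`, and `expire` are hypothetical. A message missing from one scan keeps its old entry and only gains a miss count; it is dropped once it has been missing for a caller-chosen number of scans.

```python
def maildir_key(filename):
    """Return the unique part of a Maildir filename (the text before ':')."""
    return filename.split(":", 1)[0]


def merge_scans(previous, names):
    """Merge a fresh directory scan into the previous view.

    previous: dict mapping key -> (filename, misses)
    names:    filenames returned by the latest scan
    Entries absent from this scan keep their last known filename and get
    their miss counter incremented instead of being dropped immediately.
    """
    seen = {maildir_key(n): n for n in names}
    merged = {key: (name, 0) for key, name in seen.items()}
    for key, (name, misses) in previous.items():
        if key not in seen:
            merged[key] = (name, misses + 1)
    return merged


def expire(view, max_misses):
    """Drop entries that have been missing for max_misses scans or more."""
    return {k: v for k, v in view.items() if v[1] < max_misses}
```

The value of `max_misses` is a policy choice; the test program mentioned above saw files missing for as long as 5 scans, so any fixed threshold is a heuristic rather than a guarantee.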

I guess one solution would be to use one of dnotify's replacements,
which tell which files were added, removed or renamed. Then readdir()
would be needed only when the mailbox is being opened.



2004-10-25 12:40:30

by Theodore Ts'o

Subject: Re: readdir loses renamed files

On Mon, Oct 25, 2004 at 04:21:57AM +0300, Timo Sirainen wrote:
> I'd have thought this had already been asked many times before, but
> google didn't show me anything.
>
> My problem is that mails in a large maildir get temporarily lost. This
> happens because readdir() never returns a file which was just rename()d
> by another process. Either new or the old name would have been fine,
> but it's not returned at all.
>
> Is there a chance this could get fixed? Every OS/filesystem I've tested
> so far has had the same problem, so I'll have to implement some extra
> locking anyway (so much for maildir being lockless), but it would be
> nice to have at least one OS where it works without the extra locking
> overhead.

In some cases it won't even just get lost, but the old and new name
can both be returned. For example, if you assume the use of a simple
non-tree, linked-list implementation of a directory, such as can be found
in Solaris's ufs, BSD 4.3's FFS, Linux's ext2 and minix filesystems,
and many others, and you have a fully tightly packed directory (i.e.,
no gaps), with the directory entry "foo" at the beginning of the file,
and readdir() has already returned the first "foo" entry when some
other application renames it "Supercalifragilisticexpialidocious", the
new name will not fit in the old name's directory location, so it will
be placed at the end of the directory --- where it will be returned by
readdir() a second time.
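The scenario above can be replayed with a toy model. This is only an illustrative sketch, not how any real filesystem is coded: `ToyDirectory` and `scan` are hypothetical names, each slot has a fixed capacity, a rename that does not fit in place frees the old slot and reuses the first free slot that is large enough (an assumption of this model), and otherwise appends at the end. The second case, where a free slot sits before the reader's position, shows how the same mechanism can lose the file entirely.

```python
class ToyDirectory:
    """Toy flat directory: a list of [capacity, name] slots.

    A name of None marks a free slot. This models no real on-disk format.
    """

    def __init__(self, entries):
        self.slots = [[cap, name] for cap, name in entries]

    def rename(self, old, new):
        idx = next(i for i, s in enumerate(self.slots) if s[1] == old)
        if len(new) <= self.slots[idx][0]:
            self.slots[idx][1] = new          # fits: rewrite in place
            return
        self.slots[idx][1] = None             # does not fit: free old slot
        for slot in self.slots:
            if slot[1] is None and slot[0] >= len(new):
                slot[1] = new                 # reuse an earlier free slot
                return
        self.slots.append([len(new), new])    # otherwise append at the end


def scan(directory, rename_after=None, rename=None):
    """Walk the slots in order, like a readdir() loop.

    If rename is given as (old, new), it is performed once rename_after
    slots have been visited, simulating a concurrent rename().
    """
    seen = []
    pos = 0
    while pos < len(directory.slots):
        if rename is not None and pos == rename_after:
            directory.rename(*rename)
            rename = None
        name = directory.slots[pos][1]
        if name is not None:
            seen.append(name)
        pos += 1
    return seen


long_name = "Supercalifragilisticexpialidocious"

# Tightly packed directory: both the old and the new name are returned.
packed = ToyDirectory([(3, "foo"), (3, "bar"), (3, "baz")])
print(scan(packed, 1, ("foo", long_name)))

# Free slot before the reader's position: neither name is returned.
gappy = ToyDirectory([(40, None), (3, "bar"), (3, "foo")])
print(scan(gappy, 1, ("foo", long_name)))
```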

This is not a bug; the POSIX specification explicitly allows this
behavior. If a filename is renamed during a readdir() session of a
directory, it is undefined whether neither, either, or both of the
new and old filenames will be returned.

And that's because there's no good way to do this without trashing the
performance of the system, especially when most applications don't
care. (Do you really want your entire system running significantly
slower, penalizing all other applications on your system, just because
of one stupid/badly-written application?) In order to do this, the
kernel would have to atomically snapshot the directory --- even one
which might be several megabytes in length, and store a copy of it in
memory, while the application calls readdir(). Several processes
could perform a denial-of-service attack by starting to call
readdir(), and then stopping. This would end up locking up huge
amounts of non-pageable system memory, and cause the system to come
down in a hurry.

- Ted

2004-10-25 12:48:39

by Jan Engelhardt

Subject: Re: readdir loses renamed files

>So if I lose a file, how many times should I scan the directory again
>to know if it's really gone? And is it really worth the extra overhead

Maybe the use of stat() will show whether a file really exists.
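A minimal sketch of that check, with a hypothetical helper name `classify_missing`. Note the limitation: stat() can only confirm that the last known name still resolves. After a rename() the old name is gone, so a failed stat() cannot tell a renamed message from a deleted one.

```python
import os


def classify_missing(dirpath, last_known_name):
    """Check whether a name that a scan did not return still exists.

    Returns "present" if stat() on the last known name succeeds, and
    "renamed-or-deleted" if the name no longer resolves.
    """
    try:
        os.stat(os.path.join(dirpath, last_known_name))
    except FileNotFoundError:
        return "renamed-or-deleted"
    return "present"
```

So this helps when readdir() skipped an entry that was not renamed, but for the rename case the application still has to find the new name some other way.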

>The test program that I had showed that the next scan didn't
>necessarily return it either. The file was sometimes lost for as long
>as 5 scans.

Unrelated to this issue, I had a look into reiser3 for some other project of
mine. What I found was that upon rename()ing a file, the old entry is marked
invisible first, and then the new one is marked visible.
You would need to meet a lot of conditions to have a file lost (IMO):
- using reiser3
- reiserfs_rename() is suspended for as long as 5 scans
  (only happens either on SMP or UP+preempt)
- reiserfs_rename() hangs... somewhat, because while(i<5){while(readdir(...)){}}
  usually takes longer than a rename


Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-10-25 13:23:09

by Timo Sirainen

Subject: Re: readdir loses renamed files

On 25.10.2004, at 15:37, Theodore Ts'o wrote:

> This is not a bug; the POSIX specification explicitly allows this
> behavior. If a filename is renamed during a readdir() session of a
> directory, it is undefined whether neither, either, or both of the
> new and old filenames will be returned.

BTW. Would be nice if this was mentioned in readdir(3) manual page.
UNIX98 specs also didn't mention rename specifically, and I don't know
of other freely available specs.

> And that's because there's no good way to do this without trashing the
> performance of the system, especially when most applications don't
> care. (Do you really want your entire system running significantly
> slower, penalizing all other applications on your system, just because
> of one stupid/badly-written application?) In order to do this, the
> kernel would have to atomically snapshot the directory --- even one
> which might be several megabytes in length, and store a copy of it in
> memory, while the application calls readdir(). Several processes
> could perform a denial-of-service attack by starting to call
> readdir(), and then stopping. This would end up locking up huge
> amounts of non-pageable system memory, and cause the system to come
> down in a hurry.

That would be a generic kernel solution for it, but it's not what I'm
asking.

Only thing needed not to lose the files would be that renamed files
appeared always at the end of the readdir() list, or at the same
location where the original file was. But apparently that's not how
filesystems nowadays implement it, and probably not very simple to
change to work that way.



2004-10-28 09:37:43

by Matthias Andree

Subject: Re: readdir loses renamed files

On Mon, 25 Oct 2004, Theodore Ts'o wrote:

> And that's because there's no good way to do this without trashing the
> performance of the system, especially when most applications don't
> care. (Do you really want your entire system running significantly
> slower, penalizing all other applications on your system, just because
> of one stupid/badly-written application?)

Please - is it really necessary that application writers are offended in
this way? Timo is investing enormous time and effort in writing a *good*
application, and he's effectively seeking a way to *robustly* deal with
Maildir format mail storage. Please leave it at "readdir/getdents don't
work the way you expect and cannot for this and that reason."

Timo tries to implement a *robust* Maildir reader and has just bumped
into the flaws of DJB's "no-locking" store.

Yes, it's a mail server again that poses file system questions on this
list; only it's IMAP this time rather than SMTP and directory
synchronous I/O...

> In order to do this, the
> kernel would have to atomically snapshot the directory --- even one
> which might be several megabytes in length, and store a copy of it in
> memory, while the application calls readdir(). Several processes
> could perform a denial-of-service attack by starting to call
> readdir(), and then stopping. This would end up locking up huge
> amounts of non-pageable system memory, and cause the system to come
> down in a hurry.

I'd like to question that.

Just some rough thoughts:

1. the number of open file handles (including directories seen as files
for a moment at least) is limited per process, and I'd think the
number of directories open can be lower

2. versioned information might provide what the application wants
without locking up the system

3. the application could be offered an interface for atomic directory
reads that requires the application to provide sufficient memory in a
single contiguous buffer (making it thread-safe in the same go).

--
Matthias Andree

2004-10-28 11:51:36

by Andreas Dilger

Subject: Re: readdir loses renamed files

On Oct 28, 2004 11:34 +0200, Matthias Andree wrote:
> On Mon, 25 Oct 2004, Theodore Ts'o wrote:
> > And that's because there's no good way to do this without trashing the
> > performance of the system, especially when most applications don't
> > care. (Do you really want your entire system running significantly
> > slower, penalizing all other applications on your system, just because
> > of one stupid/badly-written application?)
>
> Please - is it really necessary that application writers are offended in
> this way? Timo is investing enormous time and effort in writing a *good*
> application, and he's effectively seeking a way to *robustly* deal with
> Maildir format mail storage. Please leave it at "readdir/getdents don't
> work the way you expect and cannot for this and that reason."
>
> Timo tries to implement a *robust* Maildir reader and has just bumped
> into the flaws of DJB's "no-locking" store.
>
> Yes, it's a mail server again that poses file system questions on this
> list; only it's IMAP this time rather than SMTP and directory
> synchronous I/O...

I read over in reiserfs-list that the reason for the crazy renaming is
to store "attributes" as part of the filename. Why not just store them
as EAs as they were intended? With the large inode patches (posted here
a couple of times already) the cost of storing EAs is negligible.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/

2004-10-28 14:36:44

by Jan Engelhardt

Subject: Re: readdir loses renamed files


>I read over in reiserfs-list that the reason for the crazy renaming is
>to store "attributes" as part of the filename. Why not just store them
>as EAs as they were intended? With the large inode patches (posted here
>a couple of times already) the cost of storing EAs is negligible.

Well, reiser stores attributes (at least ACLs) in files.


Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-10-28 15:52:45

by Matthias Andree

Subject: Re: readdir loses renamed files

On Thu, 28 Oct 2004, Andreas Dilger wrote:

> I read over in reiserfs-list that the reason for the crazy renaming is
> to store "attributes" as part of the filename. Why not just store them
> as EAs as they were intended? With the large inode patches (posted here
> a couple of times already) the cost of storing EAs is negligible.

The "attributes" stored are really mail flags such as "read", "replied
to", the size and so on. Not sure if it makes sense storing these as
extended attributes, or, the other way around, are EAs supposed to be
some "associated" generic file that can be attached to an existing file?
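For readers unfamiliar with the convention, here is a small sketch of how flags live in a Maildir filename and why changing one means a rename(). It assumes the common `unique:2,FLAGS` layout with single-letter flags kept in sorted order (for example S for seen, R for replied); the helper names `split_flags`, `with_flag`, and `set_flag` are hypothetical and not taken from any particular mail server.

```python
import os


def split_flags(filename):
    """Split 'unique:2,FLAGS' into (unique, flags); flags may be empty."""
    base, sep, info = filename.partition(":2,")
    return base, (info if sep else "")


def with_flag(filename, flag):
    """Return the filename with one flag letter added, flags kept sorted."""
    base, flags = split_flags(filename)
    return base + ":2," + "".join(sorted(set(flags) | {flag}))


def set_flag(dirpath, filename, flag):
    """Set a flag by renaming the file; returns the new filename.

    This rename() is exactly the operation that can race with another
    process's readdir() loop over the same directory.
    """
    new_name = with_flag(filename, flag)
    if new_name != filename:
        os.rename(os.path.join(dirpath, filename),
                  os.path.join(dirpath, new_name))
    return new_name
```

For example, `with_flag("1098667348.M1P2.host:2,S", "R")` gives `"1098667348.M1P2.host:2,RS"`: the unique part is unchanged and only the suffix differs.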

At any rate, the resulting software would no longer be able to claim
that its backing storage is in "Maildir" format.

I know AmigaOS had a limited amount of space (some dozen characters) for
a generic file comment, but otherwise.

--
Matthias Andree

2004-10-28 17:12:23

by Theodore Ts'o

Subject: Re: readdir loses renamed files

On Thu, Oct 28, 2004 at 11:34:26AM +0200, Matthias Andree wrote:
> Please - is it really necessary that application writers are offended in
> this way? Timo is investing enormous time and effort in writing a *good*
> application, and he's effectively seeking a way to *robustly* deal with
> Maildir format mail storage. Please leave it at "readdir/getdents don't
> work the way you expect and cannot for this and that reason."
>
> Timo tries to implement a *robust* Maildir reader and has just bumped
> into the flaws of DJB's "no-locking" store.

That's true, I should also have included badly-/stupidly- designed
mailstore architectures in the list of possibilities. :-)

Stepping back for a moment, do you really need such semantics? After
all, when you're dealing with Maildir, even if you're not dealing with
rename(), you still have to worry about programs deleting or inserting
(or moving between Mailboxes) messages out from under you while you
are doing the readdir() scan.

Since by definition Maildir is lockless, it is expected that
applications be able to deal with such changes. If they can't, either
the design of Maildir is busted (and I have my own opinions of DJB's
designs, which aren't worth going into here) or the application is
busted. Take your pick.

> Just some rough thoughts:
>
> 1. the number of open file handles (including directories seen as files
> for a moment at least) is limited per process, and I'd think the
> number of directories open can be lower

But directory sizes are unlimited --- they could conceivably be
hundreds of megabytes, and so it's not reasonable to require the
kernel to do the snapshot using unpageable kernel memory.

> 2. versioned information might provide what the application wants
> without locking up the system

Not given the POSIX readdir/opendir interface!

(And if we have the freedom to redesign the readdir POSIX interface,
there are plenty of other changes I'd make along the way. Nuking
telldir and seekdir would be near the top of the list. If you want
an example of truly brain-damaged design, just take telldir and
seekdir... please!)

> 3. the application could be offered an interface for atomic directory
> reads that requires the application to provide sufficient memory in a
> single contiguous buffer (making it thread-safe in the same go).

Actually, you can do this today, if you use the underlying
sys_getdents64 system call. But the application would have to
allocate potentially a very large amount of userspace memory.

- Ted

2004-10-28 19:05:02

by Bernd Eckenfels

Subject: Re: readdir loses renamed files

In article <[email protected]> you wrote:
> But directory sizes are unlimited --- they could conceivably be
> hundreds of megabytes, and so it's not reasonable to require the
> kernel to do the snapshot using unpageable kernel memory.

Well, I guess that's what COW or versioning is for. Another option would be an
optimistic-locking readdir alternative (or the use of fam-like events in
addition).
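A userspace approximation of the optimistic idea might look like the sketch below. It is not a kernel interface and not something proposed in this thread: the name `optimistic_listdir` and the `max_attempts` parameter are hypothetical. It assumes the filesystem updates the directory's modification time whenever an entry is added, removed, or renamed, and that the timestamp is fine-grained enough to notice; a change within the same timestamp tick would go undetected.

```python
import os


def optimistic_listdir(path, max_attempts=5):
    """List a directory, retrying if it appears to have changed meanwhile.

    Compares the directory's mtime before and after the listing. Returns
    the sorted names from the first attempt during which the mtime did not
    change, or None if every attempt raced with a modification.
    """
    for _ in range(max_attempts):
        before = os.stat(path).st_mtime_ns
        names = os.listdir(path)
        after = os.stat(path).st_mtime_ns
        if before == after:
            return sorted(names)
    return None
```

On a busy maildir this can keep retrying and then give up, which is the usual cost of optimistic schemes: they are cheap when conflicts are rare and degrade when they are not.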

Gruss
Bernd

2004-10-29 21:20:06

by Hans Reiser

Subject: Re: readdir loses renamed files

Andreas Dilger wrote:

>On Oct 28, 2004 11:34 +0200, Matthias Andree wrote:
>
>
>>On Mon, 25 Oct 2004, Theodore Ts'o wrote:
>>
>>
>>>And that's because there's no good way to do this without trashing the
>>>performance of the system, especially when most applications don't
>>>care. (Do you really want your entire system running significantly
>>>slower, penalizing all other applications on your system, just because
>>>of one stupid/badly-written application?)
>>>
>>>
>>Please - is it really necessary that application writers are offended in
>>this way? Timo is investing enormous time and effort in writing a *good*
>>application, and he's effectively seeking a way to *robustly* deal with
>>Maildir format mail storage. Please leave it at "readdir/getdents don't
>>work the way you expect and cannot for this and that reason."
>>
>>Timo tries to implement a *robust* Maildir reader and has just bumped
>>into the flaws of DJB's "no-locking" store.
>>
>>Yes, it's a mail server again that poses file system questions on this
>>list; only it's IMAP this time rather than SMTP and directory
>>synchronous I/O...
>>
>>
Matthias is right. readdir is badly architected, and no one has fixed
it for ~30 years.

It should be possible to perform an atomic readdir if that is what you
want to do and if you have space in your process to stuff the result.

Hans

2004-10-29 21:47:27

by Jan Engelhardt

Subject: Re: readdir loses renamed files


>Matthias is right. readdir is badly architected, and no one has fixed
>it for ~30 years.

As long as M$ windows has the same problem, it's justified that we have that
problem for 30 years now.

>It should be possible to perform an atomic readdir if that is what you
>want to do and if you have space in your process to stuff the result.

How much would it cost to always append the new name into the directory rather
than modifying it in place? OTOH, especially Reiserfs does not use linear file
lists, so it would get tricky.

2004-10-30 19:13:18

by Hans Reiser

Subject: Re: readdir loses renamed files

Jan Engelhardt wrote:

>>Matthias is right. readdir is badly architected, and no one has fixed
>>it for ~30 years.
>>
>>
>
>As long as M$ windows has the same problem, it's justified that we have that
>problem for 30 years now.
>
>
>
>>It should be possible to perform an atomic readdir if that is what you
>>want to do and if you have space in your process to stuff the result.
>>
>>
>
>How much would it cost to always append the new name into the directory rather
>than modifying it in place?
>
Forgive me, what does the sentence above mean? Paste it out of order?

Better to fix the API.

> OTOH, especially Reiserfs does not use linear file
>lists, so it would get tricky.
>
>
>
>
>
We use sorted directories.

2004-10-31 06:32:32

by Jan Engelhardt

Subject: Re: readdir loses renamed files

>>>It should be possible to perform an atomic readdir if that is what you
>>>want to do and if you have space in your process to stuff the result.
>>
>>How much would it cost to always append the new name into the directory rather
>>than modifying it in place?
>
>Forgive me, what does the sentence above mean? Paste it out of order?

As I have read from earlier replies, ext2/3 replaces a filename with the new
one, given that it is the same length or shorter, and especially that might
skip a while when readdir()ing.
So I was concerned about the speed impact which would arise, if the filename
was never modified in-place but always appended as a new object to the
end-of-directory.



Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-01 05:38:38

by Hans Reiser

Subject: Re: readdir loses renamed files

Jan Engelhardt wrote:

>>>>It should be possible to perform an atomic readdir if that is what you
>>>>want to do and if you have space in your process to stuff the result.
>>>>
>>>>
>>>How much would it cost to always append the new name into the directory rather
>>>than modifying it in place?
>>>
>>>
>>Forgive me, what does the sentence above mean? Paste it out of order?
>>
>>
>
>As I have read from earlier replies, ext2/3 replaces a filename with the new
>one, given that it is the same length or shorter, and especially that might
>skip a while when readdir()ing.
>So I was concerned about the speed impact which would arise, if the filename
>was never modified in-place but always appended as a new object to the
>end-of-directory.
>
>
>
>Jan Engelhardt
>
>
The API is fundamentally broken. Sorry. Will try to avoid making that
mistake in sys_reiser4.