2003-09-11 17:33:38

by Mike Fedyk

Subject: Re: Status of fsync() wrt mail servers

On Thu, Sep 11, 2003 at 02:33:25PM +0200, Matthias Andree wrote:
> Does reiserfs3.6 support dirsync? I thought it was ext3-specific until
> now.
>

That was what I was asking too.

> Please take care to distinguish (file) meta data from directory data.
>

Hmm, it seems to me that all metadata relating to the file fsync() was
called on should be written to disk before the call returns.
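
To illustrate the distinction between file metadata and directory data,
here is a minimal sketch of how a queue writer might make both durable
without relying on fsync() of the file covering the directory entry. It is
not taken from any MTA's source; the helper name write_durably and the
file names are hypothetical, and it assumes a Linux system where
O_DIRECTORY is available.

```python
import os
import tempfile


def write_durably(directory, name, data):
    """Create directory/name with data, then fsync the file and its directory.

    Hypothetical helper for illustration only.
    """
    path = os.path.join(directory, name)
    # O_EXCL: fail rather than overwrite an existing queue file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file's data and its own metadata (size, mtime, ...).
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the directory separately so the new entry itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        print(write_durably(spool, "msg-0001", b"queued message\n"))
```

Whether the second fsync() is strictly needed depends on the file system
and mount options, which is exactly the open question in this thread.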

> Basically, what we know is that with Linux 2.4, ext3fs, reiserfs and XFS
> will flush all pending transactions (per file system) that were
> requested prior to a synchronous operation (fsync, sync, umount, ...)
> out to disk.
>
> This can heftily bite your back if you're running your MTA's queue on a
> large file system that has other sustained write load (logging, data
> bases, ...).
>
> I recently helped one qmail user debug this; the symptom was that the
> first mail in a burst of mails took 2 seconds to queue, subsequent mails
> were queued much quicker (70 ms). He was using ext3fs, and had one huge
> / (root) file system and so the "synch the whole file system" behaviour
> made his qmail-queue synch *all* his dirty blocks to disk...

Can you be sure the MTA wasn't calling sync() just to be safe? (Many MTAs are
funny in that they assume the spool is on a separate disk and filesystem.)
fsync() shouldn't be flushing anything not relating to the file it was
called on (that includes directory entries related to the file also IMHO).

Also, if the MTA wasn't running as root, it shouldn't be able to make sync()
affect the entire system. Is there anything that says that sync() can't
just flush the user's buffers unless you're running as root or with some
CAP_ capability?

Mike


2003-09-12 00:22:43

by Matthias Andree

Subject: Re: Status of fsync() wrt mail servers

Mike Fedyk writes:

>> I recently helped one qmail user debug this; the symptom was that the
>> first mail in a burst of mails took 2 seconds to queue, subsequent mails
>> were queued much quicker (70 ms). He was using ext3fs, and had one huge
>> / (root) file system and so the "synch the whole file system" behaviour
>> made his qmail-queue synch *all* his dirty blocks to disk...
>
> Can you be sure the MTA wasn't calling sync() just to be safe? (Many MTAs are
> funny in that they assume the spool is on a separate disk and
> filesystem.)

For qmail and Postfix I can be. sync(8) isn't remotely useful, because
it's allowed to return before completion.

> fsync() shouldn't be flushing anything not relating to the file it was
> called on (that includes directory entries related to the file also
> IMHO).

It "shouldn't", but current implementations on Linux do exactly that: flush
everything. Maybe you'd have better luck with BSD softupdates, but
that's going to munch disk I/O big time the next time you reboot after a
crash: fsck is needed. It runs niced in the background so the machine boots
up, but the box won't satisfy higher I/O demands. Looks like an "ex
duobus malis" game.
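
One way to check for the whole-filesystem flush on a given machine is to
time each fsync() in a burst of small writes and compare the first call
with the later ones. The following is only a rough sketch: the function
name time_fsync_burst, the file names, and the 4 KiB payload are
assumptions for illustration, and the numbers it produces depend entirely
on the file system and on whatever other write load is running.

```python
import os
import tempfile
import time


def time_fsync_burst(directory, count=5, payload=b"x" * 4096):
    """Write `count` small files and return the fsync() latency of each.

    Hypothetical measurement helper; latencies are in seconds.
    """
    latencies = []
    for i in range(count):
        path = os.path.join(directory, f"msg{i}")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # only this call is timed
            latencies.append(time.perf_counter() - start)
        finally:
            os.close(fd)
    return latencies


if __name__ == "__main__":
    # Point this at the spool's file system to get a meaningful reading.
    with tempfile.TemporaryDirectory() as spool:
        for i, seconds in enumerate(time_fsync_burst(spool)):
            print(f"fsync #{i}: {seconds * 1000:.1f} ms")
```

A first fsync() that is much slower than the rest, while other processes
are writing to the same file system, would be consistent with the
behaviour described above, though it does not prove it.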

> Also, if the MTA wasn't running as root, it shouldn't be able to make sync()
> affect the entire system.

I'd like to see your plans that prevent DoS by local users...

One machine's light load is another one's DoS attack.

> Is there anything that says that sync() can't just flush the user's
> buffers unless you're running as root or with some CAP_ capability?

Does the kernel track "whose dirty buffer is this" (uid_t) at all?

--
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95