2008-05-26 07:05:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [Bug 421482] Firefox 3 uses fsync excessively


This:

On Sun, 25 May 2008 22:13:10 -0700 [email protected] wrote:

> Do not reply to this email. You can add comments to this bug at
> https://bugzilla.mozilla.org/show_bug.cgi?id=421482
>
>
>
>
>
> --- Comment #152 from Karl Tomlinson (:karlt) <[email protected]> 2008-05-25 22:12:23 PDT ---
> Created an attachment (id=322475)
> --> (https://bugzilla.mozilla.org/attachment.cgi?id=322475)
> fdatasync/sync_file_range test program
>
> fdatasync/sync_file_range test program
>
> This first creates a file of length 1 then does one fsync on the new file.
> Then the file is continually modified without changing the length and synced
> after each modification using one of three methods (somewhat randomly
> selected): fsync/fdatasync/sync_file_range.
>
> The I/O load for the test results below was produced using dd with a small
> blocksize to limit the I/O some:
>
> dd if=/dev/zero of=large bs=64 count=$((3*1024*1024*1024/64))
>
> I used ltrace instead of strace as my strace didn't find sync_file_range (and
> my glibc-2.5 libraries don't seem to have a sync_file_range function), so
> sync_file_range appears below as "syscall(277".
>
> rm -f datasync-test.tmp &&
> ltrace -t -T -e trace=,fsync,fdatasync,syscall ./a.out
>
> 16:12:59 fsync(3) = 0 <11.864858>
> 16:13:13 fdatasync(3) = 0 <14.706356>
> 16:13:30 fsync(3) = 0 <12.832373>
> 16:13:45 syscall(277, 3, 0, 1, 7) = 0 <0.343116>
> 16:13:49 fdatasync(3) = 0 <8.231468>
> 16:14:01 syscall(277, 3, 0, 1, 7) = 0 <2.347144>
> 16:14:06 fsync(3) = 0 <6.938656>
> 16:14:16 fdatasync(3) = 0 <8.359644>
> 16:14:27 fsync(3) = 0 <5.928242>
> 16:14:35 syscall(277, 3, 0, 1, 7) = 0 <0.009531>
> 16:14:39 fdatasync(3) = 0 <7.356126>
> 16:14:50 fsync(3) = 0 <6.402128>
> 16:14:59 syscall(277, 3, 0, 1, 7) = 0 <0.802706>
> 16:15:03 syscall(277, 3, 0, 1, 7) = 0 <2.985404>
> 16:15:08 fsync(3) = 0 <4.722020>
> 16:15:15 fdatasync(3) = 0 <6.532945>
> 16:15:24 fdatasync(3) = 0 <2.294488>
> 16:15:30 fsync(3) = 0 <7.986250>
> 16:15:40 syscall(277, 3, 0, 1, 7) = 0 <1.409809>
> 16:15:45 fdatasync(3) = 0 <5.404190>
>
> The results are consistent with fdatasync being implemented as fsync on ext3.
>
> They show the potential for considerable savings from growing (and shrinking)
> files in large hunks and using sync_file_range (which also should reduce the
> impact on the rest of the filesystem).

is wrong, isn't it?

It's purportedly showing that fdatasync() on ext3 is syncing the whole
world in fsync()-fashion even with an application which does not grow
the file size.

But fdatasync() shouldn't do that. Even if the inode is dirty from
atime or mtime updates, that shouldn't cause fdatasync() to run an
ext3 commit?



2008-05-26 10:07:51

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [Bug 421482] Firefox 3 uses fsync excessively

On Mon, May 26, 2008 at 12:05:06AM -0700, Andrew Morton wrote:
> It's purportedly showing that fdatasync() on ext3 is syncing the whole
> world in fsync()-fashion even with an application which does not grow
> the file size.
>
> But fdatasync() shouldn't do that. Even if the inode is dirty from
> atime or mtime updates, that shouldn't cause fdatasync() to run an
> ext3 commit?

Well, ideally it shouldn't, although POSIX allows fdatasync() to be
implemented in terms of fsync(). It is at the moment. :-/

The problem is we don't currently have a way of distinguishing between
a "smudged" inode (only the mtime/atime has changed) and a "dirty"
inode (even if the number of blocks hasn't changed, if i_size has
changed, or i_mode, or anything else, including extended attributes
inline in the inode). We're not tracking that difference. If we only
allow mtime/atime changes through setattr (see Cristoph's patches),
and don't set the VFS dirty bit, but our own "smudged" bit, we could
do it --- but at the moment, we're not.

- Ted

2008-05-26 11:10:16

by Jörn Engel

[permalink] [raw]
Subject: Re: [Bug 421482] Firefox 3 uses fsync excessively

On Mon, 26 May 2008 06:07:51 -0400, Theodore Tso wrote:
> On Mon, May 26, 2008 at 12:05:06AM -0700, Andrew Morton wrote:
> > It's purportedly showing that fdatasync() on ext3 is syncing the whole
> > world in fsync()-fashion even with an application which does not grow
> > the file size.
> >
> > But fdatasync() shouldn't do that. Even if the inode is dirty from
> > atime or mtime updates, that shouldn't cause fdatasync() to run an
> > ext3 commit?
>
> Well, ideally it shouldn't, although POSIX allows fdatasync() to be
> implemented in terms of fsync(). It is at the moment. :-/
>
> The problem is we don't currently have a way of distinguishing between
> a "smudged" inode (only the mtime/atime has changed) and a "dirty"
> inode (even if the number of blocks hasn't changed, if i_size has
> changed, or i_mode, or anything else, including extended attributes
> inline in the inode). We're not tracking that difference. If we only
> allow mtime/atime changes through setattr (see Cristoph's patches),
> and don't set the VFS dirty bit, but our own "smudged" bit, we could
> do it --- but at the moment, we're not.

Don't we already have this bit since Linux 2.4.0-test12? I_DIRTY_SYNC
is admittedly not well-named for "smudged". But it used to mean just
that. I_DIRTY_DATASYNC was the real dirty bit. Which, in I_DIRTY_PAGES,
has been split into I_DIRTY_DATASYNC and I_DIRTY_PAGES.

Now we just have to use sane names.

Jörn

--
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
Raph Levien, 1979

2008-05-26 11:39:18

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [Bug 421482] Firefox 3 uses fsync excessively

On Mon, May 26, 2008 at 01:10:16PM +0200, J?rn Engel wrote:
> Don't we already have this bit since Linux 2.4.0-test12? I_DIRTY_SYNC
> is admittedly not well-named for "smudged". But it used to mean just
> that. I_DIRTY_DATASYNC was the real dirty bit. Which, in I_DIRTY_PAGES,
> has been split into I_DIRTY_DATASYNC and I_DIRTY_PAGES.
>
> Now we just have to use sane names.

We're currently forcing a new commit if I_DIRTY_SYNC or
I_DIRTY_DATASYNC (but not necessarily I_DIRTY_PAGES) is set. If
I_DIRTY_SYNC really means "smudged" (I believe you but I'll want to go
through the code and prove it to myself :-), then this might be a very
easy fix. We'll need to make sure that unmount time we do actually
force out all inodes even if only I_DIRTY_SYNC is set.

(And then, we should rename things to more sane names. :-)

- Ted

2008-05-26 12:52:24

by Jörn Engel

[permalink] [raw]
Subject: Re: [Bug 421482] Firefox 3 uses fsync excessively

On Mon, 26 May 2008 07:38:46 -0400, Theodore Tso wrote:
>
> On Mon, May 26, 2008 at 01:10:16PM +0200, Jörn Engel wrote:
> > Don't we already have this bit since Linux 2.4.0-test12? I_DIRTY_SYNC
> > is admittedly not well-named for "smudged". But it used to mean just
> > that. I_DIRTY_DATASYNC was the real dirty bit. Which, in I_DIRTY_PAGES,
^^^^^^^^^^^^^
That should have been "2.4.0-prerelease".

> > has been split into I_DIRTY_DATASYNC and I_DIRTY_PAGES.
> >
> > Now we just have to use sane names.
>
> We're currently forcing a new commit if I_DIRTY_SYNC or
> I_DIRTY_DATASYNC (but not necessarily I_DIRTY_PAGES) is set. If
> I_DIRTY_SYNC really means "smudged" (I believe you but I'll want to go
> through the code and prove it to myself :-),

Proving it to yourself is good advice indeed. I'm sure it used to mean
"smudged" in 2.4.0 time. Whether any changes since have damaged that
property I haven't checked.

> then this might be a very
> easy fix. We'll need to make sure that unmount time we do actually
> force out all inodes even if only I_DIRTY_SYNC is set.
>
> (And then, we should rename things to more sane names. :-)

Jörn

--
Joern's library part 11:
http://www.unicom.com/pw/reply-to-harmful.html

2008-05-26 18:50:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [Bug 421482] Firefox 3 uses fsync excessively

On Mon, 26 May 2008 06:07:51 -0400 Theodore Tso <[email protected]> wrote:

> On Mon, May 26, 2008 at 12:05:06AM -0700, Andrew Morton wrote:
> > It's purportedly showing that fdatasync() on ext3 is syncing the whole
> > world in fsync()-fashion even with an application which does not grow
> > the file size.
> >
> > But fdatasync() shouldn't do that. Even if the inode is dirty from
> > atime or mtime updates, that shouldn't cause fdatasync() to run an
> > ext3 commit?
>
> Well, ideally it shouldn't, although POSIX allows fdatasync() to be
> implemented in terms of fsync(). It is at the moment. :-/

Well..

> The problem is we don't currently have a way of distinguishing between
> a "smudged" inode (only the mtime/atime has changed) and a "dirty"
> inode (even if the number of blocks hasn't changed, if i_size has
> changed, or i_mode, or anything else, including extended attributes
> inline in the inode).

Who do you mena by "we"? ext3 or the kernel as a whole?

> We're not tracking that difference. If we only
> allow mtime/atime changes through setattr (see Cristoph's patches),
> and don't set the VFS dirty bit, but our own "smudged" bit, we could
> do it --- but at the moment, we're not.

But the VFS _does_ track these things, via the eternally
incomprehensible I_DIRTY_SYNC and I_DIRTY_DATASYNC.

We have:

if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
goto out;

which _should_ cause the fs to skip the commit during fdatasync() if
only mtime and ctime have changed?


2008-05-26 20:22:23

by Jamie Lokier

[permalink] [raw]
Subject: Re: [Bug 421482] Firefox 3 uses fsync excessively

J?rn Engel wrote:
> > We're currently forcing a new commit if I_DIRTY_SYNC or
> > I_DIRTY_DATASYNC (but not necessarily I_DIRTY_PAGES) is set. If
> > I_DIRTY_SYNC really means "smudged" (I believe you but I'll want to go
> > through the code and prove it to myself :-),
>
> Proving it to yourself is good advice indeed. I'm sure it used to mean
> "smudged" in 2.4.0 time. Whether any changes since have damaged that
> property I haven't checked.

I noticed fdatasync() doing a full fsync(), and had a look at those
flags a few kernels ago, to implement fdatasync(). I wasn't convinced
the flags were being used in that way, but now I don't remember why.

So, yes, do check what they mean _now_. And then, please, make us all
happy and implement fdatasync() :-)

Here's a thought for someone implementing fdatasync(). If a database
uses O_DIRECT writes (typically with aio), then wants data which it's
written to be committed to the hard disk platter, and the filesystem
is mounted "barrier=1" - should it call fdatasync()? Should that emit
the barrier? If another application uses normal (not O_DIRECT)
writes, and then _is delayed_ so long that kernel writeback occurs and
all cache is clean, and then calls fdatasync(), should that call emit
a barrier in that case? (Answers imho: yes and yes).

> > (And then, we should rename things to more sane names. :-)

Please, yes! The names made sense instinctively, until I looked at
the code then they didn't :-)

-- Jamie

2008-05-29 17:08:52

by Bryan Henderson

[permalink] [raw]
Subject: Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)

> Here's a thought for someone implementing fdatasync(). If a database
> uses O_DIRECT writes (typically with aio), then wants data which it's
> written to be committed to the hard disk platter, and the filesystem
> is mounted "barrier=1" - should it call fdatasync()? Should that emit
> the barrier? If another application uses normal (not O_DIRECT)
> writes, and then _is delayed_ so long that kernel writeback occurs and
> all cache is clean, and then calls fdatasync(), should that call emit
> a barrier in that case? (Answers imho: yes and yes).

I don't get it. What would be the value of emitting the barrier?

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems