2005-10-19 15:31:45

by Xin Zhao

Subject: Is ext3 flush data to disk when files are closed?

As far as I know, if an application modifies a file on an ext3 file
system, it actually changes the page cache, and the dirty pages will be
flushed to disk by kupdate periodically.

My question is: if a file is to be closed, but some of its data pages
are marked as dirty, will the system block on close() and wait for the
dirty pages to be flushed to disk? If so, it seems this would decrease
performance significantly when many small files are being updated.

Can someone point me to the right place to check how it works? Thanks!

Xin


2005-10-19 15:48:35

by linux-os (Dick Johnson)

Subject: Re: Is ext3 flush data to disk when files are closed?


On Wed, 19 Oct 2005, Xin Zhao wrote:

> As far as I know, if an application modifies a file on an ext3 file
> system, it actually changes the page cache, and the dirty pages will be
> flushed to disk by kupdate periodically.
>
> My question is: if a file is to be closed, but some of its data pages
> are marked as dirty, will system block on close() and wait for dirty
> pages being flushed to disk? If so, it seems to decrease performance
> significantly if a lot of updates on many small files are involved.
>
> Can someone point me to the right place to check how it works? Thanks!
>
> Xin

In principle, if you open a file, write to it, close it, have
somebody else open it, read it, close it, then delete it, it
probably will never touch a physical disk. This is the basic
way a VFS (virtual file system) works. The system maintains a
RAM Disk that overflows to the physical media.

Given that, there are various ways to provoke the system into
writing data to the disk(s), such as executing `sync`. However,
normally file-data are written when the kernel needs to free
up some memory or when the disk(s) are un-mounted.
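
An application that needs one particular file on disk, rather than
everything that `sync` would flush, can call fsync() on the descriptor
before close(). What follows is only a minimal sketch under that
assumption, not something posted in this thread; the helper name
write_and_sync() is hypothetical and the error handling is deliberately
simple.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write len bytes to path and
 * ask the kernel to push them to the storage device before returning.
 * Returns 0 on success, -1 on error.
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* write() normally only dirties pages in the page cache. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fsync() blocks until this file's dirty data has been written out. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }

    /* close() only releases the descriptor. */
    return close(fd);
}
```

A call such as write_and_sync("/tmp/example.txt", "hello", 5) pays the
cost of the disk write inside fsync(), so a program updating many small
files would only do this where durability actually matters.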

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.

2005-10-19 15:51:09

by Badari Pulavarty

Subject: Re: Is ext3 flush data to disk when files are closed?

On Wed, 2005-10-19 at 11:31 -0400, Xin Zhao wrote:
> As far as I know, if an application modifies a file on an ext3 file
> system, it actually changes the page cache, and the dirty pages will be
> flushed to disk by kupdate periodically.
>
> My question is: if a file is to be closed, but some of its data pages
> are marked as dirty, will system block on close() and wait for dirty
> pages being flushed to disk? If so, it seems to decrease performance
> significantly if a lot of updates on many small files are involved.
>
> Can someone point me to the right place to check how it works? Thanks!

On the last close() of the file, it should be flushed through:

iput_final() -> generic_drop_inode() -> write_inode_now()
-> __writeback_single_inode() -> __sync_single_inode()
-> do_writepages()


Thanks,
Badari

2005-10-19 20:00:50

by Andrew Morton

Subject: Re: Is ext3 flush data to disk when files are closed?

Badari Pulavarty wrote:
>
> On Wed, 2005-10-19 at 11:31 -0400, Xin Zhao wrote:
> > As far as I know, if an application modifies a file on an ext3 file
> > system, it actually changes the page cache, and the dirty pages will be
> > flushed to disk by kupdate periodically.
> >
> > My question is: if a file is to be closed, but some of its data pages
> > are marked as dirty, will system block on close() and wait for dirty
> > pages being flushed to disk? If so, it seems to decrease performance
> > significantly if a lot of updates on many small files are involved.
> >
> > Can someone point me to the right place to check how it works? Thanks!
>
> On the last close() of the file, it should be flushed through ..
>
> iput_final() -> generic_drop_inode() -> write_inode_now()
> -> __writeback_single_inode() -> __sync_single_inode()
> -> do_writepages()

The dcache's reference to the inode will prevent this from happening at
close() time.

2005-10-19 21:12:05

by Badari Pulavarty

Subject: Re: Is ext3 flush data to disk when files are closed?

On Wed, 2005-10-19 at 13:00 -0700, Andrew Morton wrote:
> Badari Pulavarty wrote:
> >
> > On Wed, 2005-10-19 at 11:31 -0400, Xin Zhao wrote:
> > > As far as I know, if an application modifies a file on an ext3 file
> > > system, it actually changes the page cache, and the dirty pages will be
> > > flushed to disk by kupdate periodically.
> > >
> > > My question is: if a file is to be closed, but some of its data pages
> > > are marked as dirty, will system block on close() and wait for dirty
> > > pages being flushed to disk? If so, it seems to decrease performance
> > > significantly if a lot of updates on many small files are involved.
> > >
> > > Can someone point me to the right place to check how it works? Thanks!
> >
> > On the last close() of the file, it should be flushed through ..
> >
> > iput_final() -> generic_drop_inode() -> write_inode_now()
> > -> __writeback_single_inode() -> __sync_single_inode()
> > -> do_writepages()
>
> The dcache's reference to the inode will prevent this from happening at
> close() time.
>

I thought so too, till I wrote a kprobe/systemtap script to print
the callers of generic_forget_inode() earlier and saw that most
of my stacks are from exit() or close().

0xffffffff801a0222 : generic_drop_inode+0x2/0x170 []
0xffffffff8019eeb0 : iput+0x50/0x90 []
0xffffffff8019c7bb : dput+0x1db/0x220 []
0xffffffff80184461 : __fput+0x171/0x1e0 []
0xffffffff801829ce : filp_close+0x6e/0x90 []
0xffffffff801388eb : put_files_struct+0x6b/0xc0 []
0xffffffff801392ef : do_exit+0x1ff/0xbb0 []



0xffffffff801a0222 : generic_drop_inode+0x2/0x170 []
0xffffffff8019eeb0 : iput+0x50/0x90 []
0xffffffff8019c7bb : dput+0x1db/0x220 []
0xffffffff80184461 : __fput+0x171/0x1e0 []
0xffffffff801829ce : filp_close+0x6e/0x90 []
0xffffffff80182a90 : sys_close+0xa0/0xd0 []
0xffffffff8010dbc2 : system_call+0x1a/0x83 []


Thanks,
Badari

2005-10-19 22:09:30

by Andrew Morton

Subject: Re: Is ext3 flush data to disk when files are closed?

Badari Pulavarty wrote:
>
> On Wed, 2005-10-19 at 13:00 -0700, Andrew Morton wrote:
> > Badari Pulavarty wrote:
> > >
> > > On Wed, 2005-10-19 at 11:31 -0400, Xin Zhao wrote:
> > > > As far as I know, if an application modifies a file on an ext3 file
> > > > system, it actually changes the page cache, and the dirty pages will be
> > > > flushed to disk by kupdate periodically.
> > > >
> > > > My question is: if a file is to be closed, but some of its data pages
> > > > are marked as dirty, will system block on close() and wait for dirty
> > > > pages being flushed to disk? If so, it seems to decrease performance
> > > > significantly if a lot of updates on many small files are involved.
> > > >
> > > > Can someone point me to the right place to check how it works? Thanks!
> > >
> > > On the last close() of the file, it should be flushed through ..
> > >
> > > iput_final() -> generic_drop_inode() -> write_inode_now()
> > > -> __writeback_single_inode() -> __sync_single_inode()
> > > -> do_writepages()
> >
> > The dcache's reference to the inode will prevent this from happening at
> > close() time.
> >
>
> I thought so too, till I wrote a kprobe/systemtap script to print
> the callers of generic_forget_inode() earlier and saw that most
> of my stacks are from exit() or close().
>
> 0xffffffff801a0222 : generic_drop_inode+0x2/0x170 []
> 0xffffffff8019eeb0 : iput+0x50/0x90 []
> 0xffffffff8019c7bb : dput+0x1db/0x220 []
> 0xffffffff80184461 : __fput+0x171/0x1e0 []
> 0xffffffff801829ce : filp_close+0x6e/0x90 []
> 0xffffffff801388eb : put_files_struct+0x6b/0xc0 []
> 0xffffffff801392ef : do_exit+0x1ff/0xbb0 []
>

But generic_forget_inode usually doesn't dispose of the inode.

	if (!hlist_unhashed(&inode->i_hash)) {
		if (!(inode->i_state & (I_DIRTY|I_LOCK)))
			list_move(&inode->i_list, &inode_unused);
		inodes_stat.nr_unused++;
		if (!sb || (sb->s_flags & MS_ACTIVE)) {
			spin_unlock(&inode_lock);
			return;
		}
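
The branch in the excerpt above can be read as a small decision, which
the following user-space sketch tries to model. It covers only the lines
quoted here; the struct, enum and function names are invented for
illustration and are not kernel definitions, and whatever the real
function does after this branch is represented by a single
"falls through" result.

```c
#include <stdbool.h>

/* Simplified stand-ins for kernel state; all names here are illustrative. */
struct toy_inode {
    bool hashed;          /* models !hlist_unhashed(&inode->i_hash) */
    bool dirty_or_locked; /* models i_state & (I_DIRTY|I_LOCK) */
    bool has_sb;          /* models sb != NULL */
    bool sb_active;       /* models sb->s_flags & MS_ACTIVE */
    bool on_unused_list;  /* set when the model "moves" it to inode_unused */
};

enum toy_result {
    TOY_KEPT_IN_CACHE, /* early return: the inode stays cached */
    TOY_FALLS_THROUGH  /* continues into code not shown in the excerpt */
};

enum toy_result toy_forget_inode(struct toy_inode *inode)
{
    if (inode->hashed) {
        /* Only clean, unlocked inodes are moved to the unused list. */
        if (!inode->dirty_or_locked)
            inode->on_unused_list = true;
        /* No superblock, or a still-mounted one: return early. */
        if (!inode->has_sb || inode->sb_active)
            return TOY_KEPT_IN_CACHE;
    }
    return TOY_FALLS_THROUGH;
}
```

For a hashed inode on an active filesystem the model returns
TOY_KEPT_IN_CACHE whether or not the inode is dirty, which matches the
point that this path returns before anything is disposed of.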

2005-10-19 22:17:21

by Badari Pulavarty

Subject: Re: Is ext3 flush data to disk when files are closed?

On Wed, 2005-10-19 at 15:09 -0700, Andrew Morton wrote:
> Badari Pulavarty wrote:
> >
> > On Wed, 2005-10-19 at 13:00 -0700, Andrew Morton wrote:
> > > Badari Pulavarty wrote:
> > > >
> > > > On Wed, 2005-10-19 at 11:31 -0400, Xin Zhao wrote:
> > > > > As far as I know, if an application modifies a file on an ext3 file
> > > > > system, it actually changes the page cache, and the dirty pages will be
> > > > > flushed to disk by kupdate periodically.
> > > > >
> > > > > My question is: if a file is to be closed, but some of its data pages
> > > > > are marked as dirty, will system block on close() and wait for dirty
> > > > > pages being flushed to disk? If so, it seems to decrease performance
> > > > > significantly if a lot of updates on many small files are involved.
> > > > >
> > > > > Can someone point me to the right place to check how it works? Thanks!
> > > >
> > > > On the last close() of the file, it should be flushed through ..
> > > >
> > > > iput_final() -> generic_drop_inode() -> write_inode_now()
> > > > -> __writeback_single_inode() -> __sync_single_inode()
> > > > -> do_writepages()
> > >
> > > The dcache's reference to the inode will prevent this from happening at
> > > close() time.
> > >
> >
> > I thought so too, till I wrote a kprobe/systemtap script to print
> > the callers of generic_forget_inode() earlier and saw that most
> > of my stacks are from exit() or close().
> >
> > 0xffffffff801a0222 : generic_drop_inode+0x2/0x170 []
> > 0xffffffff8019eeb0 : iput+0x50/0x90 []
> > 0xffffffff8019c7bb : dput+0x1db/0x220 []
> > 0xffffffff80184461 : __fput+0x171/0x1e0 []
> > 0xffffffff801829ce : filp_close+0x6e/0x90 []
> > 0xffffffff801388eb : put_files_struct+0x6b/0xc0 []
> > 0xffffffff801392ef : do_exit+0x1ff/0xbb0 []
> >
>
> But generic_forget_inode usually doesn't dispose of the inode.
>
> if (!hlist_unhashed(&inode->i_hash)) {
> if (!(inode->i_state & (I_DIRTY|I_LOCK)))
> list_move(&inode->i_list, &inode_unused);
> inodes_stat.nr_unused++;
> if (!sb || (sb->s_flags & MS_ACTIVE)) {
> spin_unlock(&inode_lock);
> return;
> }

Okay, makes sense.

Thanks,
Badari

2005-10-21 04:12:57

by Andy Isaacson

Subject: Re: Is ext3 flush data to disk when files are closed?

On Wed, Oct 19, 2005 at 11:48:32AM -0400, linux-os (Dick Johnson) wrote:
[snip true statements about the observable behavior of the linux buffer
cache]
> This is the basic way a VFS (virtual file system) works.

Dear Wrongbot,

The 'V' in VFS has *nothing* to do with the buffer caching strategy.
The VFS is virtual because it provides a common API for many
filesystems, not because of how the buffer cache (which isn't even a
part of the VFS!) manages dirty buffers.

> The system maintains a RAM Disk that overflows to the physical media.

That's so wrong it's hard to even know where to start correcting.
(Sometimes I wonder why I bother.)

Of course "RAM Disk" implies something to do with CONFIG_BLK_DEV_RAM,
which is a complete red herring WRT the buffer cache. The buffer cache
doesn't "overflow" -- rather, it is a cache of buffered data which is
held in memory temporarily with the hope that we can avoid some number
of IO operations through coalescing writes and satisfying reads from
cached data.

> Given that, there are various ways to provoke the system into
> writing data to the disk(s), such as executing `sync`. However,
> normally file-data are written when the kernel needs to free
> up some memory or when the disk(s) are un-mounted.

You left out the very important role of pdflush. When dirty pages reach
a certain age*, pdflush causes them to be written out to disk even
though the kernel doesn't "need to free up some memory" and the
filesystem hasn't been unmounted.

[*] or the watermarks trigger due to there being too many dirty pages.
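
On 2.6 kernels the age and watermark thresholds are visible as sysctl
files under /proc/sys/vm, for example dirty_expire_centisecs,
dirty_writeback_centisecs, dirty_background_ratio and dirty_ratio. As a
rough sketch of how one might inspect them from a program, the helper
below reads a single integer from such a file; read_long_from_file() is
a hypothetical name chosen for this example, not an existing API.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one integer from a file such as
 * /proc/sys/vm/dirty_expire_centisecs. Returns 0 and stores the value
 * in *out on success, -1 on failure.
 */
int read_long_from_file(const char *path, long *out)
{
    char line[64];
    char *end;
    long value;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* Reject input that does not start with a number. */
    errno = 0;
    value = strtol(line, &end, 10);
    if (errno != 0 || end == line)
        return -1;

    *out = value;
    return 0;
}
```

Reading /proc/sys/vm/dirty_expire_centisecs this way gives the age, in
hundredths of a second, after which dirty data becomes eligible for
writeout by pdflush; dirty_background_ratio is the percentage of memory
at which background writeback starts.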

-andy