2007-08-01 20:45:53

by Miklos Szeredi

Subject: kupdate weirdness

The following strange behavior can be observed:

1. large file is written
2. after 30 seconds, nr_dirty goes down by 1024
3. then for some time (< 30 sec) nothing happens (disk idle)
4. then nr_dirty again goes down by 1024
5. repeat from 3. until whole file is written

So basically a 4Mbyte chunk of the file is written every 30 seconds.
I'm quite sure this is not the intended behavior.
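
A quick check of the arithmetic, as a sketch only. The 4 KiB page size is an assumption (it is not stated above), and the helper names are made up for illustration:

```python
PAGE_SIZE = 4096        # bytes per page; assumed, not stated in the report
PAGES_PER_PASS = 1024   # drop in nr_dirty observed per pass


def chunk_bytes(pages=PAGES_PER_PASS, page_size=PAGE_SIZE):
    """Bytes written each time nr_dirty drops by `pages`."""
    return pages * page_size


def seconds_to_flush(file_bytes, interval_s=30):
    """Rough upper bound on flush time at one chunk per interval."""
    chunks = -(-file_bytes // chunk_bytes())  # ceiling division
    return chunks * interval_s


print(chunk_bytes() // (1024 * 1024), "MiB per pass")
print(seconds_to_flush(1 << 30), "seconds for a 1 GiB file")
```

At that rate a 1 GiB file would need on the order of two hours to reach the disk.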

The reason seems to be that __sync_single_inode() will move the
partially written inode from s_io onto s_dirty, and sync_sb_inode()
will not splice it back onto s_io until the rest of the inodes on s_io
has been processed.

Since there will probably be a recently dirtied inode on s_io, this
will take some time, but always less than 30 sec.
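
The suspected mechanism can be sketched as a toy model. This is not the kernel code: the 5 second wakeup interval, the 30 second expiry, the "other" file dirtied every 20 seconds, and all names below are assumptions chosen only to make the stall visible.

```python
from collections import deque

EXPIRE_S = 30        # assumed age at which a dirty inode becomes eligible
WAKEUP_S = 5         # assumed interval between kupdate passes
CHUNK_PAGES = 1024   # pages written per inode per pass (from the report)


def simulate(big_pages=4096, other_every_s=20, until_s=200):
    """Return [(time, pages)] for each write of the large file."""
    s_dirty = deque([{"name": "big", "when": 0, "pages": big_pages}])
    s_io = deque()
    big_writes = []
    for now in range(0, until_s + 1, WAKEUP_S):
        # Some unrelated file gets dirtied now and then.
        if now and now % other_every_s == 0:
            s_dirty.append({"name": "other", "when": now, "pages": 1})
        # s_dirty is spliced onto s_io only once s_io has drained.
        if not s_io:
            s_io.extend(s_dirty)
            s_dirty.clear()
        while s_io:
            inode = s_io[0]
            if now - inode["when"] < EXPIRE_S:
                break  # too young: it stays on s_io and blocks the splice
            s_io.popleft()
            written = min(inode["pages"], CHUNK_PAGES)
            inode["pages"] -= written
            if inode["name"] == "big":
                big_writes.append((now, written))
            if inode["pages"]:
                # Partially written: parked on s_dirty, oldest first.
                s_dirty.appendleft(inode)
    return big_writes


for when, pages in simulate():
    print(f"t={when:3d}s  wrote {pages} pages of the large file")
```

In this model the large file is written in 1024-page chunks separated by idle gaps, because each splice also brings a young inode onto s_io and the next splice has to wait for it to expire. The exact gap lengths depend entirely on the assumed parameters.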

I don't know what's the easiest solution.

Any ideas?

Miklos


2007-08-01 21:15:48

by Andrew Morton

Subject: Re: kupdate weirdness

On Wed, 01 Aug 2007 22:45:16 +0200
Miklos Szeredi <[email protected]> wrote:

> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
>
> The reason seems to be that __sync_single_inode() will move the
> partially written inode from s_io onto s_dirty, and sync_sb_inode()
> will not splice it back onto s_io until the rest of the inodes on s_io
> has been processed.

It does all sorts of weird crap.

> Since there will probably be a recently dirtied inode on s_io, this
> will take some time, but always less than 30 sec.
>
> I don't know what's the easiest solution.
>
> Any ideas?

Try 2.6.23-rc1-mm2.

2007-08-02 01:53:50

by David Chinner

Subject: Re: kupdate weirdness

On Wed, Aug 01, 2007 at 10:45:16PM +0200, Miklos Szeredi wrote:
> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
>
> The reason seems to be that __sync_single_inode() will move the
> partially written inode from s_io onto s_dirty, and sync_sb_inode()
> will not splice it back onto s_io until the rest of the inodes on s_io
> has been processed.

It's been doing this for a long time.

http://marc.info/?l=linux-kernel&m=113919849421679&w=2

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2007-08-02 15:53:15

by Miklos Szeredi

Subject: Re: kupdate weirdness

> > The following strange behavior can be observed:
> >
> > 1. large file is written
> > 2. after 30 seconds, nr_dirty goes down by 1024
> > 3. then for some time (< 30 sec) nothing happens (disk idle)
> > 4. then nr_dirty again goes down by 1024
> > 5. repeat from 3. until whole file is written
> >
> > So basically a 4Mbyte chunk of the file is written every 30 seconds.
> > I'm quite sure this is not the intended behavior.
> >
> > The reason seems to be that __sync_single_inode() will move the
> > partially written inode from s_io onto s_dirty, and sync_sb_inode()
> > will not splice it back onto s_io until the rest of the inodes on s_io
> > has been processed.
>
> It does all sorts of weird crap.
>
> > Since there will probably be a recently dirtied inode on s_io, this
> > will take some time, but always less than 30 sec.
> >
> > I don't know what's the easiest solution.
> >
> > Any ideas?
>
> Try 2.6.23-rc1-mm2.

Much better, but still not perfect.

Now it writes out 1024 pages after 30 seconds and then the rest after
another 30s.

If my analysis is correct, this is because when it first gets onto
s_io other inodes will get there too (with up to 30s later dirtying
time), and the contents of s_more_io won't be recycled until the
current contents of s_io are processed.

Maybe this is OK, the previous weird stuff didn't seem to bother a lot
of people either.

Miklos

2007-08-02 19:18:54

by Andrew Morton

Subject: Re: kupdate weirdness

On Thu, 02 Aug 2007 17:52:39 +0200
Miklos Szeredi <[email protected]> wrote:

> > > The following strange behavior can be observed:
> > >
> > > 1. large file is written
> > > 2. after 30 seconds, nr_dirty goes down by 1024
> > > 3. then for some time (< 30 sec) nothing happens (disk idle)
> > > 4. then nr_dirty again goes down by 1024
> > > 5. repeat from 3. until whole file is written
> > >
> > > So basically a 4Mbyte chunk of the file is written every 30 seconds.
> > > I'm quite sure this is not the intended behavior.
> > >
> > > The reason seems to be that __sync_single_inode() will move the
> > > partially written inode from s_io onto s_dirty, and sync_sb_inode()
> > > will not splice it back onto s_io until the rest of the inodes on s_io
> > > has been processed.
> >
> > It does all sorts of weird crap.
> >
> > > Since there will probably be a recently dirtied inode on s_io, this
> > > will take some time, but always less than 30 sec.
> > >
> > > I don't know what's the easiest solution.
> > >
> > > Any ideas?
> >
> > Try 2.6.23-rc1-mm2.
>
> Much better, but still not perfect.

I've kinda lost track of the status of all these patches. I _think_ Ken
has identified a remaining problem even after his
writeback-fix-periodic-superblock-dirty-inode-flushing.patch, but maybe I
misremember.

Ken, can you remind us of the status there, please?

> Now it writes out 1024 pages after 30 seconds and then the rest after
> another 30s.

Bah.

> If my analysis is correct, this is because when it first gets onto
> s_io other inodes will get there too (with up to 30s later dirtying
> time), and the contents of s_more_io won't be recycled until the
> current contents of s_io are processed.
>
> Maybe this is OK, the previous weird stuff didn't seem to bother a lot
> of people either.

There were heaps of problems in there and it is surprising how few people
were hitting them. Ordered-mode journalling filesystems will fix it all up
behind the scenes, of course.

I just have a bad feeling about that code - list_heads are the wrong data
structure and it all needs to be ripped and redone using some indexable
data structure. There has been desultory discussion, but nothing's
happening and nothing will happen in the medium term, so we need to keep
on whapping bandaids on it.

2007-08-02 19:35:47

by Miklos Szeredi

Subject: Re: kupdate weirdness

> There were heaps of problems in there and it is surprising how few people
> were hitting them. Ordered-mode journalling filesystems will fix it all up
> behind the scenes, of course.
>
> I just have a bad feeling about that code - list_heads are the wrong data
> structure and it all needs to be ripped and redone using some indexable
> data structure. There has been desultory discussion, but nothing's
> happening and nothing will happen in the medium term, so we need to keep
> on whapping bandaids on it.

The reason why I'm looking at that code is because of those
balance_dirty_pages() deadlocks. I'm not perfectly happy with the
per-bdi per-cpu counters Peter's patch is introducing.

I was wondering if we can count the number of writeback pages through
the radix tree, just like we do for dirty pages?

All that would be needed is to keep the under-writeback inodes on some
list as well.
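
A minimal sketch of that idea, not an actual kernel interface: a plain dict stands in for the tagged radix tree, and the `Inode` class, the tag names and `count_tagged()` are all hypothetical.

```python
DIRTY, WRITEBACK = "dirty", "writeback"


class Inode:
    """Toy inode whose page tags stand in for a tagged radix tree."""

    def __init__(self, name):
        self.name = name
        self.tags = {}  # page index -> set of tags on that page

    def tag(self, index, tag):
        self.tags.setdefault(index, set()).add(tag)

    def untag(self, index, tag):
        self.tags.get(index, set()).discard(tag)

    def count(self, tag):
        return sum(1 for page_tags in self.tags.values() if tag in page_tags)


def count_tagged(inodes, tag):
    """Sum tagged pages over a list of inodes instead of a global counter."""
    return sum(inode.count(tag) for inode in inodes)


inode = Inode("bigfile")
for index in range(4):
    inode.tag(index, DIRTY)
for index in (0, 1):  # writeback starts on the first two pages
    inode.untag(index, DIRTY)
    inode.tag(index, WRITEBACK)

under_writeback = [inode]  # the extra list the idea would require
print(count_tagged(under_writeback, WRITEBACK), "pages under writeback")
```

In this sketch every count walks all tagged pages of every listed inode, so it trades the counter bookkeeping for a walk whose cost grows with the amount of data in flight.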

But I realize that this introduces its own problems as well...

Miklos