2008-03-19 23:31:49

by David Chinner

[permalink] [raw]
Subject: Re: [PATCH] A deadlock free and best try version of drop_caches()

On Wed, Mar 19, 2008 at 07:27:29PM +0800, Fengguang Wu wrote:
> On Tue, Mar 18, 2008 at 10:28:44PM +1100, David Chinner wrote:
> > Looks like everything is backed up on the inode_lock. Why? Looks
> > like drop_pagecache_sb() is doing something ..... suboptimal.
......
> > Anyone know the reason why drop_pagecache_sb() uses such a brute-force
> > mechanism to free up clean page cache pages?
>
> Because extensive use of it(out of testings) is discouraged? ;-)
>
> I have been running a longer but safer version, let's merge it?

So you walk the inode hash to find inodes? Seems like a nice idea on
the surface.... Won't it need to hold the iprune_mutex to prevent
races with prune_icache() and invalidate_list()?

Hmmmm - what about unhashed inodes? We'll never see them with this
method of traversal. I ask because I'm working on some prototype
patches for XFS that avoid using the inode hash altogether and drive
inode lookup from the multitude of radix trees we have per filesystem
(for parallelised and lockless inode lookup).

The above scanning method would not work at all with that sort of
filesystem structure. Perhaps combining the bulk get/put with Jan's
get/put method for walking the sb inode list would be sufficient?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group


2008-03-20 12:28:28

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH] A deadlock free and best try version of drop_caches()

On Thu, Mar 20, 2008 at 09:03:18AM +1100, David Chinner wrote:
> On Wed, Mar 19, 2008 at 07:27:29PM +0800, Fengguang Wu wrote:
> > On Tue, Mar 18, 2008 at 10:28:44PM +1100, David Chinner wrote:
> > > Looks like everything is backed up on the inode_lock. Why? Looks
> > > like drop_pagecache_sb() is doing something ..... suboptimal.
> ......
> > > Anyone know the reason why drop_pagecache_sb() uses such a brute-force
> > > mechanism to free up clean page cache pages?
> >
> > Because extensive use of it(out of testings) is discouraged? ;-)
> >
> > I have been running a longer but safer version, let's merge it?
>
> So you walk the inode hash to find inodes? Seems like a nice idea on
> the surface.... Won't it need to hold the iprune_mutex to prevent
> races with prune_icache() and invalidate_list()?

Oh, I didn't notice iprune_mutex :-/

> Hmmmm - what about unhashed inodes? We'll never see them with this
> method of traversal. I ask because I'm working on some prototype
> patches for XFS that avoid using the inode hash altogether and drive
> inode lookup from the multitude of radix trees we have per filesystem
> (for parallelised and lockless inode lookup).
>
> The above scanning method would not work at all with that sort of
> filesystem structure. Perhaps combining the bulk get/put with Jan's
> get/put method for walking the sb inode list would be sufficient?

You mean to move __invalidate_mapping_pages() out of inode_lock in
drop_pagecache_sb()? Sounds reasonable.

BTW, I noticed the following comments on locking order are outdated:

filemap.c
* ->inode_lock
* ->sb_lock (fs/fs-writeback.c)

rmap.c
* sb_lock (within inode_lock in fs/fs-writeback.c)

They are invalidated by this patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=03da8814512f55441b9bd57c1ceddd28fbcf59ba
[PATCH] Rearrangement of inode_lock in writeback_inodes()

[...] It narrows down use of inode_lock and removes unneccassary
nesting of sb_lock and inode_lock. [...]

Thank you,
Fengguang