2002-01-10 00:41:06

by Hugh Dickins

[permalink] [raw]
Subject: [PATCH] pagecache lock ordering

There's two places, do_buffer_fdatasync and __find_lock_page_helper,
which violate the pagemap_lru_lock before pagecache_lock rule,
if page_cache_release frees page calling lru_cache_del.

I was worried when I saw Ben's patch to move lru_cache_del down
into __free_pages_ok: thought it might cause a spinlock ordering
violation somewhere. Source check showed up no new problem from
his patch, but these two already a problem (and still a problem
when Ben's patch goes in).

Hugh

--- 2.4.18-pre2/mm/filemap.c Wed Jan 9 18:17:39 2002
+++ linux/mm/filemap.c Wed Jan 9 19:11:17 2002
@@ -493,6 +493,7 @@
{
struct list_head *curr;
struct page *page;
+ struct page *relpage = NULL;
int retval = 0;

spin_lock(&pagecache_lock);
@@ -509,6 +510,9 @@

page_cache_get(page);
spin_unlock(&pagecache_lock);
+ if (relpage)
+ page_cache_release(relpage);
+ relpage = page;
lock_page(page);

/* The buffers could have been free'd while we waited for the page lock */
@@ -518,10 +522,11 @@
UnlockPage(page);
spin_lock(&pagecache_lock);
curr = page->list.next;
- page_cache_release(page);
}
spin_unlock(&pagecache_lock);

+ if (relpage)
+ page_cache_release(relpage);
return retval;
}

@@ -900,14 +905,15 @@
if (TryLockPage(page)) {
spin_unlock(&pagecache_lock);
lock_page(page);
- spin_lock(&pagecache_lock);

/* Has the page been re-allocated while we slept? */
if (page->mapping != mapping || page->index != offset) {
UnlockPage(page);
page_cache_release(page);
+ spin_lock(&pagecache_lock);
goto repeat;
}
+ spin_lock(&pagecache_lock);
}
}
return page;


2002-01-10 00:59:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] pagecache lock ordering

Hugh Dickins wrote:
>
> There's two places, do_buffer_fdatasync

generic_buffer_fdatasync() and hence do_buffer_fdatasync()
are completely unused. It may be simpler to just trash
them.

> and __find_lock_page_helper,

Yeah. The code can't deadlock because:

page_cache_get();
spin_lock(&pagecache_lock);
page_cache_release();

we implicitly *know* that page_cache_release won't try
to acquire pagemap_lru_lock, because the page is in the
pagecache and has count=2 or more. Which is a bit, umm,
subtle.

I get the feeling that a lot of this would be cleaned up
if presence on an LRU contributed to page->count. It
seems strange, kludgy and probably racy that this is not
the case.

-

2002-01-10 14:05:27

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] pagecache lock ordering

On Wed, 9 Jan 2002, Andrew Morton wrote:
> Hugh Dickins wrote:
> >
> > There's two places, do_buffer_fdatasync
>
> generic_buffer_fdatasync() and hence do_buffer_fdatasync()
> are completely unused. It may be simpler to just trash
> them.

Oh, nice observation. writeout_one_page could be trashed at
the same time (but not waitfor_one_page, still in use elsewhere).

But what's the chance that an out-of-tree filesystem might
be using generic_buffer_fdatasync, which is still prototyped
and exported? Remove from 2.5 but leave in 2.4? I'd like to
see them go completely - revenge for being misled by them -
but Marcelo may decide otherwise.

> > and __find_lock_page_helper,
>
> Yeah. The code can't deadlock because:
>
> page_cache_get();
===> spin_unlock(&pagecache_lock);
===> lock_page();
> spin_lock(&pagecache_lock);
> page_cache_release();
>
> we implicitly *know* that page_cache_release won't try
> to acquire pagemap_lru_lock, because the page is in the
> pagecache and has count=2 or more. Which is a bit, umm,
> subtle.

I'm not entirely convinced (you'll agree that the argument
depends on rather more than you've stated there), but today
I cannot support the counter-example I thought I had (a page
already "on its way out": but we're more careful to hold both
pagecache_lock and page lock when checking page count before
removing from inode cache than I remembered).

(Hmm, but if it's all thoroughly protected, why do we even bother
to recheck mapping and index same there? Perhaps the shmem case.)

I was also overlooking that the lru_cache_del call was added into
page_cache_release to cope with the _anonymous_ pages Linus put
on LRU. Cached pages were and are deleted from LRU before getting
there, and __find_lock_page_helper applies to cached not anon pages.

So I agree, both parts of my patch are unnecessary:
harmless, but no justification for applying it.

> I get the feeling that a lot of this would be cleaned up
> if presence on an LRU contributed to page->count. It
> seems strange, kludgy and probably racy that this is not
> the case.

That makes a lot of sense: but I feel much safer in agreeing
with you than in making the corresponding changes!

Many thanks for looking it over,
Hugh

2002-01-10 20:45:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] pagecache lock ordering

Hugh Dickins wrote:
>
> On Wed, 9 Jan 2002, Andrew Morton wrote:
> > Hugh Dickins wrote:
> > >
> > > There's two places, do_buffer_fdatasync
> >
> > generic_buffer_fdatasync() and hence do_buffer_fdatasync()
> > are completely unused. It may be simpler to just trash
> > them.
>
> Oh, nice observation. writeout_one_page could be trashed at
> the same time (but not waitfor_one_page, still in use elsewhere).

I'm struggling to see a use for generic_buffer_fdatasync(). Maybe
for a filesystem which doesn't implement ->writepage()? Dunno.

The only places where waitfor_one_page() is used are in the
directory-in-pagecache filesystems, for synchronous directories.
And the usage there seems completely incorrect.

They call ->commit_write(), which leaves the buffers dirty
and unlocked, then they call waitfor_one_page(), which does nothing
at all because the buffers are unlocked. I/O was never started,
we didn't sync the blocks.

I feel I'm missing somethng here, because I checked that code
a few months back. hmmm..

> But what's the chance that an out-of-tree filesystem might
> be using generic_buffer_fdatasync, which is still prototyped
> and exported? Remove from 2.5 but leave in 2.4? I'd like to
> see them go completely - revenge for being misled by them -
> but Marcelo may decide otherwise.

This code is untested, and there is no way for us to test it.
And now we think it may be deadlocky. It has to go.

But the withdrawal of an exported API from 2.4.x is a policy
decision for Marcelo to make.

> ...
>
> > I get the feeling that a lot of this would be cleaned up
> > if presence on an LRU contributed to page->count. It
> > seems strange, kludgy and probably racy that this is not
> > the case.
>
> That makes a lot of sense: but I feel much safer in agreeing
> with you than in making the corresponding changes!
>

Heh. I'll do the death-to-generic_buffer_fdatasync patch, and
I'll check whether we need to convert the dir-in-pagecache
filesytems over to writepage/wait_on_page.

-

2002-01-10 23:28:28

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH] pagecache lock ordering

On Thu, Jan 10, 2002 at 12:39:11PM -0800, Andrew Morton wrote:
> I'm struggling to see a use for generic_buffer_fdatasync(). Maybe
> for a filesystem which doesn't implement ->writepage()? Dunno.

I seem to be using it in aio. Well, at least code based on it which
seems to work for most filesystems for O_DATASYNC...

-ben
--
Fish.

2002-01-10 23:53:37

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] pagecache lock ordering

Benjamin LaHaise wrote:
>
> On Thu, Jan 10, 2002 at 12:39:11PM -0800, Andrew Morton wrote:
> > I'm struggling to see a use for generic_buffer_fdatasync(). Maybe
> > for a filesystem which doesn't implement ->writepage()? Dunno.
>
> I seem to be using it in aio. Well, at least code based on it which
> seems to work for most filesystems for O_DATASYNC...
>

You seem to be using writeout_one_page(). What I was
thinking of was:

- Kill generic_buffer_fdatasync().
- Move writeout_one_page() into fs/buffer.c
- Move waitfor_one_page() into fs/buffer.c. This is just
for completeness; I expect this function will have no
callers soon. __iodesc_sync_wait_page() could use it though.

OK by you?

-

2002-01-10 23:55:12

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH] pagecache lock ordering

On Thu, Jan 10, 2002 at 03:47:37PM -0800, Andrew Morton wrote:
> You seem to be using writeout_one_page(). What I was
> thinking of was:
>
> - Kill generic_buffer_fdatasync().
> - Move writeout_one_page() into fs/buffer.c
> - Move waitfor_one_page() into fs/buffer.c. This is just
> for completeness; I expect this function will have no
> callers soon. __iodesc_sync_wait_page() could use it though.
>
> OK by you?

*shrug* it just makes my task of merging patches harder, that's all.

-ben
--
Fish.