2005-05-13 01:26:47

by Jim Washer

[permalink] [raw]
Subject: CONFIRMED bug in do_generic_file_read

Well, my original post got ZERO responses, but I've continued to look
at this, and with the help of a colleague I think we've bottomed out.

do_generic_file_read does indeed have a bug, as mentioned below. However,
this bug is generally not hit as the call to a_ops->readpage (normally
pointing at blkdev_readpage) NEVER returns an error. However, the Veritas
code's readpage does return errors, and this triggers this KERNEL BUG.

I don't know enough about the page cache to know if simply NULLing the
page->mapping pointer is safe.

Anyone care to address this?

thanks
- jim


On Mon, 9 May 2005 08:02:05 -0700 (PDT)
Jim Washer <[email protected]> wrote:

>
> I've been looking at a stack trace from a 2.4.x kenrel where the kernel
> hits the "page has mapping still set" BUG() in __free_pages_ok.
>
> What I've found is that in do_generic_file_read, the call:
> error = mapping->a_ops->readpage(filp, page);
>
> is returning an error (EFAULT)
>
> However, shortly after, we fall into:
> /* UHHUH! A synchronous read error occurred. Report it */
> desc->error = error;
> page_cache_release(page);
> break;
>
> This code tries to free the page in question. However, is it reasonable for
> do_generic_file_read to free a page "that belongs to" filesystem code?
>
> Is there some (un)written rule that says filesystem readpage code MUST
> unmap a page before returning an error?
>
>
> I notice that several other locations in do_generic_file_read first check
> that the page has no mapping before freeing, but not here.
>
>
> Further details.
> 1)The file system in question is veritas VXFS, which is NOT open source.
> However, I'm asking this as a general question about whether or not
> there is a requirement that an given FS unmap a page before returning
> an error
>
> 2)The kernel is RHEL3u3, but the code path here apprears the same in 2.4.30
>
> 3)I've not included the stack trace here, as it came from a RHEL3 kernel,
> which I assume most reader of lkml would not have access to.
> I don't want to get into arguing about the analysis of this particular
> stack, but rather I want to consider if the attempted page free in
> do_generic_file_read is "correct" in ALL cases.
>
>
> thanks for considering this.
>
> - jim
>
>
> --
> James Washer
> IBM Linux Change Team
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
James Washer
IBM Linux Change Team


2005-05-13 07:57:44

by Al Viro

[permalink] [raw]
Subject: Re: CONFIRMED bug in do_generic_file_read

On Thu, May 12, 2005 at 05:57:57PM -0700, Jim Washer wrote:
> Well, my original post got ZERO responses, but I've continued to look
> at this, and with the help of a colleague I think we've bottomed out.

> do_generic_file_read does indeed have a bug, as mentioned below. However,
> this bug is generally not hit as the call to a_ops->readpage (normally
> pointing at blkdev_readpage) NEVER returns an error. However, the Veritas
> code's readpage does return errors, and this triggers this KERNEL BUG.

Hardly. What it does, apparently, is extra page_cache_release().

> I don't know enough about the page cache to know if simply NULLing the
> page->mapping pointer is safe.

Analysis above is BS.
a) page with non-NULL ->mapping == page in cache. Code that adds
page to page cache grabs a reference before putting it on the lists.
b) code that evicts page from cache drops a reference after getting
the sucker off lists.
c) final page_cache_release() while page is still in cache is,
indeed, a bug. Of memory-corrupting variety.
d) ->readpage() has no fscking business playing with any of the above.
e) while we are at it, -EFAULT is a hell of a strange error for
->readpage(), but that's a separate story.

Check for unbalanced page_cache_release() in their code.

> Anyone care to address this?

See above. Anything beyond that requires
1) access to their code
2) good supply of industrial-strength barf-bags to deal with
inevitable consequences of (1).

2005-05-13 15:17:44

by Al Viro

[permalink] [raw]
Subject: Re: CONFIRMED bug in do_generic_file_read

On Fri, May 13, 2005 at 06:42:50AM -0700, James Washer wrote:
> Al, relax.. as I said, I don't know much about page cache code.
>
> So, let me ask a question, if I can, with out upsetting you further.
>
> You say the analysis is, ah, incorrect.
>
> Can you help me understand what a readpage routine SHOULD do with a page
> when it finds it cannot "arrange" a successful read? Is simply returning
> an error incorrect behaviour? If so, what should the readpage do?

It is a perfectly acceptable behaviour. And it works just fine - e.g.
nfs_readpage() does that in quite a few cases.

What you are missing is the fact that page_cache_release() frees the
page only when it drops the final reference. And pages are pinned
down while they are in page cache.

If you see page_cache_release() right after ->readpage() triggering that
check, you've got out of ->readpage() with
* only one reference to page remaining
* one reference to that page acquired earlier in do_generic_file_read()
and not dropped until now.
* one reference to that page acquired back when it had been put
in page cache. Matching page_cache_release() would be done when page
is removed from page cache, but places that do it would remove the page
from cache first. Which would set ->mapping to NULL.

Conclusion: something had done an unbalanced page_cache_release(). That
happened after the moment when do_generic_file_read() had found the page
and pinned it down and before the end of ->readpage(). Most likely -
->readpage() itself or something called by it.
the page, but

2005-05-13 18:20:51

by Al Viro

[permalink] [raw]
Subject: Re: CONFIRMED bug in do_generic_file_read

On Fri, May 13, 2005 at 10:55:24AM -0700, James Washer wrote:
> In do_generic_file_read(), when the page is not found, it is created, and

.. with refcount 1

> __add_to_page_cache() is called, which will in turn call page_cache_get()
> which gives us a page->count of 1, no?

That would be 2.

> Now, we "goto reapage", which calls a_op->readpage. If this readpage
> simply returns an error, without any other actions, we drop down to the
> page_cache_release(). Finding the page->count==0, we'll proceed to call
> __free_page(), which calls __free_pages() which decrements and tests
> page->count via put_page_testzero(), returning true as page->count is now
> zero... and were off to __free_pages_ok() and our panic...
>
> What did I miss? Forgive me being dense, if I'm missing something here.
> And thanks again for you help understanding this.

Think for a minute - we allocate a refcounted object. That means
getting (the only) reference to it. That means refcount equal to 1. It's
a fairly common idiom - not just pages are handled that way.

What happens is:
* We created a page. We have a reference to it.
* We put it into cache. Now there are two ways to reach it - our reference
and global search structure.
* We do ->readpage() - we know that we have a reference, so it's not going
away
* We drop our reference. That does not affect the presence in cache - it's
not our responsibility anymore. We are done with this page; if somebody
decides to kick it out of cache, it will be their resposibility to drop the
reference created when putting into cache.