2015-05-17 07:02:29

by NeilBrown

Subject: Problems with bdev_write_page().



Hi Matthew,
I've just been looking at bdev_write_page().
You can read about why here:

http://marc.info/?t=142984068300001&r=1&w=2

It ends with a "git bisect" which points the finger at you.

If I look at bdev_write_page() it says:

* On entry, the page should be locked and not currently under writeback.
* On exit, if the write started successfully, the page will be unlocked and
* under writeback. If the write failed already (eg the driver failed to
* queue the page to the device), the page will still be locked. If the
* caller is a ->writepage implementation, it will need to unlock the page.

So the page is unlocked on success.

In __mpage_writepage() I find

	if (!bdev_write_page(bdev, blocks[0] << (blkbits - 9),
							page, wbc)) {
		clean_buffers(page, first_unmapped);


So if bdev_write_page() succeeds, i.e. if it returns '0', then
clean_buffers() is called. Remember that at this point the page is already
unlocked.

clean_buffers may call

try_to_free_buffers(page);

(without first locking the page, so it is still unlocked), and
try_to_free_buffers() starts with:

BUG_ON(!PageLocked(page));


Oops.
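
To make the ordering problem concrete, here is a small user-space sketch of the sequence described above. It is only a toy model, not kernel code: struct toy_page and all of the toy_* helpers are hypothetical names invented for illustration, and the model assumes nothing beyond what the bdev_write_page() comment and the BUG_ON() quoted above say.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two page flags that matter here. */
struct toy_page {
	bool locked;
	bool writeback;
};

/*
 * Models the documented success path of bdev_write_page():
 * returns 0 and leaves the page unlocked and under writeback.
 */
static int toy_bdev_write_page(struct toy_page *page)
{
	page->writeback = true;
	page->locked = false;
	return 0;
}

/*
 * Models the first line of try_to_free_buffers(),
 * BUG_ON(!PageLocked(page)): true means the BUG_ON would fire.
 */
static bool toy_try_to_free_buffers_would_bug(const struct toy_page *page)
{
	return !page->locked;
}

/*
 * Mirrors the call order quoted from __mpage_writepage(): on success
 * the buffers are cleaned without re-locking the page.
 * Returns true if the BUG_ON would be hit.
 */
static bool toy_mpage_writepage_hits_bug(struct toy_page *page)
{
	if (!toy_bdev_write_page(page))
		return toy_try_to_free_buffers_would_bug(page);
	return false;
}
```

Starting from a locked page that is not under writeback, toy_mpage_writepage_hits_bug() reports that the assertion fires, which is the failure described above.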

Can you propose a fix for Charles, who can trigger this bug and nicely
bisected it for us - thanks Charles!!!

Also while looking at the code, I notice that brd_rw_page() unconditionally
calls page_endio() and, in the WRITE case, page_endio unconditionally calls
end_page_writeback(), which has

	if (!test_clear_page_writeback(page))
		BUG();


and so cannot tolerate being called twice in a row.
So if brd_rw_page() ever returned an error (which seems possible though not
likely), end_page_writeback() would be called once by page_endio() and once
in the error path of bdev_write_page(), and the BUG above would be triggered.
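
The double call can be sketched the same way. Again this is a toy user-space model with hypothetical names (struct toy_wb_page and the toy_* helpers); it assumes the writeback flag is set before the driver's rw_page hook is called, and that the driver both completes the page and returns an error.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the page's writeback flag. */
struct toy_wb_page {
	bool writeback;
};

/* Models test_clear_page_writeback(): clears the flag, returns the old value. */
static bool toy_test_clear_writeback(struct toy_wb_page *page)
{
	bool was_set = page->writeback;

	page->writeback = false;
	return was_set;
}

/* Models end_page_writeback(): true means BUG() would be reached. */
static bool toy_end_page_writeback_would_bug(struct toy_wb_page *page)
{
	return !toy_test_clear_writeback(page);
}

/*
 * Models a write where the driver calls page_endio() and then still
 * returns an error, so the caller's error path ends writeback again.
 * Returns how many of the two calls would reach BUG().
 */
static int toy_write_with_driver_error(struct toy_wb_page *page)
{
	int bugs = 0;

	page->writeback = true;	/* assumed: flag set before calling the driver */
	bugs += toy_end_page_writeback_would_bug(page);	/* page_endio() in the driver */
	bugs += toy_end_page_writeback_would_bug(page);	/* error path in the caller */
	return bugs;
}
```

In this model the first call clears the flag cleanly and the second one is the call that would trip the BUG().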

I'll leave that for you to sort out too :-)

Thanks,
NeilBrown



2015-05-25 23:15:24

by Charles Bertsch

Subject: Re: Problems with bdev_write_page().

On 05/17/2015 12:02 AM, NeilBrown wrote:
>
>
> Hi Matthew,
> I've just been looking at bdev_write_page().
> You can read about why here:
>
> http://marc.info/?t=142984068300001&r=1&w=2
....
>
> Can you propose a fix for Charles, who can trigger this bug and nicely
> bisected it for us - thanks Charles!!!
>
This problem still occurs with 4.1.0-rc5 -- three stack traces attached.

Charles Bertsch



Attachments:
linprob.stktrace.0525a.4.1-rc5.txt (10.36 kB)
linprob.stktrace.0525b.4.1-rc5.txt (10.56 kB)
linprob.stktrace.0525c.4.1-rc5.txt (8.47 kB)