2014-12-02 14:28:32

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH] ext4: ext4_da_convert_inline_data_to_extent drop locked page after error

Testcase:
xfstests generic/270
MKFS_OPTIONS="-q -I 256 -O inline_data,64bit"

Call Trace:
[<ffffffff81144c76>] lock_page+0x35/0x39 -------> DEADLOCK
[<ffffffff81145260>] pagecache_get_page+0x65/0x15a
[<ffffffff811507fc>] truncate_inode_pages_range+0x1db/0x45c
[<ffffffff8120ea63>] ? ext4_da_get_block_prep+0x439/0x4b6
[<ffffffff811b29b7>] ? __block_write_begin+0x284/0x29c
[<ffffffff8120e62a>] ? ext4_change_inode_journal_flag+0x16b/0x16b
[<ffffffff81150af0>] truncate_inode_pages+0x12/0x14
[<ffffffff81247cb4>] ext4_truncate_failed_write+0x19/0x25
[<ffffffff812488cf>] ext4_da_write_inline_data_begin+0x196/0x31c
[<ffffffff81210dad>] ext4_da_write_begin+0x189/0x302
[<ffffffff810c07ac>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff810ddd13>] ? read_seqcount_begin.clone.1+0x9f/0xcc
[<ffffffff8114309d>] generic_perform_write+0xc7/0x1c6
[<ffffffff810c040e>] ? mark_held_locks+0x59/0x77
[<ffffffff811445d1>] __generic_file_write_iter+0x17f/0x1c5
[<ffffffff8120726b>] ext4_file_write_iter+0x2a5/0x354
[<ffffffff81185656>] ? file_start_write+0x2a/0x2c
[<ffffffff8107bcdb>] ? bad_area_nosemaphore+0x13/0x15
[<ffffffff811858ce>] new_sync_write+0x8a/0xb2
[<ffffffff81186e7b>] vfs_write+0xb5/0x14d
[<ffffffff81186ffb>] SyS_write+0x5c/0x8c
[<ffffffff816f2529>] system_call_fastpath+0x12/0x17

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/inline.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index d3d8192..8edb224 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -811,6 +811,9 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
ret = __block_write_begin(page, 0, inline_size,
ext4_da_get_block_prep);
if (ret) {
+ unlock_page(page);
+ page_cache_release(page);
+ page = NULL;
ext4_truncate_failed_write(inode);
goto out;
}
--
1.7.1



2014-12-02 23:59:20

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: ext4_da_convert_inline_data_to_extent drop locked page after error

On Tue, Dec 02, 2014 at 06:28:20PM +0400, Dmitry Monakhov wrote:
> Testcase:
> xfstests generic/270
> MKFS_OPTIONS="-q -I 256 -O inline_data,64bit"
>
> Call Trace:
> [<ffffffff81144c76>] lock_page+0x35/0x39 -------> DEADLOCK
> [<ffffffff81145260>] pagecache_get_page+0x65/0x15a
> [<ffffffff811507fc>] truncate_inode_pages_range+0x1db/0x45c
> [<ffffffff8120ea63>] ? ext4_da_get_block_prep+0x439/0x4b6
> [<ffffffff811b29b7>] ? __block_write_begin+0x284/0x29c
> [<ffffffff8120e62a>] ? ext4_change_inode_journal_flag+0x16b/0x16b
> [<ffffffff81150af0>] truncate_inode_pages+0x12/0x14
> [<ffffffff81247cb4>] ext4_truncate_failed_write+0x19/0x25
> [<ffffffff812488cf>] ext4_da_write_inline_data_begin+0x196/0x31c
> [<ffffffff81210dad>] ext4_da_write_begin+0x189/0x302
> [<ffffffff810c07ac>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff810ddd13>] ? read_seqcount_begin.clone.1+0x9f/0xcc
> [<ffffffff8114309d>] generic_perform_write+0xc7/0x1c6
> [<ffffffff810c040e>] ? mark_held_locks+0x59/0x77
> [<ffffffff811445d1>] __generic_file_write_iter+0x17f/0x1c5
> [<ffffffff8120726b>] ext4_file_write_iter+0x2a5/0x354
> [<ffffffff81185656>] ? file_start_write+0x2a/0x2c
> [<ffffffff8107bcdb>] ? bad_area_nosemaphore+0x13/0x15
> [<ffffffff811858ce>] new_sync_write+0x8a/0xb2
> [<ffffffff81186e7b>] vfs_write+0xb5/0x14d
> [<ffffffff81186ffb>] SyS_write+0x5c/0x8c
> [<ffffffff816f2529>] system_call_fastpath+0x12/0x17
>
> Signed-off-by: Dmitry Monakhov <[email protected]>

This patch makes sense but I haven't been able to duplicate the
failure. Maybe it's sensitive to the size of the scratch device, or
some such?

- Ted

2014-12-03 01:37:00

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: ext4_da_convert_inline_data_to_extent drop locked page after error

On Tue, Dec 02, 2014 at 06:59:13PM -0500, Theodore Ts'o wrote:
> This patch makes sense but I haven't been able to duplicate the
> failure. Maybe it's sensitive to the size of the scratch device, or
> some such?

Anyway, I've applied the patch.

- Ted

2014-12-03 08:20:40

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: [PATCH] ext4: ext4_da_convert_inline_data_to_extent drop locked page after error

Theodore Ts'o <[email protected]> writes:

> On Tue, Dec 02, 2014 at 06:28:20PM +0400, Dmitry Monakhov wrote:
>> Testcase:
>> xfstests generic/270
>> MKFS_OPTIONS="-q -I 256 -O inline_data,64bit"
>>
>> Call Trace:
>> [<ffffffff81144c76>] lock_page+0x35/0x39 -------> DEADLOCK
>> [<ffffffff81145260>] pagecache_get_page+0x65/0x15a
>> [<ffffffff811507fc>] truncate_inode_pages_range+0x1db/0x45c
>> [<ffffffff8120ea63>] ? ext4_da_get_block_prep+0x439/0x4b6
>> [<ffffffff811b29b7>] ? __block_write_begin+0x284/0x29c
>> [<ffffffff8120e62a>] ? ext4_change_inode_journal_flag+0x16b/0x16b
>> [<ffffffff81150af0>] truncate_inode_pages+0x12/0x14
>> [<ffffffff81247cb4>] ext4_truncate_failed_write+0x19/0x25
>> [<ffffffff812488cf>] ext4_da_write_inline_data_begin+0x196/0x31c
>> [<ffffffff81210dad>] ext4_da_write_begin+0x189/0x302
>> [<ffffffff810c07ac>] ? trace_hardirqs_on+0xd/0xf
>> [<ffffffff810ddd13>] ? read_seqcount_begin.clone.1+0x9f/0xcc
>> [<ffffffff8114309d>] generic_perform_write+0xc7/0x1c6
>> [<ffffffff810c040e>] ? mark_held_locks+0x59/0x77
>> [<ffffffff811445d1>] __generic_file_write_iter+0x17f/0x1c5
>> [<ffffffff8120726b>] ext4_file_write_iter+0x2a5/0x354
>> [<ffffffff81185656>] ? file_start_write+0x2a/0x2c
>> [<ffffffff8107bcdb>] ? bad_area_nosemaphore+0x13/0x15
>> [<ffffffff811858ce>] new_sync_write+0x8a/0xb2
>> [<ffffffff81186e7b>] vfs_write+0xb5/0x14d
>> [<ffffffff81186ffb>] SyS_write+0x5c/0x8c
>> [<ffffffff816f2529>] system_call_fastpath+0x12/0x17
>>
>> Signed-off-by: Dmitry Monakhov <[email protected]>
>
> This patch makes sense but I haven't been able to duplicate the
> failure. Maybe it's sensitive to the size of the scratch device, or
> some such?
I use xfstest-bld. So our test environment should be equals.
Unfortunately this is is not 100% reproducible, that is why I've missed
that bug on first test-run. Even more yesterday I've scheduled inline_data tests
to run for a whole night and it is deadlocked again in almost the same
place :)

[<ffffffff816f1698>] rwsem_down_write_failed+0x216/0x283 ->want xattr_sem -> DEADLOCK
[<ffffffff81414763>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff816f0eb3>] ? down_write+0x98/0xbc
[<ffffffff81249c9e>] ? ext4_inline_data_truncate+0x96/0x2c6
[<ffffffff81249c9e>] ext4_inline_data_truncate+0x96/0x2c6
[<ffffffff81210335>] ext4_truncate+0x171/0x432
[<ffffffff8124870c>] ext4_truncate_failed_write+0x21/0x25
[<ffffffff81249332>] ext4_da_write_inline_data_begin+0x1a9/0x33a ->holds xattr_sem

I'll send you updated version in a minute.


Attachments:
signature.asc (472.00 B)

2014-12-03 13:54:54

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH] ext4: ext4_da_convert_inline_data_to_extent drop locked page after error V2

Testcase:
xfstests generic/270
MKFS_OPTIONS="-q -I 256 -O inline_data,64bit"

changes from V1
- drop xattr_sem before ext4_truncate_failed_write() because
ext4_inline_data_truncate() depends on it.
Call Trace:
[<ffffffff81144c76>] lock_page+0x35/0x39 -------> DEADLOCK
[<ffffffff81145260>] pagecache_get_page+0x65/0x15a
[<ffffffff811507fc>] truncate_inode_pages_range+0x1db/0x45c
[<ffffffff8120ea63>] ? ext4_da_get_block_prep+0x439/0x4b6
[<ffffffff811b29b7>] ? __block_write_begin+0x284/0x29c
[<ffffffff8120e62a>] ? ext4_change_inode_journal_flag+0x16b/0x16b
[<ffffffff81150af0>] truncate_inode_pages+0x12/0x14
[<ffffffff81247cb4>] ext4_truncate_failed_write+0x19/0x25
[<ffffffff812488cf>] ext4_da_write_inline_data_begin+0x196/0x31c
[<ffffffff81210dad>] ext4_da_write_begin+0x189/0x302
[<ffffffff810c07ac>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff810ddd13>] ? read_seqcount_begin.clone.1+0x9f/0xcc
[<ffffffff8114309d>] generic_perform_write+0xc7/0x1c6
[<ffffffff810c040e>] ? mark_held_locks+0x59/0x77
[<ffffffff811445d1>] __generic_file_write_iter+0x17f/0x1c5
[<ffffffff8120726b>] ext4_file_write_iter+0x2a5/0x354
[<ffffffff81185656>] ? file_start_write+0x2a/0x2c
[<ffffffff8107bcdb>] ? bad_area_nosemaphore+0x13/0x15
[<ffffffff811858ce>] new_sync_write+0x8a/0xb2
[<ffffffff81186e7b>] vfs_write+0xb5/0x14d
[<ffffffff81186ffb>] SyS_write+0x5c/0x8c
[<ffffffff816f2529>] system_call_fastpath+0x12/0x17

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/inline.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index d3d8192..acd79b7 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -811,8 +811,11 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
ret = __block_write_begin(page, 0, inline_size,
ext4_da_get_block_prep);
if (ret) {
+ up_read(&EXT4_I(inode)->xattr_sem);
+ unlock_page(page);
+ page_cache_release(page);
ext4_truncate_failed_write(inode);
- goto out;
+ return ret;
}

SetPageDirty(page);
--
1.7.1


2014-12-03 14:31:23

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: ext4_da_convert_inline_data_to_extent drop locked page after error

On Wed, Dec 03, 2014 at 11:20:12AM +0300, Dmitry Monakhov wrote:
> I use xfstest-bld. So our test environment should be equals.
> Unfortunately this is is not 100% reproducible, that is why I've missed
> that bug on first test-run. Even more yesterday I've scheduled inline_data tests
> to run for a whole night and it is deadlocked again in almost the same
> place :)

I've tried running that specific test several multiple times and I
wasn't ableto see the failure. One potential difference from the
default configuration is that I use a configured LVM volume instead of
a flat file for the test files (i.e., I use /dev/heap/scratch instead
of a file named "vdc", for example). It could be that this is
sufficiently timing sensitive that I don't see the problem because the
race window is pretty narrow.

OK, thanks for sending the updated patch.

- Ted