Hi all,
With 3.16-rc1, rsync stops writing to my btrfs filesystem and stays in
D+ state.
git bisect showed that the problematic commit is:

762380ad9322951cea4ce9d24864265f9c66a916 is the first bad commit
commit 762380ad9322951cea4ce9d24864265f9c66a916
Author: Jens Axboe <[email protected]>
Date: Thu Jun 5 13:38:39 2014 -0600

block: add notion of a chunk size for request merging

Some drivers have different limits on what size a request should
optimally be, depending on the offset of the request. Similar to
dividing a device into chunks. Add a setting that allows the driver
to inform the block layer of such a chunk size. The block layer will
then prevent merging across the chunks.

This is needed to optimally support NVMe with a non-zero stripe size.

Signed-off-by: Jens Axboe <[email protected]>
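
For readers unfamiliar with the rule the commit message describes, here is a minimal Python sketch of "prevent merging across the chunks". It is not the kernel code: the names max_sectors_at and can_merge, the fallback_max parameter, the min() against the fallback, and the power-of-two requirement for the chunk size are all illustrative assumptions.

```python
def max_sectors_at(offset, chunk_sectors, fallback_max):
    """Largest request (in sectors) that may start at `offset`.

    chunk_sectors == 0 models "no chunk size set by the driver"; in that
    case only the general limit (fallback_max, a hypothetical name) applies.
    """
    if chunk_sectors == 0:
        return fallback_max
    if chunk_sectors & (chunk_sectors - 1):
        raise ValueError("this sketch assumes a power-of-two chunk size")
    # Sectors left before the next chunk boundary, capped by the general limit.
    to_boundary = chunk_sectors - (offset & (chunk_sectors - 1))
    return min(fallback_max, to_boundary)


def can_merge(start, length, extra, chunk_sectors, fallback_max):
    """True if a request at [start, start + length) may grow by `extra` sectors."""
    return length + extra <= max_sectors_at(start, chunk_sectors, fallback_max)


if __name__ == "__main__":
    # A request starting 200 sectors into a 256-sector chunk has 56 sectors left.
    print(max_sectors_at(200, 256, 1024))
    # Growing a 32-sector request at offset 192 by 40 sectors would cross the boundary.
    print(can_merge(192, 32, 40, 256, 1024))
```

Note that when no chunk size is set, the result depends entirely on the fallback limit, so whichever limit is used there decides how large a request may become.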
sysrq w gives:
[ 3287.169569] SysRq : Show Blocked State
[ 3287.169655] task PC stack pid father
[ 3287.169675] rsync D 0000000000000001 0 626 613 0x00000000
[ 3287.169685] ffff8802037c7d08 0000000000000082 ffff8800cf2e7010 0000000000014500
[ 3287.169693] ffff8802037c7fd8 0000000000014500 ffff8800cf2e7010 ffff8802037c7c98
[ 3287.169700] 0000000000000292 0000000000000292 ffff8802037c7c50 ffff8802037c7c50
[ 3287.169706] Call Trace:
[ 3287.169722] [<ffffffff814f71d2>] ? do_page_fault+0x22/0x30
[ 3287.169733] [<ffffffff8101cd89>] ? read_tsc+0x9/0x20
[ 3287.169743] [<ffffffff811379d0>] ? sleep_on_page+0x20/0x20
[ 3287.169749] [<ffffffff814eff29>] schedule+0x29/0x70
[ 3287.169756] [<ffffffff814f0214>] io_schedule+0x94/0xf0
[ 3287.169763] [<ffffffff811379de>] sleep_on_page_killable+0xe/0x50
[ 3287.169770] [<ffffffff814f0748>] __wait_on_bit_lock+0x48/0xb0
[ 3287.169777] [<ffffffff8128c753>] ? radix_tree_lookup_slot+0x13/0x30
[ 3287.169785] [<ffffffff81137b7a>] __lock_page_killable+0x6a/0x70
[ 3287.169792] [<ffffffff810acec0>] ? autoremove_wake_function+0x40/0x40
[ 3287.169799] [<ffffffff811398c8>] generic_file_aio_read+0x458/0x6a0
[ 3287.169808] [<ffffffff811ab63a>] do_sync_read+0x5a/0x90
[ 3287.169816] [<ffffffff811abc47>] vfs_read+0x97/0x160
[ 3287.169823] [<ffffffff811ac886>] SyS_read+0x46/0xc0
[ 3287.169831] [<ffffffff814fb969>] system_call_fastpath+0x16/0x1b
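
To pull the blocked tasks out of a longer sysrq-w dump, something like the following Python sketch may help. The regular expression is an assumption based only on the column layout visible in the output above (task, state, PC, stack, pid, father); other kernel versions may print different columns, and the function name blocked_tasks is made up for illustration.

```python
import re

# Matches a task line such as:
#   [ 3287.169675] rsync D 0000000000000001 0 626 613
# Assumed columns: timestamp, task name, state "D", PC, stack, pid, father.
TASK_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<comm>.+?)\s+D\s+(?P<pc>[0-9a-f]{8,16})\s+"
    r"(?P<stack>\d+)\s+(?P<pid>\d+)\s+(?P<father>\d+)\b"
)


def blocked_tasks(log_text):
    """Return (name, pid, father pid) for every D-state task line found."""
    tasks = []
    for line in log_text.splitlines():
        m = TASK_LINE.match(line.strip())
        if m:
            tasks.append((m["comm"], int(m["pid"]), int(m["father"])))
    return tasks


SAMPLE = """\
[ 3287.169569] SysRq : Show Blocked State
[ 3287.169655] task PC stack pid father
[ 3287.169675] rsync D 0000000000000001 0 626 613
[ 3287.169706] Call Trace:
[ 3287.169749] [<ffffffff814eff29>] schedule+0x29/0x70
"""

if __name__ == "__main__":
    for name, pid, father in blocked_tasks(SAMPLE):
        print(f"{name} (pid {pid}, father {father}) is blocked")
```

Feeding it the saved output of dmesg after an "echo w > /proc/sysrq-trigger" should list each blocked task once, provided the column layout matches.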

$ btrfs filesystem show /storage/btrfs
Label: none uuid: bde3c349-9e08-45bb-8517-b9a6dda81e88
Total devices 6 FS bytes used 8.58TiB
devid 1 size 3.64TiB used 3.02TiB path /dev/sdf
devid 2 size 1.82TiB used 1.20TiB path /dev/sda
devid 3 size 1.82TiB used 1.20TiB path /dev/sdb
devid 4 size 1.82TiB used 1.20TiB path /dev/sdc
devid 5 size 1.82TiB used 1.20TiB path /dev/sdd
devid 6 size 1.82TiB used 1.20TiB path /dev/sdh

Btrfs v3.14.2-dirty

$ btrfs fi df /storage/btrfs
Data, single: total=8.89TiB, used=8.51TiB
System, RAID1: total=32.00MiB, used=992.00KiB
Metadata, RAID1: total=69.00GiB, used=67.83GiB
unknown, single: total=512.00MiB, used=0.00

Full dmesg attached.
Please CC me if replying on LKML, as I am not subscribed there.

On 18/6/2014 12:35, Konstantinos Skarlatos wrote:
> sysrq w gives:
I just ran another 'echo w > /proc/sysrq-trigger'; output attached.
On 2014-06-17 14:35, Konstantinos Skarlatos wrote:
> Hi all,
> with 3.16-rc1 rsync stops writing to my btrfs filesystem and stays at a
> D+ state.
> git bisect showed that the problematic commit is:
>
> 762380ad9322951cea4ce9d24864265f9c66a916 is the first bad commit
> commit 762380ad9322951cea4ce9d24864265f9c66a916
> Author: Jens Axboe <[email protected]>
> Date: Thu Jun 5 13:38:39 2014 -0600
>
> block: add notion of a chunk size for request merging
>
> Some drivers have different limits on what size a request should
> optimally be, depending on the offset of the request. Similar to
> dividing a device into chunks. Add a setting that allows the driver
> to inform the block layer of such a chunk size. The block layer will
> then prevent merging across the chunks.
>
> This is needed to optimally support NVMe with a non-zero stripe size.
>
> Signed-off-by: Jens Axboe <[email protected]>
That's odd, should not have any effect since nobody enables stripe sizes
in the kernel. I'll double check, perhaps it's not always being cleared.
Ah wait, does the attached help?
--
Jens Axboe
On 18/6/2014 5:11, Jens Axboe wrote:
> That's odd, should not have any effect since nobody enables stripe
> sizes in the kernel. I'll double check, perhaps it's not always being
> cleared.
>
> Ah wait, does the attached help?
Yes, it works! I recompiled at commit
762380ad9322951cea4ce9d24864265f9c66a916 with your patch and it looks
OK. I rebooted into the unpatched kernel and the bug showed up again
immediately.

The funny thing is that the problem only showed up on my (multi-disk) btrfs
filesystem; /, which is on ext4, seems to work fine.
On 2014-06-18 00:21, Konstantinos Skarlatos wrote:
> On 18/6/2014 5:11, Jens Axboe wrote:
>> That's odd, should not have any effect since nobody enables stripe
>> sizes in the kernel. I'll double check, perhaps it's not always being
>> cleared.
>>
>> Ah wait, does the attached help?
>
> Yes, it works! I recompiled at commit
> 762380ad9322951cea4ce9d24864265f9c66a916 with your patch and it looks
> ok. Rebooted back to the unpatched kernel and the bug showed up again
> immediately.
>
> The funny thing is that the problem only showed on my (multi-disk) btrfs
> filesystem. / which is on ext4 seems to work fine.
Probably because the multi-disk setup doesn't have hw_sectors set, I'm
guessing. But great, I'll get this upstream ASAP. Thanks for testing!
--
Jens Axboe