2019-07-31 13:47:53

by Sun Ke

Subject: [PATCH v2] nbd: replace kill_bdev() with __invalidate_device() again

From: Munehisa Kamata <[email protected]>

Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
resurrected kill_bdev() and it has been there since then. So buffer_head
mappings still get killed on a server disconnection, and we can still
hit the BUG_ON on a filesystem on the top of the nbd device.

EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
block nbd0: Receive control failed (result -32)
block nbd0: shutting down sockets
print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
------------[ cut here ]------------
kernel BUG at fs/buffer.c:3057!
invalid opcode: 0000 [#1] SMP PTI
CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:submit_bh_wbc+0x18b/0x190
...
Call Trace:
jbd2_write_superblock+0xf1/0x230 [jbd2]
? account_entity_enqueue+0xc5/0xf0
jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
? __switch_to_asm+0x40/0x70
...
? lock_timer_base+0x67/0x80
kjournald2+0x121/0x360 [jbd2]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? commit_timeout+0x10/0x10 [jbd2]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40

With __invalidate_device(), I no longer hit the BUG_ON with sync or
unmount on the disconnected device.

Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere")
Cc: [email protected]
Cc: Ratna Manoj Bolla <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: David Woodhouse <[email protected]>
Signed-off-by: Munehisa Kamata <[email protected]>

---
I reproduced this issue on a FAT filesystem.
Steps to reproduce:
1. Establish an nbd connection.
2. Run two threads: one doing mount and umount, the other issuing the
   clear_sock ioctl.
3. The BUG_ON is then hit.
v2: Delete a link.

Signed-off-by: SunKe <[email protected]>

drivers/block/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 9bcde23..e21d2de 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1231,7 +1231,7 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
 				 struct block_device *bdev)
 {
 	sock_shutdown(nbd);
-	kill_bdev(bdev);
+	__invalidate_device(bdev, true);
 	nbd_bdev_reset(bdev);
 	if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
 			       &nbd->config->runtime_flags))
--
2.7.4


2019-07-31 15:11:04

by Josef Bacik

Subject: Re: [PATCH v2] nbd: replace kill_bdev() with __invalidate_device() again

On Wed, Jul 31, 2019 at 08:13:10PM +0800, SunKe wrote:
> From: Munehisa Kamata <[email protected]>
>
> Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
> once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
> resurrected kill_bdev() and it has been there since then. So buffer_head
> mappings still get killed on a server disconnection, and we can still
> hit the BUG_ON on a filesystem on the top of the nbd device.
>
> EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
> block nbd0: Receive control failed (result -32)
> block nbd0: shutting down sockets
> print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
> EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
> print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
> EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
> EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
> ------------[ cut here ]------------
> kernel BUG at fs/buffer.c:3057!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
> Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
> RIP: 0010:submit_bh_wbc+0x18b/0x190
> ...
> Call Trace:
> jbd2_write_superblock+0xf1/0x230 [jbd2]
> ? account_entity_enqueue+0xc5/0xf0
> jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
> jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
> ? __switch_to_asm+0x40/0x70
> ...
> ? lock_timer_base+0x67/0x80
> kjournald2+0x121/0x360 [jbd2]
> ? remove_wait_queue+0x60/0x60
> kthread+0xf8/0x130
> ? commit_timeout+0x10/0x10 [jbd2]
> ? kthread_bind+0x10/0x10
> ret_from_fork+0x35/0x40
>
> With __invalidate_device(), I no longer hit the BUG_ON with sync or
> unmount on the disconnected device.
>

Jeeze I swear I see this same patch go by every 6 months or so, not sure what
happens to it. Anyway

Reviewed-by: Josef Bacik <[email protected]>

Thanks,

Josef

2019-07-31 15:24:32

by Jens Axboe

Subject: Re: [PATCH v2] nbd: replace kill_bdev() with __invalidate_device() again

On 7/31/19 6:13 AM, SunKe wrote:
> From: Munehisa Kamata <[email protected]>
>
> Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
> once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
> resurrected kill_bdev() and it has been there since then. So buffer_head
> mappings still get killed on a server disconnection, and we can still
> hit the BUG_ON on a filesystem on the top of the nbd device.
>
> EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
> block nbd0: Receive control failed (result -32)
> block nbd0: shutting down sockets
> print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
> EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
> print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
> EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
> EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
> ------------[ cut here ]------------
> kernel BUG at fs/buffer.c:3057!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
> Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
> RIP: 0010:submit_bh_wbc+0x18b/0x190
> ...
> Call Trace:
> jbd2_write_superblock+0xf1/0x230 [jbd2]
> ? account_entity_enqueue+0xc5/0xf0
> jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
> jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
> ? __switch_to_asm+0x40/0x70
> ...
> ? lock_timer_base+0x67/0x80
> kjournald2+0x121/0x360 [jbd2]
> ? remove_wait_queue+0x60/0x60
> kthread+0xf8/0x130
> ? commit_timeout+0x10/0x10 [jbd2]
> ? kthread_bind+0x10/0x10
> ret_from_fork+0x35/0x40
>
> With __invalidate_device(), I no longer hit the BUG_ON with sync or
> unmount on the disconnected device.

Applied, thanks.

--
Jens Axboe

2019-08-01 01:20:11

by Munehisa Kamata

Subject: Re: [PATCH v2] nbd: replace kill_bdev() with __invalidate_device() again

On 7/31/2019 5:13 AM, SunKe wrote:
> From: Munehisa Kamata <[email protected]>
>
> Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
> once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
> resurrected kill_bdev() and it has been there since then. So buffer_head
> mappings still get killed on a server disconnection, and we can still
> hit the BUG_ON on a filesystem on the top of the nbd device.
>
> EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
> block nbd0: Receive control failed (result -32)
> block nbd0: shutting down sockets
> print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
> EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
> print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
> EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
> EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
> ------------[ cut here ]------------
> kernel BUG at fs/buffer.c:3057!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
> Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
> RIP: 0010:submit_bh_wbc+0x18b/0x190
> ...
> Call Trace:
> jbd2_write_superblock+0xf1/0x230 [jbd2]
> ? account_entity_enqueue+0xc5/0xf0
> jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
> jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
> ? __switch_to_asm+0x40/0x70
> ...
> ? lock_timer_base+0x67/0x80
> kjournald2+0x121/0x360 [jbd2]
> ? remove_wait_queue+0x60/0x60
> kthread+0xf8/0x130
> ? commit_timeout+0x10/0x10 [jbd2]
> ? kthread_bind+0x10/0x10
> ret_from_fork+0x35/0x40
>
> With __invalidate_device(), I no longer hit the BUG_ON with sync or
> unmount on the disconnected device.
>
> Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere")
> Cc: [email protected]
> Cc: Ratna Manoj Bolla <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: David Woodhouse <[email protected]>
> Signed-off-by: Munehisa Kamata <[email protected]>
>
> ---
> I reproduced this issue on a FAT filesystem.
> Steps to reproduce:
> 1. Establish an nbd connection.
> 2. Run two threads: one doing mount and umount, the other issuing the
>    clear_sock ioctl.
> 3. The BUG_ON is then hit.
>
> v2: Delete a link.
>
> Signed-off-by: SunKe <[email protected]>
>
> drivers/block/nbd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 9bcde23..e21d2de 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1231,7 +1231,7 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
>  				 struct block_device *bdev)
>  {
>  	sock_shutdown(nbd);
> -	kill_bdev(bdev);
> +	__invalidate_device(bdev, true);
>  	nbd_bdev_reset(bdev);
>  	if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
>  			       &nbd->config->runtime_flags))
>

Hi SunKe,

I accidentally included the link in the original one. Sorry about that and thanks
for picking this up.

Regards,
Munehisa