2014-01-02 05:31:26

by Fengguang Wu

Subject: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

Greetings,

We hit the below bug when doing write tests to btrfs.
Other filesystems (ext4, xfs) work fine. 2 full dmesgs are attached.

196d38bccfcfa32faed8c561868336fdfa0fe8e4 is the first bad commit
commit 196d38bccfcfa32faed8c561868336fdfa0fe8e4
Author: Kent Overstreet <[email protected]>
AuthorDate: Sat Nov 23 18:34:15 2013 -0800
Commit: Kent Overstreet <[email protected]>
CommitDate: Sat Nov 23 22:33:56 2013 -0800

block: Generic bio chaining

This adds a generic mechanism for chaining bio completions. This is
going to be used for a bio_split() replacement, and it turns out to be
very useful in a fair amount of driver code - a fair number of drivers
were implementing this in their own roundabout ways, often painfully.

Note that this means it's no longer possible to call bio_endio() more than once
on the same bio! This can cause problems for drivers that save/restore
bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all
- in all but the simplest cases they'd be better off just cloning the
bio, and immutable biovecs are making bio cloning cheaper. But for now,
we add a bio_endio_nodec() for these cases.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>

drivers/md/bcache/io.c | 2 +-
drivers/md/dm-cache-target.c | 6 ++++
drivers/md/dm-snap.c | 1 +
drivers/md/dm-thin.c | 8 +++--
drivers/md/dm-verity.c | 2 +-
fs/bio-integrity.c | 2 +-
fs/bio.c | 76 ++++++++++++++++++++++++++++++++++++++++----
include/linux/bio.h | 2 ++
include/linux/blk_types.h | 2 ++
9 files changed, 90 insertions(+), 11 deletions(-)

[ 35.466413] random: nonblocking pool is initialized
[ 196.918039] ------------[ cut here ]------------
[ 196.919770] kernel BUG at fs/bio.c:1748!
[ 196.921505] invalid opcode: 0000 [#1] SMP
[ 196.921788] Modules linked in: microcode processor
[ 196.921788] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-rc6-01897-g2b48961 #1
[ 196.921788] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 196.921788] task: ffff8804094acad0 ti: ffff8804094e8000 task.ti: ffff8804094e8000
[ 196.921788] RIP: 0010:[<ffffffff811ef01e>] [<ffffffff811ef01e>] bio_endio+0x1e/0x6a
[ 196.921788] RSP: 0018:ffff88041fc83da8 EFLAGS: 00010046
[ 196.921788] RAX: 0000000000000000 RBX: 00000000fffffffb RCX: 00000001802a0002
[ 196.921788] RDX: 00000001802a0003 RSI: 0000000000000000 RDI: ffff8800299ff9e8
[ 196.921788] RBP: ffff88041fc83dc0 R08: ffffea00096cc980 R09: ffff8804097f5100
[ 196.921788] R10: ffffea000aeb8280 R11: ffffffff8143841e R12: ffff88025b326780
[ 196.921788] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000003000
[ 196.921788] FS: 0000000000000000(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000
[ 196.921788] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 196.921788] CR2: 00007f16e7a1948f CR3: 000000007f85e000 CR4: 00000000000006e0
[ 196.921788] Stack:
[ 196.921788] ffff8800299ff9e8 ffff8800299ff9e8 ffff88025b326780 ffff88041fc83de8
[ 196.921788] ffffffff81438429 00000000fffffffb ffff8803d36e6c00 0000000000000000
[ 196.921788] ffff88041fc83e10 ffffffff811ef063 ffff8802bae0a1e8 ffff8802bae0a1e8
[ 196.921788] Call Trace:
[ 196.921788] <IRQ>
[ 196.921788] [<ffffffff81438429>] btrfs_end_bio+0x116/0x11d
[ 196.921788] [<ffffffff811ef063>] bio_endio+0x63/0x6a
[ 196.921788] [<ffffffff814cb712>] blk_mq_complete_request+0x89/0xfe
[ 196.921788] [<ffffffff814cb79d>] __blk_mq_end_io+0x16/0x18
[ 196.921788] [<ffffffff814cb7bf>] blk_mq_end_io+0x20/0xb1
[ 196.921788] [<ffffffff815a1ba9>] virtblk_done+0xa4/0xf6
[ 196.921788] [<ffffffff8155c463>] vring_interrupt+0x7c/0x8a
[ 196.921788] [<ffffffff81107427>] handle_irq_event_percpu+0x4a/0x1bc
[ 196.921788] [<ffffffff811075de>] handle_irq_event+0x45/0x61
[ 196.921788] [<ffffffff81109f40>] handle_edge_irq+0xd9/0xfb
[ 196.921788] [<ffffffff81039f56>] handle_irq+0x21/0x2a
[ 196.921788] [<ffffffff81a0c3fd>] do_IRQ+0x4d/0xb4
[ 196.921788] [<ffffffff81a034f2>] common_interrupt+0x72/0x72
[ 196.921788] <EOI>
[ 196.921788] [<ffffffff81065bfa>] ? native_safe_halt+0x6/0x8
[ 196.921788] [<ffffffff8103f5d8>] default_idle+0x38/0xc1
[ 196.921788] [<ffffffff8103fd04>] arch_cpu_idle+0x18/0x28
[ 196.921788] [<ffffffff81106b6b>] cpu_startup_entry+0x178/0x269
[ 196.921788] [<ffffffff81116954>] ? clockevents_register_device+0x112/0x117
[ 196.921788] [<ffffffff8105ba60>] start_secondary+0x277/0x279
[ 196.921788] Code: ff ff eb bb 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 53 bb fb ff ff ff 48 85 ff 74 4c 8b 47 44 85 c0 7f 02 <0f> 0b 85 f6 74 07 f0 80 67 10 fe eb 09 48 8b 47 10 a8 01 0f 44
[ 196.921788] RIP [<ffffffff811ef01e>] bio_endio+0x1e/0x6a
[ 196.921788] RSP <ffff88041fc83da8>
[ 196.921788] ---[ end trace 0ec0fc28f7931a30 ]---
[ 196.921788] Kernel panic - not syncing: Fatal exception in interrupt
[ 196.921788] Rebooting in 10 seconds..

Thanks,
Fengguang
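
For readers following the oops: the Code: bytes just before the trapping
instruction (8b 47 44 85 c0 7f 02 <0f> 0b) appear to decode to a 32-bit load
from offset 0x44 of the bio in RDI, a test for a positive value, and ud2, and
RAX is 0 in the register dump. That is consistent with the new bi_remaining
assertion firing on a second bio_endio() call for the same bio. The sketch
below is a user-space model of that accounting, not the kernel code.
struct model_bio and the model_* functions are hypothetical names, and the
behaviour of the bio_endio_nodec() stand-in (add one count back, then
complete) is inferred from its name and the commit description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bio (not the kernel layout). */
struct model_bio {
    atomic_int remaining;                        /* models bi_remaining */
    void (*end_io)(struct model_bio *, int);     /* models bi_end_io */
    int completions;                             /* times end_io has run */
};

/*
 * Models bio_endio(). Returns false where the kernel would BUG because
 * the count has already been consumed by an earlier completion.
 */
static bool model_bio_endio(struct model_bio *bio, int err)
{
    assert(bio != NULL);
    if (atomic_load(&bio->remaining) <= 0)
        return false;
    if (atomic_fetch_sub(&bio->remaining, 1) != 1)
        return true;             /* other completions are still pending */
    if (bio->end_io)
        bio->end_io(bio, err);
    return true;
}

/* Models bio_endio_nodec(): complete without a net decrement. */
static bool model_bio_endio_nodec(struct model_bio *bio, int err)
{
    assert(bio != NULL);
    atomic_fetch_add(&bio->remaining, 1);
    return model_bio_endio(bio, err);
}

/* Example end_io callback that only counts how often it is invoked. */
static void count_completion(struct model_bio *bio, int err)
{
    (void)err;
    bio->completions++;
}
```

In this model, completing a fresh model_bio once runs the callback. A second
model_bio_endio() on the same object returns false, which is the case the
kernel turns into the BUG shown in the log, while model_bio_endio_nodec()
on that object still succeeds.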


Attachments:
(No filename) (4.95 kB)
dmesg-dd (132.45 kB)
dmesg-fileio (65.96 kB)

2014-01-03 19:51:34

by Muthu Kumar

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

Looks like Kent missed the btrfs endio in the original commit. How
about this patch:

---------

In btrfs_end_bio, call bio_endio_nodec on the restored bio so the
bi_remaining is accounted for correctly.

Reported-by: [email protected]
Cc: Kent Overstreet <[email protected]>
CC: Jens Axboe <[email protected]>
Signed-off-by: Muthukumar Ratty <[email protected]>
--------

fs/btrfs/volumes.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f2130de..edfed52 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
}
kfree(bbio);

- bio_endio(bio, err);
+ /*
+ * Call endio_nodec on the restored bio so the bi_remaining is
+ * accounted for correctly
+ */
+ bio_endio_nodec(bio, err);
} else if (!is_orig_bio) {
bio_put(bio);
}


2014-01-05 09:47:14

by Fengguang Wu

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

Hi Muthu,

On Fri, Jan 03, 2014 at 11:51:31AM -0800, Muthu Kumar wrote:
> Looks like Kent missed the btrfs endio in the original commit. How
> about this patch:
>
> ---------
>
> In btrfs_end_bio, call bio_endio_nodec on the restored bio so the
> bi_remaining is accounted for correctly.
>
> Reported-by: [email protected]
> Cc: Kent Overstreet <[email protected]>
> CC: Jens Axboe <[email protected]>
> Signed-off-by: Muthukumar Ratty <[email protected]>
> --------
>
> fs/btrfs/volumes.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f2130de..edfed52 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
> }
> kfree(bbio);
>
> - bio_endio(bio, err);
> + /*
> + * Call endio_nodec on the restored bio so the bi_remaining is
> + * accounted for correctly
> + */
> + bio_endio_nodec(bio, err);
> } else if (!is_orig_bio) {
> bio_put(bio);
> }

Interestingly, the BUG message disappeared but it blocks the test run.
In the end, the test watchdog reboots the machine with SysRq:

2014-01-04 23:13:02 mount -t btrfs /dev/vda /fs/vda
[ 20.184264] btrfs: device fsid f0e06999-0518-47e0-a622-21b8749438be devid 1 transid 4 /dev/vda
[ 20.186552] btrfs: disk space caching is enabled
[ 131.360457] random: nonblocking pool is initialized
==> [ 1465.069342] SysRq : Emergency Sync
==> [ 1475.071055] SysRq : Resetting

Attached is the full dmesg for a good run (v3.13-rc7) and a bad run
(this patch).

Thanks,
Fengguang


Attachments:
(No filename) (1.75 kB)
dmesg-v3.13-rc7 (69.33 kB)
dmesg-bio_endio_nodec (62.32 kB)

2014-01-05 16:28:59

by Muthu Kumar

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

Fengguang,
Instead of rebooting, can you trigger a crash dump when this happens
and send us the backtrace (to start with)?

Kent,
Did you do any btrfs test with your changes?

Regards,
Muthu


2014-01-06 02:21:37

by Fengguang Wu

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Sun, Jan 05, 2014 at 08:28:57AM -0800, Muthu Kumar wrote:
> Fengguang,
> Instead of rebooting, can you trigger a crash dump when this happens
> and send us the backtrace (to start with)?

Muthu, good point! Attached is the full dmesg with backtrace:

[ 1398.988324] SysRq : Show Blocked State
[ 1398.992007] task PC stack pid father
[ 1398.992007] mount D 0000000000000002 0 2875 2870 0x00000000
[ 1398.992007] ffff88007f859a70 0000000000000082 ffff88007f859fd8 ffff8803d21c6c10
[ 1398.992007] 0000000000012fc0 ffff8803d21c6c10 0000000000000000 0000000000000000
[ 1398.992007] ffff8803d2d22068 0000000000000008 ffff88007f859a18 ffffffff814c2b62
[ 1398.992007] Call Trace:
[ 1398.992007] [<ffffffff814c2b62>] ? submit_bio+0x106/0x159
[ 1398.992007] [<ffffffff81431c6a>] ? __do_readpage+0x4b9/0x50e
[ 1398.992007] [<ffffffff81064a03>] ? kvm_clock_read+0x27/0x31
[ 1398.992007] [<ffffffff81064a16>] ? kvm_clock_get_cycles+0x9/0xb
[ 1398.992007] [<ffffffff811651a1>] ? filemap_fdatawait+0x23/0x23
[ 1398.992007] [<ffffffff819ff356>] schedule+0x6f/0x71
[ 1398.992007] [<ffffffff819ff59b>] io_schedule+0x8f/0xd6
[ 1398.992007] [<ffffffff811651af>] sleep_on_page+0xe/0x12
[ 1398.992007] [<ffffffff819ff861>] __wait_on_bit+0x48/0x7b
[ 1398.992007] [<ffffffff81165002>] wait_on_page_bit+0x7a/0x7c
[ 1398.992007] [<ffffffff810f7ee3>] ? autoremove_wake_function+0x34/0x34
[ 1398.992007] [<ffffffff81433eee>] read_extent_buffer_pages+0x1ae/0x23b
[ 1398.992007] [<ffffffff81410da7>] ? free_root_pointers+0x5b/0x5b
[ 1398.992007] [<ffffffff814123e5>] btree_read_extent_buffer_pages.constprop.48+0x66/0x100
[ 1398.992007] [<ffffffff814129d1>] read_tree_block+0x2f/0x47
[ 1398.992007] [<ffffffff814163e6>] open_ctree+0x1271/0x1adf
[ 1398.992007] [<ffffffff813f4243>] btrfs_mount+0x47b/0x771
[ 1398.992007] [<ffffffff814e1f8c>] ? get_from_free_list+0x41/0x4b
[ 1398.992007] [<ffffffff811c40bf>] mount_fs+0x15/0xae
[ 1398.992007] [<ffffffff811d9a52>] vfs_kern_mount+0x64/0xf6
[ 1398.992007] [<ffffffff811dbff6>] do_mount+0x781/0x878
[ 1398.992007] [<ffffffff8117d6c2>] ? strndup_user+0x3a/0xd6
[ 1398.992007] [<ffffffff811dc317>] SyS_mount+0x85/0xbe
[ 1398.992007] [<ffffffff81a09529>] system_call_fastpath+0x16/0x1b
[ 1398.992007] Sched Debug Version: v0.11, 3.13.0-rc6-00148-gc05f7ce #1

> Kent,
> Did you do any btrfs test with your changes?

Just try simple dd writes.

Thanks,
Fengguang
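
The "SysRq : Show Blocked State" dump above is what the kernel prints for the
SysRq command w, which lists tasks in uninterruptible sleep. When the key
combination is not available on a test machine, the same command can usually
be issued by writing the letter to /proc/sysrq-trigger as root. A minimal C
sketch follows. sysrq_trigger() is a hypothetical helper name, and the sketch
assumes SysRq is enabled on the machine (the kernel.sysrq sysctl).

```c
#include <stdio.h>

/*
 * Write a single SysRq command character to the given trigger file.
 * Returns 0 on success and -1 on any I/O error. The path is a parameter
 * so the helper can be exercised against an ordinary file.
 */
static int sysrq_trigger(const char *path, char cmd)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int rc = (fputc(cmd, f) == EOF) ? -1 : 0;

    /* fclose() flushes the buffer, so a failed write shows up here. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

A call such as sysrq_trigger("/proc/sysrq-trigger", 'w') would request the
blocked-task dump, and the output then appears in dmesg.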




Attachments:
(No filename) (4.40 kB)
dmesg-bio_endio_nodec-w (93.12 kB)

2014-01-06 22:10:46

by Kent Overstreet

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

Chris, the patch below seems to be incorrect - with it we get hangs, so
bi_remaining (probably) isn't getting decremented when it should be. You sent
Jens fixes for btrfs which I somehow lost when I rebased, do you remember how
this is supposed to work? Looking at the code I'm not quite sure what's going on
here.

On Fri, Jan 03, 2014 at 11:51:31AM -0800, Muthu Kumar wrote:
> Looks like Kent missed the btrfs endio in the original commit. How
> about this patch:
>
> ---------
>
> In btrfs_end_bio, call bio_endio_nodec on the restored bio so the
> bi_remaining is accounted for correctly.
>
> Reported-by: [email protected]
> Cc: Kent Overstreet <[email protected]>
> CC: Jens Axboe <[email protected]>
> Signed-off-by: Muthukumar Ratty <[email protected]>
> --------
>
> fs/btrfs/volumes.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f2130de..edfed52 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
> }
> kfree(bbio);
>
> - bio_endio(bio, err);
> + /*
> + * Call endio_nodec on the restored bio so the bi_remaining is
> + * accounted for correctly
> + */
> + bio_endio_nodec(bio, err);
> } else if (!is_orig_bio) {
> bio_put(bio);
> }

2014-01-07 00:47:43

by Muthu Kumar

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

OK, after a bit more staring I believe the correct fix is the following.

Fengguang, Please try this one?

Regards,
Muthu

------------
In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
we restore the orig_bio but failed to increment bi_remaining for
orig_bio, which triggers a BUG_ON later when bio_endio is called. Fix
is to increment bi_remaining when we restore the orig bio as well.

Reported-by: [email protected]
CC: Kent Overstreet <[email protected]>
CC: Jens Axboe <[email protected]>
CC: Chris Mason <clm@fv
Signed-off-by: Muthukumar Ratty <[email protected]>
----------------
fs/btrfs/volumes.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 37972d5..2011cc0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5297,9 +5297,9 @@ static void btrfs_end_bio(struct bio *bio, int err)
if (!is_orig_bio) {
bio_put(bio);
bio = bbio->orig_bio;
- } else {
- atomic_inc(&bio->bi_remaining);
}
+ atomic_inc(&bio->bi_remaining);
+
bio->bi_private = bbio->private;
bio->bi_end_io = bbio->end_io;
btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;

--------------------------
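
The accounting described in this patch can be sketched in user space as
follows. This is a simplified model under stated assumptions, not the btrfs
code: model_bio, stacked_ctx, and all function names are hypothetical. The
model captures only one idea, namely that a completion handler which restores
the saved end_io and then completes the same bio again has to add one count
back first, because the count was already consumed by the completion that
invoked the handler.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio. */
struct model_bio {
    atomic_int remaining;                       /* models bi_remaining */
    void (*end_io)(struct model_bio *, int);    /* models bi_end_io */
    void *private_data;                         /* models bi_private */
    int finished;                               /* owner callback runs */
};

/* Models bio_endio(): false stands for the kernel's BUG on count <= 0. */
static bool model_bio_endio(struct model_bio *bio, int err)
{
    if (atomic_load(&bio->remaining) <= 0)
        return false;
    if (atomic_fetch_sub(&bio->remaining, 1) != 1)
        return true;
    if (bio->end_io)
        bio->end_io(bio, err);
    return true;
}

/* Hypothetical per-request state holding the saved completion fields. */
struct stacked_ctx {
    void (*saved_end_io)(struct model_bio *, int);
    void *saved_private;
    bool reincrement;       /* true models the behaviour after the patch */
    bool second_endio_ok;   /* result of the nested completion */
};

/* Handler installed by the stacking layer in place of the owner's end_io. */
static void stacked_end_io(struct model_bio *bio, int err)
{
    struct stacked_ctx *ctx = bio->private_data;

    /* Restore the owner's completion fields. */
    bio->end_io = ctx->saved_end_io;
    bio->private_data = ctx->saved_private;

    /* The count is 0 here; add one back before completing again. */
    if (ctx->reincrement)
        atomic_fetch_add(&bio->remaining, 1);

    ctx->second_endio_ok = model_bio_endio(bio, err);
}

/* The owner's original completion callback. */
static void owner_end_io(struct model_bio *bio, int err)
{
    (void)err;
    bio->finished++;
}

/* Runs one completion; true if the owner was notified exactly once. */
static bool run_once(bool reincrement)
{
    struct stacked_ctx ctx = {
        .saved_end_io = owner_end_io,
        .reincrement = reincrement,
    };
    struct model_bio bio = {
        .end_io = stacked_end_io,
        .private_data = &ctx,
    };

    atomic_init(&bio.remaining, 1);
    model_bio_endio(&bio, 0);
    return ctx.second_endio_ok && bio.finished == 1;
}
```

In this model run_once(true) reaches the owner's callback, while
run_once(false) stops at the nested completion with the count at zero, which
corresponds to the BUG_ON case described in the patch text.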




2014-01-07 02:52:19

by Kent Overstreet

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Mon, Jan 06, 2014 at 04:47:38PM -0800, Muthu Kumar wrote:
> OK, after a bit more staring I believe the correct fix is the following.

This code still confuses me but I think you're correct, the fix certainly
matches the evidence we have.

> Fengguang, Please try this one?
>
> Regards,
> Muthu
>
> ------------
> In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
> we restore the orig_bio but failed to increment bi_remaining for
> orig_bio, which triggers a BUG_ON later when bio_endio is called. Fix
> is to increment bi_remaining when we restore the orig bio as well.
>
> Reported-by: [email protected]
> CC: Kent Overstreet <[email protected]>
> CC: Jens Axboe <[email protected]>
> CC: Chris Mason <clm@fv
> Signed-off-by: Muthukumar Ratty <[email protected]>
> ----------------
> fs/btrfs/volumes.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 37972d5..2011cc0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5297,9 +5297,9 @@ static void btrfs_end_bio(struct bio *bio, int err)
> if (!is_orig_bio) {
> bio_put(bio);
> bio = bbio->orig_bio;
> - } else {
> - atomic_inc(&bio->bi_remaining);
> }
> + atomic_inc(&bio->bi_remaining);
> +
> bio->bi_private = bbio->private;
> bio->bi_end_io = bbio->end_io;
> btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
>
> --------------------------

2014-01-07 05:53:29

by Fengguang Wu

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Mon, Jan 06, 2014 at 04:47:38PM -0800, Muthu Kumar wrote:
> OK, after a bit more staring I believe the correct fix is the following.
>
> Fengguang, Please try this one?

Yes, it runs fine now!

Tested-by: Fengguang Wu <[email protected]>

Thanks,
Fengguang

> ------------
> In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
> we restore the orig_bio but failed to increment bi_remaining for
> orig_bio, which triggers a BUG_ON later when bio_endio is called. Fix
> is to increment bi_remaining when we restore the orig bio as well.
>
> Reported-by: [email protected]
> CC: Kent Overstreet <[email protected]>
> CC: Jens Axboe <[email protected]>
> CC: Chris Mason <clm@fv
> Signed-off-by: Muthukumar Ratty <[email protected]>
> ----------------
> fs/btrfs/volumes.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 37972d5..2011cc0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5297,9 +5297,9 @@ static void btrfs_end_bio(struct bio *bio, int err)
> if (!is_orig_bio) {
> bio_put(bio);
> bio = bbio->orig_bio;
> - } else {
> - atomic_inc(&bio->bi_remaining);
> }
> + atomic_inc(&bio->bi_remaining);
> +
> bio->bi_private = bbio->private;
> bio->bi_end_io = bbio->end_io;
> btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
>
> --------------------------

2014-01-07 20:15:49

by Muthu Kumar

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

Thanks, Fengguang. Here is the final patch with an added comment. BTW, Fengguang
mentioned that git-am has trouble with the inline patch, while "quilt
import" worked fine for him.

------------
In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
we restore the orig_bio but failed to increment bi_remaining for
orig_bio, which triggers a BUG_ON later when bio_endio is called. Fix
is to increment bi_remaining when we restore the orig bio as well.

Reported-and-Tested-by: Fengguang Wu <[email protected]>
CC: Kent Overstreet <[email protected]>
CC: Jens Axboe <[email protected]>
CC: Chris Mason <[email protected]>
Signed-off-by: Muthukumar Ratty <[email protected]>

-----------
fs/btrfs/volumes.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 37972d5..34aba2b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5297,9 +5297,13 @@ static void btrfs_end_bio(struct bio *bio, int err)
if (!is_orig_bio) {
bio_put(bio);
bio = bbio->orig_bio;
- } else {
- atomic_inc(&bio->bi_remaining);
}
+ /*
+ * We have original bio now. So increment bi_remaining to
+ * account for it in endio
+ */
+ atomic_inc(&bio->bi_remaining);
+
bio->bi_private = bbio->private;
bio->bi_end_io = bbio->end_io;
btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;

-------------------------------------

2014-01-07 20:30:07

by Chris Mason

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Tue, 2014-01-07 at 12:15 -0800, Muthu Kumar wrote:
+AD4- Thanks Fengguang. Final patch with added comment. BTW, fengguang
+AD4- mentioned that git-am has trouble with the inline patch and +ACI-quilt
+AD4- import+ACI- worked fine for him...
+AD4-
+AD4- ------------
+AD4- In btrfs+AF8-end+AF8-bio(), we increment bi+AF8-remaining if is+AF8-orig+AF8-bio. If not,
+AD4- we restore the orig+AF8-bio but failed to increment bi+AF8-remaining for
+AD4- orig+AF8-bio, which triggers a BUG+AF8-ON later when bio+AF8-endio is called. Fix
+AD4- is to increment bi+AF8-remaining when we restore the orig bio as well.
+AD4-

Hi everyone,

Which git tree is this against? Just Jens or some extra code too?

I'll run some tests here. My original patch is below (it's slightly
different from Muthu's).

Btrfs is sometimes calling bio_endio twice on the same bio while
we chain things. This makes sure we don't trip over new assertions in
fs/bio.c

Signed-off-by: Chris Mason <clm@fb.com>

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 7fcac70..5b30360 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2289,6 +2289,10 @@ static void btrfsic_bio_end_io(struct bio *bp, int bio_error_status)
block = next_block;
} while (NULL != block);

+ /*
+ * since we're not using bio_endio here, we don't need to worry about
+ * the remaining count
+ */
bp->bi_end_io(bp, bio_error_status);
}

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62176ad..786ddac 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1684,7 +1684,7 @@ static void end_workqueue_fn(struct btrfs_work *work)
bio->bi_private = end_io_wq->private;
bio->bi_end_io = end_io_wq->end_io;
kfree(end_io_wq);
- bio_endio(bio, error);
+ bio_endio_nodec(bio, error);
}

static int cleaner_kthread(void *arg)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ef48947..a31448f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5284,9 +5284,17 @@ static void btrfs_end_bio(struct bio *bio, int err)
}
}

- if (bio == bbio->orig_bio)
+ if (bio == bbio->orig_bio) {
is_orig_bio = 1;

+ /*
+ * eventually we will call the bi_endio for the original bio,
+ * make sure that we've properly bumped bi_remaining to reflect
+ * our chain of endios here
+ */
+ atomic_inc(&bio->bi_remaining);
+ }
+
if (atomic_dec_and_test(&bbio->stripes_pending)) {
if (!is_orig_bio) {
bio_put(bio);
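
As a rough illustration of the disk-io.c hunk, the sketch below models a bio that is completed once, deferred to a workqueue, and then completed a second time by the worker. It is a userspace toy, not kernel code: `mini_bio` and the `mini_*` functions are hypothetical names, and it assumes that bio_endio_nodec() amounts to "bump the remaining count, then complete", so that the second completion does not consume a count that is no longer there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bio. */
struct mini_bio {
	int remaining;   /* models bi_remaining */
	int callbacks;   /* how many times a completion callback ran */
	bool would_bug;  /* set where the real code would trip BUG_ON() */
};

/* Models bio_endio(): consume one count, run the callback at zero. */
static void mini_bio_endio(struct mini_bio *bio)
{
	if (bio->remaining <= 0) {
		bio->would_bug = true;
		return;
	}
	if (--bio->remaining == 0)
		bio->callbacks++;
}

/* Assumed shape of bio_endio_nodec(): re-arm the count, then complete. */
static void mini_bio_endio_nodec(struct mini_bio *bio)
{
	bio->remaining++;
	mini_bio_endio(bio);
}

/*
 * Models the end_workqueue_fn() flow: the first completion only queues
 * work; the worker restores the saved end_io and completes the bio again.
 */
static struct mini_bio mini_deferred_completion(bool use_nodec)
{
	struct mini_bio bio = { .remaining = 1, .callbacks = 0, .would_bug = false };

	mini_bio_endio(&bio);               /* first completion, queues the worker */

	if (use_nodec)
		mini_bio_endio_nodec(&bio); /* patched: second completion is safe */
	else
		mini_bio_endio(&bio);       /* unpatched: count is already zero */

	return bio;
}
```

With `use_nodec` set, both completions run their callback; without it, the second call hits the zero-count check, which is the situation the new assertions in fs/bio.c are meant to catch.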

2014-01-07 21:23:13

by Muthu Kumar

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

Chris,
This is based on Jens' block tree, for-3.14/core branch...

Regards,
Muthu

On Tue, Jan 7, 2014 at 12:29 PM, Chris Mason <[email protected]> wrote:
> On Tue, 2014-01-07 at 12:15 -0800, Muthu Kumar wrote:
>> Thanks Fengguang. Final patch with added comment. BTW, fengguang
>> mentioned that git-am has trouble with the inline patch and "quilt
>> import" worked fine for him...
>>
>> ------------
>> In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
>> we restore the orig_bio but failed to increment bi_remaining for
>> orig_bio, which triggers a BUG_ON later when bio_endio is called. Fix
>> is to increment bi_remaining when we restore the orig bio as well.
>>
>
> Hi everyone,
>
> Which git tree is this against? Just Jens or some extra code too?
>
> I'll run some tests here. My original patch is below (it's slightly
> different from Muthu's).
>
> Btrfs is sometimes calling bio_endio twice on the same bio while
> we chain things. This makes sure we don't trip over new assertions in
> fs/bio.c
>
> Signed-off-by: Chris Mason <[email protected]>
>
> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> index 7fcac70..5b30360 100644
> --- a/fs/btrfs/check-integrity.c
> +++ b/fs/btrfs/check-integrity.c
> @@ -2289,6 +2289,10 @@ static void btrfsic_bio_end_io(struct bio *bp, int bio_error_status)
> block = next_block;
> } while (NULL != block);
>
> + /*
> + * since we're not using bio_endio here, we don't need to worry about
> + * the remaining count
> + */
> bp->bi_end_io(bp, bio_error_status);
> }
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 62176ad..786ddac 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1684,7 +1684,7 @@ static void end_workqueue_fn(struct btrfs_work *work)
> bio->bi_private = end_io_wq->private;
> bio->bi_end_io = end_io_wq->end_io;
> kfree(end_io_wq);
> - bio_endio(bio, error);
> + bio_endio_nodec(bio, error);
> }
>
> static int cleaner_kthread(void *arg)
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ef48947..a31448f 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5284,9 +5284,17 @@ static void btrfs_end_bio(struct bio *bio, int err)
> }
> }
>
> - if (bio == bbio->orig_bio)
> + if (bio == bbio->orig_bio) {
> is_orig_bio = 1;
>
> + /*
> + * eventually we will call the bi_endio for the original bio,
> + * make sure that we've properly bumped bi_remaining to reflect
> + * our chain of endios here
> + */
> + atomic_inc(&bio->bi_remaining);
> + }
> +
> if (atomic_dec_and_test(&bbio->stripes_pending)) {
> if (!is_orig_bio) {
> bio_put(bio);

2014-01-08 19:41:28

by Chris Mason

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Tue, 2014-01-07 at 13:23 -0800, Muthu Kumar wrote:
> Chris,
> This is based off of Jens block tree, for-3.14/core branch...
>

Ok, Kent did pull in one of my hunks, one was a comment and the third
was effectively the same as your patch. I tried to test the end result
today, but get these on boot with ext4:

[ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
[ 8.336062] bio_endio: bio for (unknown) without endio
[ 8.336063] Modules linked in: megaraid_sas(+)
[ 8.336065] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-rc7-mason+ #1
[ 8.336066] Hardware name: ZTSYSTEMS Echo Ridge T4 /A9DRPF-10D, BIOS 1.07 05/10/2012
[ 8.336069] 00000000000006f2 ffff88087fc03c28 ffffffff815cb8c6 00000000000006f2
[ 8.336071] ffff88087fc03c78 ffff88087fc03c68 ffffffff81047497 ffff88085561a8e8
[ 8.336073] ffff8808582b6d80 00000000000000fe 00000000fffffffb ffff8808582b6d80
[ 8.336073] Call Trace:
[ 8.336078] <IRQ> [<ffffffff815cb8c6>] dump_stack+0x49/0x5b
[ 8.336082] [<ffffffff81047497>] warn_slowpath_common+0x87/0xb0
[ 8.336084] [<ffffffff81047561>] warn_slowpath_fmt+0x41/0x50
[ 8.336086] [<ffffffff813aa6b8>] ? scsi_request_fn+0xc8/0x6a0
[ 8.336087] [<ffffffff8119bc8e>] bio_endio+0xbe/0x100
[ 8.336091] [<ffffffff8128c1d3>] blk_update_request+0x243/0x3a0
[ 8.336092] [<ffffffff8128c352>] blk_update_bidi_request+0x22/0xa0
[ 8.336094] [<ffffffff8128ceca>] blk_end_bidi_request+0x2a/0x80
[ 8.336096] [<ffffffff8128cf5b>] blk_end_request+0xb/0x10
[ 8.336098] [<ffffffff813ab916>] scsi_io_completion+0xa6/0x700
[ 8.336100] [<ffffffff813a2b68>] scsi_finish_command+0xc8/0x130
[ 8.336101] [<ffffffff813ac0bf>] scsi_softirq_done+0x13f/0x160
[ 8.336104] [<ffffffff812937ad>] blk_done_softirq+0x6d/0x80
[ 8.336106] [<ffffffff8104c26b>] __do_softirq+0xdb/0x290
[ 8.336108] [<ffffffff8104c51d>] irq_exit+0xbd/0xd0
[ 8.336110] [<ffffffff81003db1>] do_IRQ+0x61/0xe0
[ 8.336112] [<ffffffff815d012a>] common_interrupt+0x6a/0x6a
[ 8.336117] <EOI> [<ffffffff814e213a>] ? cpuidle_enter_state+0x4a/0xc0
[ 8.336119] [<ffffffff814e2136>] ? cpuidle_enter_state+0x46/0xc0
[ 8.336121] [<ffffffff814e2277>] cpuidle_idle_call+0xc7/0x160
[ 8.336123] [<ffffffff8100b2c9>] arch_cpu_idle+0x9/0x20
[ 8.336126] [<ffffffff8108fd8a>] cpu_startup_entry+0x9a/0x250
[ 8.336128] [<ffffffff815c3702>] rest_init+0x72/0x80
[ 8.336131] [<ffffffff81ac2047>] start_kernel+0x3fd/0x40a
[ 8.336133] [<ffffffff81ac1a78>] ? repair_env_string+0x5b/0x5b
[ 8.336134] [<ffffffff81ac159d>] x86_64_start_reservations+0x2a/0x2c
[ 8.336136] [<ffffffff81ac16df>] x86_64_start_kernel+0x140/0x147
[ 8.336137] ---[ end trace d0966e2430ea53b4 ]---
[ 8.336146] ------------[ cut here ]------------
[ 8.336146] kernel BUG at fs/bio.c:523!
[ 8.336148] invalid opcode: 0000 [#1] SMP
[ 8.336148] Modules linked in: megaraid_sas(+)
[ 8.336150] CPU: 0 PID: 2911 Comm: scsi_id Tainted: G W 3.13.0-rc7-mason+ #1
[ 8.336150] Hardware name: ZTSYSTEMS Echo Ridge T4 /A9DRPF-10D, BIOS 1.07 05/10/2012
[ 8.336151] task: ffff8808556b4150 ti: ffff8808556b6000 task.ti: ffff8808556b6000
[ 8.336153] RIP: 0010:[<ffffffff8119bbba>] [<ffffffff8119bbba>] bio_put+0x8a/0xa0
[ 8.336153] RSP: 0018:ffff8808556b7b68 EFLAGS: 00010246
[ 8.336154] RAX: 0000000000000000 RBX: ffff8808582b6d80 RCX: 0000000000000000
[ 8.336155] RDX: ffff8808582b6dec RSI: 0000000000000003 RDI: ffff8808582b6d80
[ 8.336155] RBP: ffff8808556b7b78 R08: 0000000000000004 R09: 0000000000000000
[ 8.336156] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[ 8.336156] R13: 0000000000000000 R14: ffff8808567ebe28 R15: ffff8808582b6d80
[ 8.336157] FS: 00007f16056bd700(0000) GS:ffff88087fc00000(0000) knlGS:0000000000000000
[ 8.336158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.336159] CR2: ffffe8f7ffc00000 CR3: 0000000856303000 CR4: 00000000000407f0
[ 8.336159] Stack:
[ 8.336164] ffff8808582b6d80 0000000000000000 ffff8808556b7ba8 ffffffff81291b37
[ 8.336168] ffff8808556b7b88 ffff8808556b7cf8 ffff88085561a8e8 ffff880855685400
[ 8.336172] ffff8808556b7c78 ffffffff8129b42d ffff8808556b7be8 ffffffff8119e09b
[ 8.336172] Call Trace:
[ 8.336174] [<ffffffff81291b37>] blk_rq_unmap_user+0x47/0x60
[ 8.336177] [<ffffffff8129b42d>] sg_io+0x26d/0x370
[ 8.336179] [<ffffffff8119e09b>] ? bdget+0x11b/0x130
[ 8.336183] [<ffffffff811068c9>] ? find_get_page+0x19/0xa0
[ 8.336185] [<ffffffff8129bc79>] scsi_cmd_ioctl+0x409/0x480
[ 8.336186] [<ffffffff81106af2>] ? unlock_page+0x22/0x30
[ 8.336189] [<ffffffff81130949>] ? __do_fault+0x439/0x560
[ 8.336191] [<ffffffff8129bd3c>] scsi_cmd_blk_ioctl+0x4c/0x70
[ 8.336194] [<ffffffff81437d6f>] sd_ioctl+0xcf/0x160
[ 8.336196] [<ffffffff81298003>] __blkdev_driver_ioctl+0x23/0x30
[ 8.336198] [<ffffffff81298638>] blkdev_ioctl+0x1f8/0x790
[ 8.336199] [<ffffffff8119d717>] block_ioctl+0x37/0x40
[ 8.336201] [<ffffffff811790c7>] do_vfs_ioctl+0x87/0x4f0
[ 8.336204] [<ffffffff8126374a>] ? file_has_perm+0x8a/0xa0
[ 8.336205] [<ffffffff811795c1>] SyS_ioctl+0x91/0xa0
[ 8.336207] [<ffffffff815d77e2>] system_call_fastpath+0x16/0x1b
[ 8.336218] Code: 8b 74 24 10 48 29 fb 48 89 df e8 a2 d2 f6 ff 48 8b 1c 24 4c 8b 64 24 08 c9 c3 0f 1f 80 00 00 00 00 48 89 df e8 38 60 fb ff eb 9a <0f> 0b 0f 1f 40 00 eb fa 66 66 66 66 66 2e 0f 1f 84 00 00 00 00
[ 8.336220] RIP [<ffffffff8119bbba>] bio_put+0x8a/0xa0
[ 8.336220] RSP <ffff8808556b7b68>
[ 8.336221] ---[ end trace d0966e2430ea53b5 ]---

Trying to track it down.

-chris

2014-01-08 19:54:16

by Muthu Kumar

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

Chris,

[ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
[ 8.336062] bio_endio: bio for (unknown) without endio

This comes from my recent change to avoid a memory leak in bio_endio.
But I think the problem is higher up: most likely bio_endio is called
twice on the same bio (which was freed before).

Are you running the unmodified for-3.14/core or do you have local patches?


Regards,
Muthu

On Wed, Jan 8, 2014 at 11:41 AM, Chris Mason <[email protected]> wrote:
> On Tue, 2014-01-07 at 13:23 -0800, Muthu Kumar wrote:
>> Chris,
>> This is based off of Jens block tree, for-3.14/core branch...
>>
>
> Ok, Kent did pull in one of my hunks, one was a comment and the third
> was effectively the same as your patch. I tried to test the end result
> today, but get these on boot with ext4:
>
> [ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
> [ 8.336062] bio_endio: bio for (unknown) without endio
> [ 8.336063] Modules linked in: megaraid_sas(+)
> [ 8.336065] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-rc7-mason+ #1
> [ 8.336066] Hardware name: ZTSYSTEMS Echo Ridge T4 /A9DRPF-10D, BIOS 1.07 05/10/2012
> [ 8.336069] 00000000000006f2 ffff88087fc03c28 ffffffff815cb8c6 00000000000006f2
> [ 8.336071] ffff88087fc03c78 ffff88087fc03c68 ffffffff81047497 ffff88085561a8e8
> [ 8.336073] ffff8808582b6d80 00000000000000fe 00000000fffffffb ffff8808582b6d80
> [ 8.336073] Call Trace:
> [ 8.336078] <IRQ> [<ffffffff815cb8c6>] dump_stack+0x49/0x5b
> [ 8.336082] [<ffffffff81047497>] warn_slowpath_common+0x87/0xb0
> [ 8.336084] [<ffffffff81047561>] warn_slowpath_fmt+0x41/0x50
> [ 8.336086] [<ffffffff813aa6b8>] ? scsi_request_fn+0xc8/0x6a0
> [ 8.336087] [<ffffffff8119bc8e>] bio_endio+0xbe/0x100
> [ 8.336091] [<ffffffff8128c1d3>] blk_update_request+0x243/0x3a0
> [ 8.336092] [<ffffffff8128c352>] blk_update_bidi_request+0x22/0xa0
> [ 8.336094] [<ffffffff8128ceca>] blk_end_bidi_request+0x2a/0x80
> [ 8.336096] [<ffffffff8128cf5b>] blk_end_request+0xb/0x10
> [ 8.336098] [<ffffffff813ab916>] scsi_io_completion+0xa6/0x700
> [ 8.336100] [<ffffffff813a2b68>] scsi_finish_command+0xc8/0x130
> [ 8.336101] [<ffffffff813ac0bf>] scsi_softirq_done+0x13f/0x160
> [ 8.336104] [<ffffffff812937ad>] blk_done_softirq+0x6d/0x80
> [ 8.336106] [<ffffffff8104c26b>] __do_softirq+0xdb/0x290
> [ 8.336108] [<ffffffff8104c51d>] irq_exit+0xbd/0xd0
> [ 8.336110] [<ffffffff81003db1>] do_IRQ+0x61/0xe0
> [ 8.336112] [<ffffffff815d012a>] common_interrupt+0x6a/0x6a
> [ 8.336117] <EOI> [<ffffffff814e213a>] ? cpuidle_enter_state+0x4a/0xc0
> [ 8.336119] [<ffffffff814e2136>] ? cpuidle_enter_state+0x46/0xc0
> [ 8.336121] [<ffffffff814e2277>] cpuidle_idle_call+0xc7/0x160
> [ 8.336123] [<ffffffff8100b2c9>] arch_cpu_idle+0x9/0x20
> [ 8.336126] [<ffffffff8108fd8a>] cpu_startup_entry+0x9a/0x250
> [ 8.336128] [<ffffffff815c3702>] rest_init+0x72/0x80
> [ 8.336131] [<ffffffff81ac2047>] start_kernel+0x3fd/0x40a
> [ 8.336133] [<ffffffff81ac1a78>] ? repair_env_string+0x5b/0x5b
> [ 8.336134] [<ffffffff81ac159d>] x86_64_start_reservations+0x2a/0x2c
> [ 8.336136] [<ffffffff81ac16df>] x86_64_start_kernel+0x140/0x147
> [ 8.336137] ---[ end trace d0966e2430ea53b4 ]---
> [ 8.336146] ------------[ cut here ]------------
> [ 8.336146] kernel BUG at fs/bio.c:523!
> [ 8.336148] invalid opcode: 0000 [#1] SMP
> [ 8.336148] Modules linked in: megaraid_sas(+)
> [ 8.336150] CPU: 0 PID: 2911 Comm: scsi_id Tainted: G W 3.13.0-rc7-mason+ #1
> [ 8.336150] Hardware name: ZTSYSTEMS Echo Ridge T4 /A9DRPF-10D, BIOS 1.07 05/10/2012
> [ 8.336151] task: ffff8808556b4150 ti: ffff8808556b6000 task.ti: ffff8808556b6000
> [ 8.336153] RIP: 0010:[<ffffffff8119bbba>] [<ffffffff8119bbba>] bio_put+0x8a/0xa0
> [ 8.336153] RSP: 0018:ffff8808556b7b68 EFLAGS: 00010246
> [ 8.336154] RAX: 0000000000000000 RBX: ffff8808582b6d80 RCX: 0000000000000000
> [ 8.336155] RDX: ffff8808582b6dec RSI: 0000000000000003 RDI: ffff8808582b6d80
> [ 8.336155] RBP: ffff8808556b7b78 R08: 0000000000000004 R09: 0000000000000000
> [ 8.336156] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
> [ 8.336156] R13: 0000000000000000 R14: ffff8808567ebe28 R15: ffff8808582b6d80
> [ 8.336157] FS: 00007f16056bd700(0000) GS:ffff88087fc00000(0000) knlGS:0000000000000000
> [ 8.336158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8.336159] CR2: ffffe8f7ffc00000 CR3: 0000000856303000 CR4: 00000000000407f0
> [ 8.336159] Stack:
> [ 8.336164] ffff8808582b6d80 0000000000000000 ffff8808556b7ba8 ffffffff81291b37
> [ 8.336168] ffff8808556b7b88 ffff8808556b7cf8 ffff88085561a8e8 ffff880855685400
> [ 8.336172] ffff8808556b7c78 ffffffff8129b42d ffff8808556b7be8 ffffffff8119e09b
> [ 8.336172] Call Trace:
> [ 8.336174] [<ffffffff81291b37>] blk_rq_unmap_user+0x47/0x60
> [ 8.336177] [<ffffffff8129b42d>] sg_io+0x26d/0x370
> [ 8.336179] [<ffffffff8119e09b>] ? bdget+0x11b/0x130
> [ 8.336183] [<ffffffff811068c9>] ? find_get_page+0x19/0xa0
> [ 8.336185] [<ffffffff8129bc79>] scsi_cmd_ioctl+0x409/0x480
> [ 8.336186] [<ffffffff81106af2>] ? unlock_page+0x22/0x30
> [ 8.336189] [<ffffffff81130949>] ? __do_fault+0x439/0x560
> [ 8.336191] [<ffffffff8129bd3c>] scsi_cmd_blk_ioctl+0x4c/0x70
> [ 8.336194] [<ffffffff81437d6f>] sd_ioctl+0xcf/0x160
> [ 8.336196] [<ffffffff81298003>] __blkdev_driver_ioctl+0x23/0x30
> [ 8.336198] [<ffffffff81298638>] blkdev_ioctl+0x1f8/0x790
> [ 8.336199] [<ffffffff8119d717>] block_ioctl+0x37/0x40
> [ 8.336201] [<ffffffff811790c7>] do_vfs_ioctl+0x87/0x4f0
> [ 8.336204] [<ffffffff8126374a>] ? file_has_perm+0x8a/0xa0
> [ 8.336205] [<ffffffff811795c1>] SyS_ioctl+0x91/0xa0
> [ 8.336207] [<ffffffff815d77e2>] system_call_fastpath+0x16/0x1b
> [ 8.336218] Code: 8b 74 24 10 48 29 fb 48 89 df e8 a2 d2 f6 ff 48 8b 1c 24 4c 8b 64 24 08 c9 c3 0f 1f 80 00 00 00 00 48 89 df e8 38 60 fb ff eb 9a <0f> 0b 0f 1f 40 00 eb f
> a 66 66 66 66 66 2e 0f 1f 84 00 00 00 00
> [ 8.336220] RIP [<ffffffff8119bbba>] bio_put+0x8a/0xa0
> [ 8.336220] RSP <ffff8808556b7b68>
> [ 8.336221] ---[ end trace d0966e2430ea53b5 ]---
>
> Trying to track it down.
>
> -chris

2014-01-08 20:16:59

by Chris Mason

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
> Chris,
>
> [ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
> [ 8.336062] bio_endio: bio for (unknown) without endio
>
> This is my recent change to avoid memory leak in bio_endio. But I
> think the problem is higher up, most likely bio_endio is called twice
> on the same bio (which was freed before).
>

I think these are just two separate problems. Let's ignore the WARN_ON
for now.

> Are you running the unmodified for-3.14/core or do you have local patches?
>

It's for-3.14/core with my btrfs branch. Basically rc7 instead of rc6
but no changes to the block layer. I hadn't mounted btrfs yet.

-chris

2014-01-08 20:41:00

by Muthu Kumar

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Wed, Jan 8, 2014 at 12:16 PM, Chris Mason <[email protected]> wrote:
> On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
>> Chris,
>>
>> [ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
>> [ 8.336062] bio_endio: bio for (unknown) without endio
>>
>> This is my recent change to avoid memory leak in bio_endio. But I
>> think the problem is higher up, most likely bio_endio is called twice
>> on the same bio (which was freed before).
>>
>
> I think these are just two separate problems. Lets ignore the WARN_ON
> for now.
>

Not really... the BUG that is triggered:

kernel BUG at fs/bio.c:523!

It is in bio_put() (a call added to bio_endio() as part of the recent
change), which gets an already freed bio.
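
A minimal userspace sketch of the failure mode described here, under stated assumptions: `ref_bio`, `ref_bio_put()`, `ref_bio_endio()` and `ref_owner_flow()` are hypothetical names, `cnt` stands in for the bio reference count, and the `put_when_no_endio` switch models the recently added bio_put() in bio_endio(). It is not the kernel code, only an illustration of how an extra put can leave the owner's final bio_put() operating on a bio whose count has already dropped to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio; only the refcount is modelled. */
struct ref_bio {
	int cnt;                            /* models the bio reference count */
	void (*end_io)(struct ref_bio *);   /* NULL models a bio without endio */
	bool put_on_zero;                   /* set where bio_put() would BUG */
};

/* Models bio_put(): dropping a reference that is not held is a bug. */
static void ref_bio_put(struct ref_bio *bio)
{
	if (bio->cnt == 0) {
		bio->put_on_zero = true;
		return;
	}
	bio->cnt--;
}

/* Models bio_endio(), optionally with the extra put for the no-endio case. */
static void ref_bio_endio(struct ref_bio *bio, bool put_when_no_endio)
{
	if (bio->end_io)
		bio->end_io(bio);
	else if (put_when_no_endio)
		ref_bio_put(bio);
}

/*
 * Models an owner that holds the only reference, lets the bio complete,
 * and then drops its reference (as a caller such as blk_rq_unmap_user()
 * does in the trace). Returns true if the final put hit a zero count.
 */
static bool ref_owner_flow(bool put_when_no_endio)
{
	struct ref_bio bio = { .cnt = 1, .end_io = NULL, .put_on_zero = false };

	ref_bio_endio(&bio, put_when_no_endio);
	ref_bio_put(&bio);                  /* the owner's own put */

	return bio.put_on_zero;
}
```

With the extra put enabled, the owner's put is the second one against a single reference; with it disabled, the owner's put is the only one and the count drops to zero normally.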

>> Are you running the unmodified for-3.14/core or do you have local patches?
>>
>
> It's for-3.14/core with my btrfs branch. Basically rc7 instead of rc6
> but no changes to the block layer. I hadn't mounted btrfs yet.
>
> -chris
>

2014-01-08 20:51:50

by Chris Mason

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Wed, 2014-01-08 at 12:40 -0800, Muthu Kumar wrote:
> On Wed, Jan 8, 2014 at 12:16 PM, Chris Mason <clm@fb.com> wrote:
> > On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
> >> Chris,
> >>
> >> [ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
> >> [ 8.336062] bio_endio: bio for (unknown) without endio
> >>
> >> This is my recent change to avoid memory leak in bio_endio. But I
> >> think the problem is higher up, most likely bio_endio is called twice
> >> on the same bio (which was freed before).
> >>
> >
> > I think these are just two separate problems. Lets ignore the WARN_ON
> > for now.
> >
>
> Not really... the BUG that is triggered:
>
> kernel BUG at fs/bio.c:523!
>
> It is in bio_put() (added to bio_endio() as part of recent change)
> which gets an already freed bio.
>

Oh! I see. Let me try with that one reverted. Thanks!

-chris

2014-01-08 21:01:14

by Muthu Kumar

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Wed, Jan 8, 2014 at 12:51 PM, Chris Mason <[email protected]> wrote:
> On Wed, 2014-01-08 at 12:40 -0800, Muthu Kumar wrote:
>> On Wed, Jan 8, 2014 at 12:16 PM, Chris Mason <[email protected]> wrote:
>> > On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
>> >> Chris,
>> >>
>> >> [ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
>> >> [ 8.336062] bio_endio: bio for (unknown) without endio
>> >>
>> >> This is my recent change to avoid memory leak in bio_endio. But I
>> >> think the problem is higher up, most likely bio_endio is called twice
>> >> on the same bio (which was freed before).
>> >>
>> >
>> > I think these are just two separate problems. Lets ignore the WARN_ON
>> > for now.
>> >
>>
>> Not really... the BUG that is triggered:
>>
>> kernel BUG at fs/bio.c:523!
>>
>> It is in bio_put() (added to bio_endio() as part of recent change)
>> which gets an already freed bio.
>>
>
> Oh! I see. Let me try with that one reverted. Thanks!
>
> -chris
>

But, like I said, the problem is in a different place. I have been
running a "dd" on an ext4 fs for a while now, but haven't hit the
problem. Any idea how to repro locally? I would also suggest running
just for-3.14/core to isolate the issue.

2014-01-08 21:12:10

by Chris Mason

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Wed, 2014-01-08 at 13:01 -0800, Muthu Kumar wrote:
> On Wed, Jan 8, 2014 at 12:51 PM, Chris Mason <clm@fb.com> wrote:
> > On Wed, 2014-01-08 at 12:40 -0800, Muthu Kumar wrote:
> >> On Wed, Jan 8, 2014 at 12:16 PM, Chris Mason <clm@fb.com> wrote:
> >> > On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
> >> >> Chris,
> >> >>
> >> >> [ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
> >> >> [ 8.336062] bio_endio: bio for (unknown) without endio
> >> >>
> >> >> This is my recent change to avoid memory leak in bio_endio. But I
> >> >> think the problem is higher up, most likely bio_endio is called twice
> >> >> on the same bio (which was freed before).
> >> >>
> >> >
> >> > I think these are just two separate problems. Lets ignore the WARN_ON
> >> > for now.
> >> >
> >>
> >> Not really... the BUG that is triggered:
> >>
> >> kernel BUG at fs/bio.c:523!
> >>
> >> It is in bio_put() (added to bio_endio() as part of recent change)
> >> which gets an already freed bio.
> >>
> >
> > Oh! I see. Let me try with that one reverted. Thanks!
> >
> > -chris
> >
>
> But, like I said, problem is in different place. I am running a "dd"
> on ext4 fs for a while now, but didn't hit the problem. Any idea to
> repro locally? I would also suggest running just the for-3.1/core to
> isolate the issue.

Just reverting that change fixes it for me. Jens mentioned it was
broken for on-stack bios.

-chris

2014-01-08 21:13:39

by Kent Overstreet

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Wed, Jan 08, 2014 at 09:11:49PM +0000, Chris Mason wrote:
> On Wed, 2014-01-08 at 13:01 -0800, Muthu Kumar wrote:
> > On Wed, Jan 8, 2014 at 12:51 PM, Chris Mason <[email protected]> wrote:
> > > On Wed, 2014-01-08 at 12:40 -0800, Muthu Kumar wrote:
> > >> On Wed, Jan 8, 2014 at 12:16 PM, Chris Mason <[email protected]> wrote:
> > >> > On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
> > >> >> Chris,
> > >> >>
> > >> >> [ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
> > >> >> [ 8.336062] bio_endio: bio for (unknown) without endio
> > >> >>
> > >> >> This is my recent change to avoid memory leak in bio_endio. But I
> > >> >> think the problem is higher up, most likely bio_endio is called twice
> > >> >> on the same bio (which was freed before).
> > >> >>
> > >> >
> > >> > I think these are just two separate problems. Lets ignore the WARN_ON
> > >> > for now.
> > >> >
> > >>
> > >> Not really... the BUG that is triggered:
> > >>
> > >> kernel BUG at fs/bio.c:523!
> > >>
> > >> It is in bio_put() (added to bio_endio() as part of recent change)
> > >> which gets an already freed bio.
> > >>
> > >
> > > Oh! I see. Let me try with that one reverted. Thanks!
> > >
> > > -chris
> > >
> >
> > But, like I said, problem is in different place. I am running a "dd"
> > on ext4 fs for a while now, but didn't hit the problem. Any idea to
> > repro locally? I would also suggest running just the for-3.1/core to
> > isolate the issue.
>
> Just reverting that change fixes it for me. Jens mentioned it was
> broken for on-stack bios.

On-stack bios? I don't recall ever coming across such a thing, who what
where why?

I would expect on-stack bios to work though; I'm really curious how it
was broken.

2014-01-08 21:14:00

by Chris Mason

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Tue, 2014-01-07 at 12:15 -0800, Muthu Kumar wrote:
> Thanks Fengguang. Final patch with added comment. BTW, fengguang
> mentioned that git-am has trouble with the inline patch and "quilt
> import" worked fine for him...
>
> ------------
> In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
> we restore the orig_bio but failed to increment bi_remaining for
> orig_bio, which triggers a BUG_ON later when bio_endio is called. Fix
> is to increment bi_remaining when we restore the orig bio as well.
>
> Reported-and-Tested-by: Fengguang wu <fengguang.wu@intel.com>
> CC: Kent Overstreet <kmo@daterainc.com>
> CC: Jens Axboe <axboe@kernel.dk>
> CC: Chris Mason <clm@fb.com>
> Signed-off-by: Muthukumar Ratty <muthur@gmail.com>
>

Reviewed-by: Chris Mason <clm@fb.com>

Jens, please pull this one in.

> -----------
> fs/btrfs/volumes.c | 8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 37972d5..34aba2b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5297,9 +5297,13 @@ static void btrfs_end_bio(struct bio *bio, int err)
> if (!is_orig_bio) {
> bio_put(bio);
> bio = bbio->orig_bio;
> - } else {
> - atomic_inc(&bio->bi_remaining);
> }
> + /*
> + * We have original bio now. So increment bi_remaining to
> + * account for it in endio
> + */
> + atomic_inc(&bio->bi_remaining);
> +
> bio->bi_private = bbio->private;
> bio->bi_end_io = bbio->end_io;
> btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
>
> -------------------------------------

2014-01-08 21:18:55

by Muthu Kumar

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Wed, Jan 8, 2014 at 1:14 PM, Kent Overstreet <[email protected]> wrote:
> On Wed, Jan 08, 2014 at 09:11:49PM +0000, Chris Mason wrote:
>> On Wed, 2014-01-08 at 13:01 -0800, Muthu Kumar wrote:
>> > On Wed, Jan 8, 2014 at 12:51 PM, Chris Mason <[email protected]> wrote:
>> > > On Wed, 2014-01-08 at 12:40 -0800, Muthu Kumar wrote:
>> > >> On Wed, Jan 8, 2014 at 12:16 PM, Chris Mason <[email protected]> wrote:
>> > >> > On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
>> > >> >> Chris,
>> > >> >>
>> > >> >> [ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
>> > >> >> [ 8.336062] bio_endio: bio for (unknown) without endio
>> > >> >>
>> > >> >> This is my recent change to avoid memory leak in bio_endio. But I
>> > >> >> think the problem is higher up, most likely bio_endio is called twice
>> > >> >> on the same bio (which was freed before).
>> > >> >>
>> > >> >
>> > >> > I think these are just two separate problems. Lets ignore the WARN_ON
>> > >> > for now.
>> > >> >
>> > >>
>> > >> Not really... the BUG that is triggered:
>> > >>
>> > >> kernel BUG at fs/bio.c:523!
>> > >>
>> > >> It is in bio_put() (added to bio_endio() as part of recent change)
>> > >> which gets an already freed bio.
>> > >>
>> > >
>> > > Oh! I see. Let me try with that one reverted. Thanks!
>> > >
>> > > -chris
>> > >
>> >
>> > But, like I said, problem is in different place. I am running a "dd"
>> > on ext4 fs for a while now, but didn't hit the problem. Any idea to
>> > repro locally? I would also suggest running just the for-3.1/core to
>> > isolate the issue.
>>
>> Just reverting that change fixes it for me. Jens mentioned it was
>> broken for on-stack bios.
>
> On-stack bios? I don't recall ever coming across such a thing, who what
> where why?
>
> i would expect on stack bios to work though, i'm really curious how it
> was broken

The new change added a bio_put(), which might not work if the bio is on
the stack.

I don't remember seeing an on-stack bio either. Any help to jog my memory?

2014-01-08 21:22:08

by Jens Axboe

Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On 01/08/2014 02:13 PM, Chris Mason wrote:
> On Tue, 2014-01-07 at 12:15 -0800, Muthu Kumar wrote:
>> Thanks Fengguang. Final patch with added comment. BTW, fengguang
>> mentioned that git-am has trouble with the inline patch and "quilt
>> import" worked fine for him...
>>
>> ------------
>> In btrfs_end_bio(), we increment bi_remaining if is_orig_bio. If not,
>> we restore the orig_bio but failed to increment bi_remaining for
>> orig_bio, which triggers a BUG_ON later when bio_endio is called. Fix
>> is to increment bi_remaining when we restore the orig bio as well.
>>
>> Reported-and-Tested-by: Fengguang wu <[email protected]>
>> CC: Kent Overstreet <[email protected]>
>> CC: Jens Axboe <[email protected]>
>> CC: Chris Mason <[email protected]>
>> Signed-off-by: Muthukumar Ratty <[email protected]>
>>
>
> Reviewed-by: Chris Mason <[email protected]>
>
> Jens, please pull this one in.

Done, with the added reviewed and tested-by's.

--
Jens Axboe
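
The patch description quoted above can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: struct toy_bio, toy_bio_endio() and toy_finish_orig() are hypothetical names invented for illustration, not kernel code. The "remaining" field stands in for bi_remaining, and a -1 return stands in for the BUG_ON. It shows why completing a restored orig bio a second time fails unless the counter is incremented first.

```c
#include <assert.h>

/* Toy stand-in for struct bio; only the fields needed for the illustration. */
struct toy_bio {
    int remaining;   /* models bi_remaining */
    int completed;   /* number of times the completion actually ran */
};

/*
 * Models the accounting in bio_endio(): every call consumes one count.
 * Returns -1 where the real kernel would hit the BUG_ON (count already <= 0).
 */
static int toy_bio_endio(struct toy_bio *bio)
{
    if (bio->remaining <= 0)
        return -1;
    if (--bio->remaining == 0)
        bio->completed++;
    return 0;
}

/*
 * Models the path that restores the orig bio and ends it again.
 * With apply_fix set, the counter is incremented before the second
 * completion, which is what the patch description proposes.
 */
static int toy_finish_orig(struct toy_bio *orig, int apply_fix)
{
    if (apply_fix)
        orig->remaining++;
    return toy_bio_endio(orig);
}
```

In this model, a bio that has already been completed once has remaining == 0, so toy_finish_orig(&orig, 0) reports the failure, while toy_finish_orig(&orig, 1) completes cleanly.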

2014-01-08 21:23:16

by Kent Overstreet

[permalink] [raw]
Subject: Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

On Wed, Jan 08, 2014 at 01:18:46PM -0800, Muthu Kumar wrote:
> On Wed, Jan 8, 2014 at 1:14 PM, Kent Overstreet <[email protected]> wrote:
> > On Wed, Jan 08, 2014 at 09:11:49PM +0000, Chris Mason wrote:
> >> On Wed, 2014-01-08 at 13:01 -0800, Muthu Kumar wrote:
> >> > On Wed, Jan 8, 2014 at 12:51 PM, Chris Mason <[email protected]> wrote:
> >> > > On Wed, 2014-01-08 at 12:40 -0800, Muthu Kumar wrote:
> >> > >> On Wed, Jan 8, 2014 at 12:16 PM, Chris Mason <[email protected]> wrote:
> >> > >> > On Wed, 2014-01-08 at 11:54 -0800, Muthu Kumar wrote:
> >> > >> >> Chris,
> >> > >> >>
> >> > >> >> [ 8.336061] WARNING: CPU: 0 PID: 0 at fs/bio.c:1778 bio_endio+0xbe/0x100()
> >> > >> >> [ 8.336062] bio_endio: bio for (unknown) without endio
> >> > >> >>
> >> > >> >> This is my recent change to avoid a memory leak in bio_endio(). But I
> >> > >> >> think the problem is higher up, most likely bio_endio() is called twice
> >> > >> >> on the same bio (which was freed before).
> >> > >> >>
> >> > >> >
> >> > >> > I think these are just two separate problems. Let's ignore the WARN_ON
> >> > >> > for now.
> >> > >> >
> >> > >>
> >> > >> Not really... the BUG that is triggered:
> >> > >>
> >> > >> kernel BUG at fs/bio.c:523!
> >> > >>
> >> > >> It is in bio_put() (added to bio_endio() as part of a recent change),
> >> > >> which gets an already freed bio.
> >> > >>
> >> > >
> >> > > Oh! I see. Let me try with that one reverted. Thanks!
> >> > >
> >> > > -chris
> >> > >
> >> >
> >> > But, like I said, the problem is in a different place. I have been running
> >> > "dd" on an ext4 fs for a while now, but didn't hit the problem. Any idea how
> >> > to repro locally? I would also suggest running just for-3.14/core to
> >> > isolate the issue.
> >>
> >> Just reverting that change fixes it for me. Jens mentioned it was
> >> broken for on-stack bios.
> >
> > On-stack bios? I don't recall ever coming across such a thing, who what
> > where why?
> >
> > i would expect on stack bios to work though, i'm really curious how it
> > was broken
>
> The new change added a bio_put(), which might not work if the bio is on the stack.
>
> I don't remember seeing an on-stack bio either, any help to jog my memory?

That's code that logically belongs in bio_chain_endio(); it's just a
hack to avoid blowing the stack, since the kernel is compiled with
-fno-optimize-sibling-calls when you enable frame pointers (otherwise
gcc would optimize those tail calls to jumps and we'd have no stack-blowing
issues).
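
To make the point about tail calls concrete, here is a simplified user-space sketch of walking a chain of completions in a loop instead of recursing through an end_io callback. It is an illustration under assumptions, not the kernel's implementation: struct toy_bio, toy_chain_endio(), toy_final_endio() and toy_bio_endio() are hypothetical names, "remaining" models bi_remaining, and "parent" models the bi_private pointer of a chained bio. The returned hop count exists only so the behaviour can be checked.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bio;
typedef void (*toy_end_io_fn)(struct toy_bio *bio);

/* Toy stand-in for struct bio. */
struct toy_bio {
    int remaining;            /* models bi_remaining */
    toy_end_io_fn end_io;     /* models bi_end_io */
    struct toy_bio *parent;   /* models bi_private for a chained bio */
    int done;                 /* set by the final completion */
};

/*
 * Marker for "this bio is chained to a parent". In this sketch it is never
 * called; toy_bio_endio() recognises the pointer and handles the chain hop
 * inline, so stack depth stays constant however long the chain is.
 */
static void toy_chain_endio(struct toy_bio *bio)
{
    (void)bio;
}

/* An ordinary completion at the end of the chain. */
static void toy_final_endio(struct toy_bio *bio)
{
    bio->done = 1;
}

/* Completes bio and walks up the chain iteratively; returns the hop count. */
static int toy_bio_endio(struct toy_bio *bio)
{
    int hops = 0;

    while (bio) {
        if (--bio->remaining > 0)
            break;                      /* other completions still pending */

        if (bio->end_io == toy_chain_endio) {
            bio = bio->parent;          /* loop instead of a nested call */
            hops++;
        } else {
            if (bio->end_io)
                bio->end_io(bio);
            bio = NULL;
        }
    }
    return hops;
}
```

With a compiler that turns sibling calls into jumps, a recursive toy_chain_endio() would use the same constant stack; the loop simply gives that guarantee regardless of compiler flags.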