LinuxLists.cc - kernel BUG at fs/btrfs/volumes.c:LINE!

2018-06-06 13:33:33

Subject: kernel BUG at fs/btrfs/volumes.c:LINE!

Hello,

syzbot found the following crash on:

HEAD commit: af6c5d5e01ad Merge branch 'for-4.18' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15f700af800000
kernel config: https://syzkaller.appspot.com/x/.config?x=12ff770540994680
dashboard link: https://syzkaller.appspot.com/bug?extid=5b658d997a83984507a6
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

RDX: 0000000020000080 RSI: 0000000020000040 RDI: 00007f787067fbf0
RBP: 0000000000000001 R08: 00000000200000c0 R09: 0000000020000080
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000014
R13: 0000000000000001 R14: 0000000000700008 R15: 0000000000000043
------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:1032!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 1 PID: 22303 Comm: syz-executor1 Not tainted 4.17.0+ #86
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:btrfs_prepare_close_one_device fs/btrfs/volumes.c:1032 [inline]
RIP: 0010:close_fs_devices+0xba7/0xfa0 fs/btrfs/volumes.c:1052
Code: 56 18 48 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 2b 03 00 00 49 83 6c 24
30 01 e9 25 f8 ff ff e8 90 f4 b3 fe 0f 0b e8 89 f4 b3 fe <0f> 0b 48 89 f7
e8 ef 64 f0 fe e9 f6 f5 ff ff e8 75 f4 b3 fe 0f 0b
RSP: 0018:ffff8801af6ff050 EFLAGS: 00010246
RAX: 0000000000040000 RBX: dffffc0000000000 RCX: ffffc9000c70c000
RDX: 0000000000040000 RSI: ffffffff82c56437 RDI: 0000000000000286
RBP: ffff8801af6ff350 R08: ffffed003b5e46d7 R09: ffffed003b5e46d6
R10: ffffed003b5e46d6 R11: ffff8801daf236b3 R12: ffff8801c58ac190
R13: 0000000000000000 R14: ffff8801b1a6a940 R15: ffff8801b4d7d680
FS: 00007f7870680700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000704094 CR3: 00000001c51e8000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
btrfs_close_devices+0x29/0x150 fs/btrfs/volumes.c:1085
btrfs_mount_root+0x1419/0x1e70 fs/btrfs/super.c:1610
mount_fs+0xae/0x328 fs/super.c:1277
vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
vfs_kern_mount+0x40/0x60 fs/namespace.c:1027
btrfs_mount+0x4a1/0x213e fs/btrfs/super.c:1661
mount_fs+0xae/0x328 fs/super.c:1277
vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
vfs_kern_mount fs/namespace.c:1027 [inline]
do_new_mount fs/namespace.c:2518 [inline]
do_mount+0x564/0x30b0 fs/namespace.c:2848
ksys_mount+0x12d/0x140 fs/namespace.c:3064
__do_sys_mount fs/namespace.c:3078 [inline]
__se_sys_mount fs/namespace.c:3075 [inline]
__x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45843a
Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 dd 8f fb ff c3 66 2e 0f
1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff
ff 0f 83 ba 8f fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:00007f787067fba8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000020000080 RCX: 000000000045843a
RDX: 0000000020000080 RSI: 0000000020000040 RDI: 00007f787067fbf0
RBP: 0000000000000001 R08: 00000000200000c0 R09: 0000000020000080
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000014
R13: 0000000000000001 R14: 0000000000700008 R15: 0000000000000043
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace 383b0406a01f2edd ]---
RIP: 0010:btrfs_prepare_close_one_device fs/btrfs/volumes.c:1032 [inline]
RIP: 0010:close_fs_devices+0xba7/0xfa0 fs/btrfs/volumes.c:1052
Code: 56 18 48 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 2b 03 00 00 49 83 6c 24
30 01 e9 25 f8 ff ff e8 90 f4 b3 fe 0f 0b e8 89 f4 b3 fe <0f> 0b 48 89 f7
e8 ef 64 f0 fe e9 f6 f5 ff ff e8 75 f4 b3 fe 0f 0b
RSP: 0018:ffff8801af6ff050 EFLAGS: 00010246
RAX: 0000000000040000 RBX: dffffc0000000000 RCX: ffffc9000c70c000
RDX: 0000000000040000 RSI: ffffffff82c56437 RDI: 0000000000000286
RBP: ffff8801af6ff350 R08: ffffed003b5e46d7 R09: ffffed003b5e46d6
R10: ffffed003b5e46d6 R11: ffff8801daf236b3 R12: ffff8801c58ac190
R13: 0000000000000000 R14: ffff8801b1a6a940 R15: ffff8801b4d7d680
FS: 00007f7870680700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000704094 CR3: 00000001c51e8000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.

2018-06-06 16:14:49

by Anand Jain

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On 06/06/2018 09:31 PM, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    af6c5d5e01ad Merge branch 'for-4.18' of
> git://git.kernel.o..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15f700af800000
> kernel config: https://syzkaller.appspot.com/x/.config?x=12ff770540994680
> dashboard link:
> https://syzkaller.appspot.com/bug?extid=5b658d997a83984507a6
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]
>
> RDX: 0000000020000080 RSI: 0000000020000040 RDI: 00007f787067fbf0
> RBP: 0000000000000001 R08: 00000000200000c0 R09: 0000000020000080
> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000014
> R13: 0000000000000001 R14: 0000000000700008 R15: 0000000000000043
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/volumes.c:1032!
> invalid opcode: 0000 [#1] SMP KASAN
> CPU: 1 PID: 22303 Comm: syz-executor1 Not tainted 4.17.0+ #86
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:btrfs_prepare_close_one_device fs/btrfs/volumes.c:1032 [inline]

btrfs_prepare_close_one_device()
::
1031 name = rcu_string_strdup(device->name->str, GFP_NOFS);
1032 BUG_ON(!name); /* -ENOMEM */

The way we close our devices requires new memory allocations
at the time of device close. Apart from the BUG_ON reported here,
this approach _had_ other complications, like managing the sysfs
links and moving them to the newly allocated btrfs_fs_devices.
So some time back I attempted to change this approach to a simple
device close without fresh allocation, however it wasn't successful.
I am going to try that again, but it's not p1.

Thanks, Anand

> RIP: 0010:close_fs_devices+0xba7/0xfa0 fs/btrfs/volumes.c:1052
> Code: 56 18 48 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 2b 03 00 00 49 83 6c
> 24 30 01 e9 25 f8 ff ff e8 90 f4 b3 fe 0f 0b e8 89 f4 b3 fe <0f> 0b 48
> 89 f7 e8 ef 64 f0 fe e9 f6 f5 ff ff e8 75 f4 b3 fe 0f 0b
> RSP: 0018:ffff8801af6ff050 EFLAGS: 00010246
> RAX: 0000000000040000 RBX: dffffc0000000000 RCX: ffffc9000c70c000
> RDX: 0000000000040000 RSI: ffffffff82c56437 RDI: 0000000000000286
> RBP: ffff8801af6ff350 R08: ffffed003b5e46d7 R09: ffffed003b5e46d6
> R10: ffffed003b5e46d6 R11: ffff8801daf236b3 R12: ffff8801c58ac190
> R13: 0000000000000000 R14: ffff8801b1a6a940 R15: ffff8801b4d7d680
> FS: 00007f7870680700(0000) GS:ffff8801daf00000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000704094 CR3: 00000001c51e8000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> btrfs_close_devices+0x29/0x150 fs/btrfs/volumes.c:1085
> btrfs_mount_root+0x1419/0x1e70 fs/btrfs/super.c:1610
> mount_fs+0xae/0x328 fs/super.c:1277
> vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
> vfs_kern_mount+0x40/0x60 fs/namespace.c:1027
> btrfs_mount+0x4a1/0x213e fs/btrfs/super.c:1661
> mount_fs+0xae/0x328 fs/super.c:1277
> vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
> vfs_kern_mount fs/namespace.c:1027 [inline]
> do_new_mount fs/namespace.c:2518 [inline]
> do_mount+0x564/0x30b0 fs/namespace.c:2848
> ksys_mount+0x12d/0x140 fs/namespace.c:3064
> __do_sys_mount fs/namespace.c:3078 [inline]
> __se_sys_mount fs/namespace.c:3075 [inline]
> __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
> do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x45843a
> Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 dd 8f fb ff c3 66 2e
> 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01
> f0 ff ff 0f 83 ba 8f fb ff c3 66 0f 1f 84 00 00 00 00 00
> RSP: 002b:00007f787067fba8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
> RAX: ffffffffffffffda RBX: 0000000020000080 RCX: 000000000045843a
> RDX: 0000000020000080 RSI: 0000000020000040 RDI: 00007f787067fbf0
> RBP: 0000000000000001 R08: 00000000200000c0 R09: 0000000020000080
> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000014
> R13: 0000000000000001 R14: 0000000000700008 R15: 0000000000000043
> Modules linked in:
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> ---[ end trace 383b0406a01f2edd ]---
> RIP: 0010:btrfs_prepare_close_one_device fs/btrfs/volumes.c:1032 [inline]
> RIP: 0010:close_fs_devices+0xba7/0xfa0 fs/btrfs/volumes.c:1052
> Code: 56 18 48 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 2b 03 00 00 49 83 6c
> 24 30 01 e9 25 f8 ff ff e8 90 f4 b3 fe 0f 0b e8 89 f4 b3 fe <0f> 0b 48
> 89 f7 e8 ef 64 f0 fe e9 f6 f5 ff ff e8 75 f4 b3 fe 0f 0b
> RSP: 0018:ffff8801af6ff050 EFLAGS: 00010246
> RAX: 0000000000040000 RBX: dffffc0000000000 RCX: ffffc9000c70c000
> RDX: 0000000000040000 RSI: ffffffff82c56437 RDI: 0000000000000286
> RBP: ffff8801af6ff350 R08: ffffed003b5e46d7 R09: ffffed003b5e46d6
> R10: ffffed003b5e46d6 R11: ffff8801daf236b3 R12: ffff8801c58ac190
> R13: 0000000000000000 R14: ffff8801b1a6a940 R15: ffff8801b4d7d680
> FS: 00007f7870680700(0000) GS:ffff8801daf00000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000704094 CR3: 00000001c51e8000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> syzbot.

2018-06-07 18:54:12

by David Sterba

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Thu, Jun 07, 2018 at 12:15:04AM +0800, Anand Jain wrote:
>
>
> On 06/06/2018 09:31 PM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    af6c5d5e01ad Merge branch 'for-4.18' of
> > git://git.kernel.o..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15f700af800000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=12ff770540994680
> > dashboard link:
> > https://syzkaller.appspot.com/bug?extid=5b658d997a83984507a6
> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > RDX: 0000000020000080 RSI: 0000000020000040 RDI: 00007f787067fbf0
> > RBP: 0000000000000001 R08: 00000000200000c0 R09: 0000000020000080
> > R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000014
> > R13: 0000000000000001 R14: 0000000000700008 R15: 0000000000000043
> > ------------[ cut here ]------------
> > kernel BUG at fs/btrfs/volumes.c:1032!
> > invalid opcode: 0000 [#1] SMP KASAN
> > CPU: 1 PID: 22303 Comm: syz-executor1 Not tainted 4.17.0+ #86
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:btrfs_prepare_close_one_device fs/btrfs/volumes.c:1032 [inline]
>
> btrfs_prepare_close_one_device()
> ::
> 1031 name = rcu_string_strdup(device->name->str, GFP_NOFS);
> 1032 BUG_ON(!name); /* -ENOMEM */
>
> The way we close our devices requires new memory allocations
> at the time of device close. Apart from the BUG_ON reported here,
> this approach _had_ other complications, like managing the sysfs
> links and moving them to the newly allocated btrfs_fs_devices.
> So some time back I attempted to change this approach to a simple
> device close without fresh allocation, however it wasn't successful.
> I am going to try that again, but it's not p1.

Yeah, getting rid of the allocations while freeing a device would be great,
but unfortunately it is not simple.

Normally the GFP_NOFS allocations do not fail so I think the fuzzer
environment is tuned to allow that, which is fine for coverage but does
not happen in practice. This will be fixed eventually.
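
To make the point about a tuned test environment concrete, here is a minimal userspace sketch of an allocator wrapper that fails on demand. It is only an illustration of the idea, not kernel code and not how syzbot is actually configured; `test_alloc`, `test_strdup` and `fail_nth` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical test hook: when fail_nth > 0, the fail_nth-th call to
 * test_alloc() counted from now returns NULL. A value of 0 disables
 * the injection.
 */
static int fail_nth;

/* Allocate memory, honouring the injected failure if one is armed. */
static void *test_alloc(size_t size)
{
	if (fail_nth > 0 && --fail_nth == 0)
		return NULL;	/* injected allocation failure */
	return malloc(size);
}

/* Duplicate a string through the hookable allocator; NULL on failure. */
static char *test_strdup(const char *s)
{
	size_t len;
	char *copy;

	assert(s != NULL);
	len = strlen(s) + 1;
	copy = test_alloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}
```

With a hook like this, a test can walk through every allocation on a code path and check that each failure is handled rather than assumed impossible.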

2018-06-07 19:00:24

by Dmitry Vyukov

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Thu, Jun 7, 2018 at 5:34 PM, David Sterba <[email protected]> wrote:
> On Thu, Jun 07, 2018 at 12:15:04AM +0800, Anand Jain wrote:
>>
>>
>> On 06/06/2018 09:31 PM, syzbot wrote:
>> > Hello,
>> >
>> > syzbot found the following crash on:
>> >
>> > HEAD commit: af6c5d5e01ad Merge branch 'for-4.18' of
>> > git://git.kernel.o..
>> > git tree: upstream
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=15f700af800000
>> > kernel config: https://syzkaller.appspot.com/x/.config?x=12ff770540994680
>> > dashboard link:
>> > https://syzkaller.appspot.com/bug?extid=5b658d997a83984507a6
>> > compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>> >
>> > Unfortunately, I don't have any reproducer for this crash yet.
>> >
>> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> > Reported-by: [email protected]
>> >
>> > RDX: 0000000020000080 RSI: 0000000020000040 RDI: 00007f787067fbf0
>> > RBP: 0000000000000001 R08: 00000000200000c0 R09: 0000000020000080
>> > R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000014
>> > R13: 0000000000000001 R14: 0000000000700008 R15: 0000000000000043
>> > ------------[ cut here ]------------
>> > kernel BUG at fs/btrfs/volumes.c:1032!
>> > invalid opcode: 0000 [#1] SMP KASAN
>> > CPU: 1 PID: 22303 Comm: syz-executor1 Not tainted 4.17.0+ #86
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > RIP: 0010:btrfs_prepare_close_one_device fs/btrfs/volumes.c:1032 [inline]
>>
>> btrfs_prepare_close_one_device()
>> ::
>> 1031 name = rcu_string_strdup(device->name->str, GFP_NOFS);
>> 1032 BUG_ON(!name); /* -ENOMEM */
>>
>> The way we close our devices requires new memory allocations
>> at the time of device close. Apart from the BUG_ON reported here,
>> this approach _had_ other complications, like managing the sysfs
>> links and moving them to the newly allocated btrfs_fs_devices.
>> So some time back I attempted to change this approach to a simple
>> device close without fresh allocation, however it wasn't successful.
>> I am going to try that again, but it's not p1.
>
> Yeah, getting rid of the allocations while freeing a device would be great,
> but unfortunately it is not simple.
>
> Normally the GFP_NOFS allocations do not fail so I think the fuzzer
> environment is tuned to allow that, which is fine for coverage but does
> not happen in practice. This will be fixed eventually.

Isn't GFP_NOFS more restricted than normal allocations? Are these
allocations accounted against memcg? It's easy to fail any allocation
within a memory container.

2018-06-07 19:03:54

by David Sterba

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Thu, Jun 07, 2018 at 06:28:02PM +0200, Dmitry Vyukov wrote:
> > Normally the GFP_NOFS allocations do not fail so I think the fuzzer
> > environment is tuned to allow that, which is fine for coverage but does
> > not happen in practice. This will be fixed eventually.
>
> Isn't GFP_NOFS more restricted than normal allocations? Are these
> allocations accounted against memcg? It's easy to fail any allocation
> within a memory container.

See https://lwn.net/Articles/723317/ for the 'too small to fail' rule and
some unwritten semantics of GFP_NOFS, but I think you're right that the
memory controller can fail any allocation.

Error handling is being improved over time. Memory allocation failures
are hard to handle in some cases, and this one would need some logic
updated, so it's not a one-liner.

2019-06-10 23:15:12

by Eric Biggers

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Thu, Jun 07, 2018 at 06:52:13PM +0200, David Sterba wrote:
> On Thu, Jun 07, 2018 at 06:28:02PM +0200, Dmitry Vyukov wrote:
> > > Normally the GFP_NOFS allocations do not fail so I think the fuzzer
> > > environment is tuned to allow that, which is fine for coverage but does
> > > not happen in practice. This will be fixed eventually.
> >
> > Isn't GFP_NOFS more restricted than normal allocations? Are these
> > allocations accounted against memcg? It's easy to fail any allocation
> > within a memory container.
>
> See https://lwn.net/Articles/723317/ for the 'too small to fail' rule and
> some unwritten semantics of GFP_NOFS, but I think you're right that the
> memory controller can fail any allocation.
>
> Error handling is being improved over time. Memory allocation failures
> are hard to handle in some cases, and this one would need some logic
> updated, so it's not a one-liner.
>

This bug is still there. In btrfs_close_one_device():

if (device->name) {
name = rcu_string_strdup(device->name->str, GFP_NOFS);
BUG_ON(!name); /* -ENOMEM */
rcu_assign_pointer(new_device->name, name);
}

It assumes that the memory allocation succeeded.
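
For illustration only, the general shape of returning -ENOMEM instead of asserting could look like the userspace sketch below. It is not the btrfs code and not a proposed patch: `demo_device`, `demo_strdup`, `demo_copy_device_name` and `demo_fail_next_alloc` are hypothetical stand-ins, and the sketch ignores the harder part mentioned earlier in the thread, namely that every caller on the close path would then have to cope with the error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a device record. */
struct demo_device {
	char *name;
};

/* Test hook: set to nonzero to make the next demo_strdup() fail once. */
static int demo_fail_next_alloc;

/* Duplicate a string; returns NULL on (real or injected) failure. */
static char *demo_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy;

	if (demo_fail_next_alloc) {
		demo_fail_next_alloc = 0;
		return NULL;
	}
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Copy the name from old_dev into new_dev.
 * Returns 0 on success or -ENOMEM if the duplicate cannot be allocated.
 * On failure new_dev is left untouched, so the caller can unwind.
 */
static int demo_copy_device_name(struct demo_device *new_dev,
				 const struct demo_device *old_dev)
{
	char *name;

	assert(new_dev != NULL && old_dev != NULL);
	if (!old_dev->name)
		return 0;

	name = demo_strdup(old_dev->name);
	if (!name)
		return -ENOMEM;	/* report the failure instead of crashing */

	free(new_dev->name);
	new_dev->name = name;
	return 0;
}
```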

See syzbot report from v5.2-rc3 here: https://syzkaller.appspot.com/text?tag=CrashReport&x=16c839c1a00000

Is there any plan to fix this?

- Eric

2019-06-11 12:19:25

by David Sterba

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Mon, Jun 10, 2019 at 04:14:04PM -0700, Eric Biggers wrote:
> On Thu, Jun 07, 2018 at 06:52:13PM +0200, David Sterba wrote:
> > On Thu, Jun 07, 2018 at 06:28:02PM +0200, Dmitry Vyukov wrote:
> > > > Normally the GFP_NOFS allocations do not fail so I think the fuzzer
> > > > environment is tuned to allow that, which is fine for coverage but does
> > > > not happen in practice. This will be fixed eventually.
> > >
> > > Isn't GFP_NOFS more restricted than normal allocations? Are these
> > > allocations accounted against memcg? It's easy to fail any allocation
> > > within a memory container.
> >
> > See https://lwn.net/Articles/723317/ for the 'too small to fail' rule and
> > some unwritten semantics of GFP_NOFS, but I think you're right that the
> > memory controller can fail any allocation.
> >
> > Error handling is being improved over time. Memory allocation failures
> > are hard to handle in some cases, and this one would need some logic
> > updated, so it's not a one-liner.
> >
>
> This bug is still there. In btrfs_close_one_device():
>
> if (device->name) {
> name = rcu_string_strdup(device->name->str, GFP_NOFS);
> BUG_ON(!name); /* -ENOMEM */
> rcu_assign_pointer(new_device->name, name);
> }
>
> It assumes that the memory allocation succeeded.
>
> See syzbot report from v5.2-rc3 here: https://syzkaller.appspot.com/text?tag=CrashReport&x=16c839c1a00000
>
> Is there any plan to fix this?

Yes, there is: avoid allocations when closing the device and track
the state in another way. As this has never been reported in practice,
the priority to fix it is rather low, so I can't give you an ETA.
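
One possible reading of "avoid allocations when closing the device" is to reset the existing record in place rather than building a replacement. The userspace sketch below shows only that idea. It is an assumption made for illustration, not the planned btrfs change and not the contents of any branch mentioned in this thread; the struct, its fields and `demo_close_device_in_place` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified device record. */
struct demo_device {
	char *name;		/* kept across close, so no copy is needed */
	int fd;			/* stand-in for the open device handle, -1 if closed */
	bool writeable;
	bool in_fs_metadata;
};

/*
 * Return the record to its "closed" state without allocating anything.
 * Because no allocation happens, this function has no failure path.
 */
static void demo_close_device_in_place(struct demo_device *dev)
{
	assert(dev != NULL);

	dev->fd = -1;
	dev->writeable = false;
	dev->in_fs_metadata = false;
	/* dev->name is deliberately left as it is. */
}
```

The design point is that a close path with nothing to allocate has nothing that can fail, so neither a BUG_ON nor an error return is needed there.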

2019-12-04 15:48:32

by Johannes Thumshirn

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

#syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
close_fs_devices

2019-12-05 10:03:36

by Johannes Thumshirn

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Wed, Dec 04, 2019 at 03:59:01PM +0100, Johannes Thumshirn wrote:
> #syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
> close_fs_devices

Ok this doesn't look like it worked, let's retry w/o line wrapping

#syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git close_fs_devices

2019-12-05 10:08:21

by Dmitry Vyukov

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Thu, Dec 5, 2019 at 11:00 AM Johannes Thumshirn <[email protected]> wrote:
>
> On Wed, Dec 04, 2019 at 03:59:01PM +0100, Johannes Thumshirn wrote:
> > #syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
> > close_fs_devices
>
> Ok this doesn't look like it worked, let's retry w/o line wrapping
>
> #syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git close_fs_devices

The correct syntax would be (no dash + colon):

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
close_fs_devices

2019-12-05 10:08:25

by syzbot

Subject: Re: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

> On Thu, Dec 5, 2019 at 11:00 AM Johannes Thumshirn <[email protected]>
> wrote:

>> On Wed, Dec 04, 2019 at 03:59:01PM +0100, Johannes Thumshirn wrote:
>> > #syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
>> > close_fs_devices

>> Ok this doesn't look like it worked, let's retry w/o line wrapping

>> #syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
>> close_fs_devices

> The correct syntax would be (no dash + colon):

> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git

This crash does not have a reproducer. I cannot test it.

> close_fs_devices

2019-12-05 11:39:36

by Johannes Thumshirn

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Thu, Dec 05, 2019 at 11:07:27AM +0100, Dmitry Vyukov wrote:
> The correct syntax would be (no dash + colon):
>
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
> close_fs_devices

Ah ok, thanks.

Although syzbot already said it can't test because it has no reproducer.
Anyways good to know for future reports.

Byte,
Johannes

2019-12-05 11:51:20

by David Sterba

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Thu, Dec 05, 2019 at 12:38:38PM +0100, Johannes Thumshirn wrote:
> On Thu, Dec 05, 2019 at 11:07:27AM +0100, Dmitry Vyukov wrote:
> > The correct syntax would be (no dash + colon):
> >
> > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
> > close_fs_devices
>
> Ah ok, thanks.
>
> Although syzbot already said it can't test because it has no reproducer.
> Anyways good to know for future reports.

According to

https://syzkaller.appspot.com/bug?id=d50670eeb21302915bde3f25871dfb7ea43db1e4

there is a way to test it: there are many reports, and the last one is
about a week old. Is there a way to instruct syzbot to run the same tests
on a given branch?

(The reproducer basically sets up an environment with a limited amount
of memory available for allocation, and this hits the BUG_ON.)

2019-12-05 12:08:36

by Dmitry Vyukov

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Thu, Dec 5, 2019 at 12:50 PM David Sterba <[email protected]> wrote:
>
> On Thu, Dec 05, 2019 at 12:38:38PM +0100, Johannes Thumshirn wrote:
> > On Thu, Dec 05, 2019 at 11:07:27AM +0100, Dmitry Vyukov wrote:
> > > The correct syntax would be (no dash + colon):
> > >
> > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
> > > close_fs_devices
> >
> > Ah ok, thanks.
> >
> > Although syzbot already said it can't test because it has no reproducer.
> > Anyways good to know for future reports.
>
> According to
>
> https://syzkaller.appspot.com/bug?id=d50670eeb21302915bde3f25871dfb7ea43db1e4
>
> there is a way to test it: there are many reports, and the last one is
> about a week old. Is there a way to instruct syzbot to run the same tests
> on a given branch?
>
> (The reproducer basically sets up an environment with a limited amount
> of memory available for allocation, and this hits the BUG_ON.)

syzkaller always does this ("rerun the same tests") for every bug. If
it succeeds (the kernel crashes again), the result is a reproducer that
can later be used for cause/fix bisection and patch testing. In this
case the crash does not reproduce, so rerunning the same tests will not
lead to anything useful (at best to a false confirmation that a patch
fixes the crash).

There are many reasons why a kernel crash may not reproduce: global
accumulated state, non-hermetic tests, poor syzkaller btrfs
descriptions (most likely true), and others.

I need to take a closer look; at first sight this looks like something
that should be reproducible...

2019-12-10 15:13:05

by Dmitry Vyukov

Subject: Re: kernel BUG at fs/btrfs/volumes.c:LINE!

On Thu, Dec 5, 2019 at 1:06 PM Dmitry Vyukov <[email protected]> wrote:
> > > > The correct syntax would be (no dash + colon):
> > > >
> > > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
> > > > close_fs_devices
> > >
> > > Ah ok, thanks.
> > >
> > > Although syzbot already said it can't test because it has no reproducer.
> > > Anyways good to know for future reports.
> >
> > According to
> >
> > https://syzkaller.appspot.com/bug?id=d50670eeb21302915bde3f25871dfb7ea43db1e4
> >
> > there is a way to test it: there are many reports, and the last one is
> > about a week old. Is there a way to instruct syzbot to run the same tests
> > on a given branch?
> >
> > (The reproducer basically sets up an environment with a limited amount
> > of memory available for allocation, and this hits the BUG_ON.)
>
> syzkaller always does this ("rerun the same tests") for every bug. If
> it succeeds (the kernel crashes again), the result is a reproducer that
> can later be used for cause/fix bisection and patch testing. In this
> case the crash does not reproduce, so rerunning the same tests will not
> lead to anything useful (at best to a false confirmation that a patch
> fixes the crash).
>
> There are many reasons why a kernel crash may not reproduce: global
> accumulated state, non-hermetic tests, poor syzkaller btrfs
> descriptions (most likely true), and others.
>
> I need to take a closer look; at first sight this looks like something
> that should be reproducible...

Yes, there was a bug around image mount reproduction. Should be fixed
now by https://github.com/google/syzkaller/commit/cb704a294c54aed90281c016a6dc0c40ae295601