2021-10-21 23:21:32

by syzbot

[permalink] [raw]
Subject: [syzbot] WARNING in batadv_nc_mesh_free

Hello,

syzbot found the following issue on:

HEAD commit: 2f111a6fd5b5 Merge tag 'ceph-for-5.15-rc7' of git://github..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=115750acb00000
kernel config: https://syzkaller.appspot.com/x/.config?x=d95853dad8472c91
dashboard link: https://syzkaller.appspot.com/bug?extid=28b0702ada0bf7381f58
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1026ef2cb00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c9c162b00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

RBP: 00007ffef262e230 R08: 0000000000000002 R09: 00007fddc8003531
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
------------[ cut here ]------------
ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0
WARNING: CPU: 0 PID: 6517 at lib/debugobjects.c:508 debug_print_object lib/debugobjects.c:505 [inline]
WARNING: CPU: 0 PID: 6517 at lib/debugobjects.c:508 debug_object_assert_init+0x1fa/0x250 lib/debugobjects.c:895
Modules linked in:
CPU: 0 PID: 6517 Comm: syz-executor011 Not tainted 5.15.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:debug_print_object lib/debugobjects.c:505 [inline]
RIP: 0010:debug_object_assert_init+0x1fa/0x250 lib/debugobjects.c:895
Code: e8 4b 15 b8 fd 4c 8b 45 00 48 c7 c7 a0 31 b4 8a 48 c7 c6 00 2e b4 8a 48 c7 c2 e0 33 b4 8a 31 c9 49 89 d9 31 c0 e8 b6 c6 36 fd <0f> 0b ff 05 3a 5c c5 09 48 83 c5 38 48 89 e8 48 c1 e8 03 42 80 3c
RSP: 0018:ffffc90002c7e698 EFLAGS: 00010046
RAX: cffa606352c78700 RBX: 0000000000000000 RCX: ffff888076ce9c80
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffffff8a512d00 R08: ffffffff81693402 R09: ffffed1017383f2c
R10: ffffed1017383f2c R11: 0000000000000000 R12: dffffc0000000000
R13: ffff88801bcd1720 R14: 0000000000000002 R15: ffffffff90ba5a20
FS: 0000555557087300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5473f3c000 CR3: 0000000070ca6000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
debug_timer_assert_init kernel/time/timer.c:739 [inline]
debug_assert_init kernel/time/timer.c:784 [inline]
del_timer+0xa5/0x3d0 kernel/time/timer.c:1204
try_to_grab_pending+0x151/0xbb0 kernel/workqueue.c:1270
__cancel_work_timer+0x14c/0x710 kernel/workqueue.c:3129
batadv_nc_mesh_free+0x4a/0xf0 net/batman-adv/network-coding.c:1869
batadv_mesh_free+0x6f/0x140 net/batman-adv/main.c:245
batadv_mesh_init+0x4e5/0x550 net/batman-adv/main.c:226
batadv_softif_init_late+0x8fe/0xd70 net/batman-adv/soft-interface.c:804
register_netdevice+0x826/0x1c30 net/core/dev.c:10229
__rtnl_newlink net/core/rtnetlink.c:3458 [inline]
rtnl_newlink+0x14b3/0x1d10 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x934/0xe60 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x200/0x470 net/netlink/af_netlink.c:2510
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x814/0x9f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0xa29/0xe50 net/netlink/af_netlink.c:1935
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg net/socket.c:724 [inline]
____sys_sendmsg+0x5b9/0x910 net/socket.c:2409
___sys_sendmsg net/socket.c:2463 [inline]
__sys_sendmsg+0x36f/0x450 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fddc82bc7e9
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffef262e228 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fddc82bc7e9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00007ffef262e230 R08: 0000000000000002 R09: 00007fddc8003531
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


2021-10-22 18:37:14

by Pavel Skripkin

[permalink] [raw]
Subject: Re: [syzbot] WARNING in batadv_nc_mesh_free

On 10/22/21 02:19, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 2f111a6fd5b5 Merge tag 'ceph-for-5.15-rc7' of git://github..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=115750acb00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=d95853dad8472c91
> dashboard link: https://syzkaller.appspot.com/bug?extid=28b0702ada0bf7381f58
> compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1026ef2cb00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c9c162b00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> RBP: 00007ffef262e230 R08: 0000000000000002 R09: 00007fddc8003531
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> ------------[ cut here ]------------
> ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0
> WARNING: CPU: 0 PID: 6517 at lib/debugobjects.c:508 debug_print_object lib/debugobjects.c:505 [inline]
> WARNING: CPU: 0 PID: 6517 at lib/debugobjects.c:508 debug_object_assert_init+0x1fa/0x250 lib/debugobjects.c:895
> Modules linked in:
> CPU: 0 PID: 6517 Comm: syz-executor011 Not tainted 5.15.0-rc6-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:debug_print_object lib/debugobjects.c:505 [inline]
> RIP: 0010:debug_object_assert_init+0x1fa/0x250 lib/debugobjects.c:895
> Code: e8 4b 15 b8 fd 4c 8b 45 00 48 c7 c7 a0 31 b4 8a 48 c7 c6 00 2e b4 8a 48 c7 c2 e0 33 b4 8a 31 c9 49 89 d9 31 c0 e8 b6 c6 36 fd <0f> 0b ff 05 3a 5c c5 09 48 83 c5 38 48 89 e8 48 c1 e8 03 42 80 3c
> RSP: 0018:ffffc90002c7e698 EFLAGS: 00010046
> RAX: cffa606352c78700 RBX: 0000000000000000 RCX: ffff888076ce9c80
> RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
> RBP: ffffffff8a512d00 R08: ffffffff81693402 R09: ffffed1017383f2c
> R10: ffffed1017383f2c R11: 0000000000000000 R12: dffffc0000000000
> R13: ffff88801bcd1720 R14: 0000000000000002 R15: ffffffff90ba5a20
> FS: 0000555557087300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f5473f3c000 CR3: 0000000070ca6000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> debug_timer_assert_init kernel/time/timer.c:739 [inline]
> debug_assert_init kernel/time/timer.c:784 [inline]
> del_timer+0xa5/0x3d0 kernel/time/timer.c:1204
> try_to_grab_pending+0x151/0xbb0 kernel/workqueue.c:1270
> __cancel_work_timer+0x14c/0x710 kernel/workqueue.c:3129
> batadv_nc_mesh_free+0x4a/0xf0 net/batman-adv/network-coding.c:1869
> batadv_mesh_free+0x6f/0x140 net/batman-adv/main.c:245
> batadv_mesh_init+0x4e5/0x550 net/batman-adv/main.c:226

Looks like cancel_delayed_work_sync() is called before
INIT_DELAYED_WORK(), so calltrace looks like

batadv_mesh_init()
batadv_originator_init() <- injected allocation failure
batadv_mesh_free()
batadv_nc_mesh_free()
cancel_delayed_work_sync()


Quick fix can be moving INIT_DELAYED_WORK() from batadv_nc_init() to
batadv_mesh_init(), since there is complex dependencies between each
mech part, if I understood comments correctly


Just for thoughts and syzbot testing

#syz test
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master



With regards,
Pavel Skripkin



Attachments:
ph (2.40 kB)

2021-10-22 20:22:40

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] WARNING in batadv_nc_mesh_free

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
general protection fault in batadv_nc_purge_paths

RBP: 00007fe7b40631d0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffe7ffd3def R14: 00007fe7b4063300 R15: 0000000000022000
general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 1 PID: 9061 Comm: syz-executor.0 Not tainted 5.15.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:batadv_nc_purge_paths+0x38/0x3f0 net/batman-adv/network-coding.c:437
Code: 48 89 d3 49 89 f6 48 89 7c 24 58 49 bd 00 00 00 00 00 fc ff df e8 38 48 ab f7 4d 8d 7e 10 4c 89 f8 48 c1 e8 03 48 89 44 24 48 <42> 8a 04 28 84 c0 0f 85 88 03 00 00 41 8b 2f 31 ff 89 ee e8 20 4c
RSP: 0018:ffffc9000d04eac0 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffff888078270000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88807ec2cc80
RBP: 00000000fffffff4 R08: ffffffff8154e5b4 R09: ffffed100fd85adc
R10: ffffed100fd85adc R11: 0000000000000000 R12: ffff88807ec2cc80
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000010
FS: 00007fe7b4063700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f359172e000 CR3: 000000005e749000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
batadv_nc_mesh_free+0x7a/0xf0 net/batman-adv/network-coding.c:1869
batadv_mesh_free+0x6f/0x140 net/batman-adv/main.c:249
batadv_mesh_init+0x5b1/0x620 net/batman-adv/main.c:230
batadv_softif_init_late+0x8fe/0xd70 net/batman-adv/soft-interface.c:804
register_netdevice+0x826/0x1c30 net/core/dev.c:10229
__rtnl_newlink net/core/rtnetlink.c:3458 [inline]
rtnl_newlink+0x14b3/0x1d10 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x934/0xe60 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x200/0x470 net/netlink/af_netlink.c:2510
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x814/0x9f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0xa29/0xe50 net/netlink/af_netlink.c:1935
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg net/socket.c:724 [inline]
____sys_sendmsg+0x5b9/0x910 net/socket.c:2409
___sys_sendmsg net/socket.c:2463 [inline]
__sys_sendmsg+0x36f/0x450 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fe7b48eda39
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe7b4063188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fe7b49f0f60 RCX: 00007fe7b48eda39
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00007fe7b40631d0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffe7ffd3def R14: 00007fe7b4063300 R15: 0000000000022000
Modules linked in:
---[ end trace 67ff054734964acf ]---
RIP: 0010:batadv_nc_purge_paths+0x38/0x3f0 net/batman-adv/network-coding.c:437
Code: 48 89 d3 49 89 f6 48 89 7c 24 58 49 bd 00 00 00 00 00 fc ff df e8 38 48 ab f7 4d 8d 7e 10 4c 89 f8 48 c1 e8 03 48 89 44 24 48 <42> 8a 04 28 84 c0 0f 85 88 03 00 00 41 8b 2f 31 ff 89 ee e8 20 4c
RSP: 0018:ffffc9000d04eac0 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffff888078270000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88807ec2cc80
RBP: 00000000fffffff4 R08: ffffffff8154e5b4 R09: ffffed100fd85adc
R10: ffffed100fd85adc R11: 0000000000000000 R12: ffff88807ec2cc80
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000010
FS: 00007fe7b4063700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc230f87020 CR3: 000000005e749000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 48 89 d3 mov %rdx,%rbx
3: 49 89 f6 mov %rsi,%r14
6: 48 89 7c 24 58 mov %rdi,0x58(%rsp)
b: 49 bd 00 00 00 00 00 movabs $0xdffffc0000000000,%r13
12: fc ff df
15: e8 38 48 ab f7 callq 0xf7ab4852
1a: 4d 8d 7e 10 lea 0x10(%r14),%r15
1e: 4c 89 f8 mov %r15,%rax
21: 48 c1 e8 03 shr $0x3,%rax
25: 48 89 44 24 48 mov %rax,0x48(%rsp)
* 2a: 42 8a 04 28 mov (%rax,%r13,1),%al <-- trapping instruction
2e: 84 c0 test %al,%al
30: 0f 85 88 03 00 00 jne 0x3be
36: 41 8b 2f mov (%r15),%ebp
39: 31 ff xor %edi,%edi
3b: 89 ee mov %ebp,%esi
3d: e8 .byte 0xe8
3e: 20 .byte 0x20
3f: 4c rex.WR


Tested on:

commit: 1d4590f5 Merge tag 'acpi-5.15-rc7' of git://git.kernel..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=156e86c4b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=d95853dad8472c91
dashboard link: https://syzkaller.appspot.com/bug?extid=28b0702ada0bf7381f58
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=16cb3daf300000

2021-10-22 20:59:34

by Pavel Skripkin

[permalink] [raw]
Subject: Re: [syzbot] WARNING in batadv_nc_mesh_free

On 10/22/21 23:20, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> general protection fault in batadv_nc_purge_paths


Oh, ok. Next clean up call in batadv_nc_mesh_free() caused GPF, since
fields are not initialized. Let's try to clean up one by one and do not
break dependencies.

Quite ugly one, but idea is correct, I guess

Also, make each *_init() call clean up all allocated stuff to not call
corresponding *_free() on error handling path, since it introduces
problems, as syzbot reported




With regards,
Pavel Skripkin


Attachments:
ph (3.72 kB)

2021-10-22 21:00:24

by Pavel Skripkin

[permalink] [raw]
Subject: Re: [syzbot] WARNING in batadv_nc_mesh_free

On 10/22/21 23:57, Pavel Skripkin wrote:
> On 10/22/21 23:20, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>> general protection fault in batadv_nc_purge_paths
>
>
> Oh, ok. Next clean up call in batadv_nc_mesh_free() caused GPF, since
> fields are not initialized. Let's try to clean up one by one and do not
> break dependencies.
>
> Quite ugly one, but idea is correct, I guess
>
> Also, make each *_init() call clean up all allocated stuff to not call
> corresponding *_free() on error handling path, since it introduces
> problems, as syzbot reported
>
>
>
>

Whooops.... Forgot to ask syzbot to test the patch

#syz test
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master




With regards,
Pavel Skripkin


Attachments:
ph (3.72 kB)

2021-10-23 05:15:26

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] WARNING in batadv_nc_mesh_free

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: [email protected]

Tested on:

commit: 9c0c4d24 Merge tag 'block-5.15-2021-10-22' of git://gi..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=d95853dad8472c91
dashboard link: https://syzkaller.appspot.com/bug?extid=28b0702ada0bf7381f58
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1553d4c4b00000

Note: testing is done by a robot and is best-effort only.

2021-10-23 07:43:31

by Sven Eckelmann

[permalink] [raw]
Subject: Re: [syzbot] WARNING in batadv_nc_mesh_free

On Friday, 22 October 2021 22:58:15 CEST Pavel Skripkin wrote:
[...]
> > Oh, ok. Next clean up call in batadv_nc_mesh_free() caused GPF, since
> > fields are not initialized. Let's try to clean up one by one and do not
> > break dependencies.
> >
> > Quite ugly one, but idea is correct, I guess
> >
> > Also, make each *_init() call clean up all allocated stuff to not call
> > corresponding *_free() on error handling path, since it introduces
> > problems, as syzbot reported

Thanks for the patch + syzbot interactions. I just wanted to implement a
change - which would most likely have ended up the same way. Can you please
send it to netdev and Cc [email protected]? We don't have
anything else to submit at the moment for netdev and this patch can be applied
by netdev directly. I will add my Acked-by in this process.

Not sure about the Fixes. It is definitely wrong in the initial commit.... but
it got only really problematic when other features got introduced. I would
still say that the initial one should be mentioned.

Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")

@Linus, @Marek, @Antonio: Please check whether it is ok to move the
batadv_v_mesh_init after batadv_tt_init + batadv_originator_init.
batadv_v_mesh_init is basically there to initialize:

* bat_priv->bat_v.ogm_buff(|_len|_mutex)
* bat_priv->bat_v.ogm_seqno
* bat_priv->bat_v.ogm_wq

batadv_originator_init is there to initialize the

* bat_priv->orig_hash
* bat_priv->orig_work (batadv_purge_orig) + queue it up

batadv_tt_init is a lot more complex but should in theory not interact with
ogm specific algo ops.

I wouldn't know why there could be a problem but I would leave it to the
experts.

Kind regards,
Sven


Attachments:
signature.asc (849.00 B)
This is a digitally signed message part.

2021-10-24 13:18:42

by Pavel Skripkin

[permalink] [raw]
Subject: [PATCH] net: batman-adv: fix error handling

Syzbot reported ODEBUG warning in batadv_nc_mesh_free(). The problem was
in wrong error handling in batadv_mesh_init().

Before this patch batadv_mesh_init() was calling batadv_mesh_free() in case
of any batadv_*_init() calls failure. This approach may work well, when
there is some kind of indicator, which can tell which parts of batadv are
initialized; but there isn't any.

All written above lead to cleaning up uninitialized fields. Even if we hide
ODEBUG warning by initializing bat_priv->nc.work, syzbot was able to hit
GPF in batadv_nc_purge_paths(), because hash pointer in still NULL. [1]

To fix these bugs we can unwind batadv_*_init() calls one by one.
It is good approach for 2 reasons: 1) It fixes bugs on error handling
path 2) It improves the performance, since we won't call unneeded
batadv_*_free() functions.

So, this patch makes all batadv_*_init() clean up all allocated memory
before returning with an error to no call correspoing batadv_*_free()
and open-codes batadv_mesh_free() with proper order to avoid touching
uninitialized fields.

Link: https://lore.kernel.org/netdev/[email protected]/ [1]
Reported-and-tested-by: [email protected]
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Pavel Skripkin <[email protected]>
---
net/batman-adv/bridge_loop_avoidance.c | 8 +++-
net/batman-adv/main.c | 56 ++++++++++++++++++--------
net/batman-adv/network-coding.c | 4 +-
net/batman-adv/translation-table.c | 4 +-
4 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index 1669744304c5..17687848daec 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1560,10 +1560,14 @@ int batadv_bla_init(struct batadv_priv *bat_priv)
return 0;

bat_priv->bla.claim_hash = batadv_hash_new(128);
- bat_priv->bla.backbone_hash = batadv_hash_new(32);
+ if (!bat_priv->bla.claim_hash)
+ return -ENOMEM;

- if (!bat_priv->bla.claim_hash || !bat_priv->bla.backbone_hash)
+ bat_priv->bla.backbone_hash = batadv_hash_new(32);
+ if (!bat_priv->bla.backbone_hash) {
+ batadv_hash_destroy(bat_priv->bla.claim_hash);
return -ENOMEM;
+ }

batadv_hash_set_lock_class(bat_priv->bla.claim_hash,
&batadv_claim_hash_lock_class_key);
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 3ddd66e4c29e..5207cd8d6ad8 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -190,29 +190,41 @@ int batadv_mesh_init(struct net_device *soft_iface)

bat_priv->gw.generation = 0;

- ret = batadv_v_mesh_init(bat_priv);
- if (ret < 0)
- goto err;
-
ret = batadv_originator_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_orig;
+ }

ret = batadv_tt_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_tt;
+ }
+
+ ret = batadv_v_mesh_init(bat_priv);
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_v;
+ }

ret = batadv_bla_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_bla;
+ }

ret = batadv_dat_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_dat;
+ }

ret = batadv_nc_mesh_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_nc;
+ }

batadv_gw_init(bat_priv);
batadv_mcast_init(bat_priv);
@@ -222,8 +234,20 @@ int batadv_mesh_init(struct net_device *soft_iface)

return 0;

-err:
- batadv_mesh_free(soft_iface);
+err_nc:
+ batadv_dat_free(bat_priv);
+err_dat:
+ batadv_bla_free(bat_priv);
+err_bla:
+ batadv_v_mesh_free(bat_priv);
+err_v:
+ batadv_tt_free(bat_priv);
+err_tt:
+ batadv_originator_free(bat_priv);
+err_orig:
+ batadv_purge_outstanding_packets(bat_priv, NULL);
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_INACTIVE);
+
return ret;
}

diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index 9f06132e007d..0a7f1d36a6a8 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -152,8 +152,10 @@ int batadv_nc_mesh_init(struct batadv_priv *bat_priv)
&batadv_nc_coding_hash_lock_class_key);

bat_priv->nc.decoding_hash = batadv_hash_new(128);
- if (!bat_priv->nc.decoding_hash)
+ if (!bat_priv->nc.decoding_hash) {
+ batadv_hash_destroy(bat_priv->nc.coding_hash);
goto err;
+ }

batadv_hash_set_lock_class(bat_priv->nc.decoding_hash,
&batadv_nc_decoding_hash_lock_class_key);
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index e0b3dace2020..4b7ad6684bc4 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -4162,8 +4162,10 @@ int batadv_tt_init(struct batadv_priv *bat_priv)
return ret;

ret = batadv_tt_global_init(bat_priv);
- if (ret < 0)
+ if (ret < 0) {
+ batadv_tt_local_table_free(bat_priv);
return ret;
+ }

batadv_tvlv_handler_register(bat_priv, batadv_tt_tvlv_ogm_handler_v1,
batadv_tt_tvlv_unicast_handler_v1,
--
2.33.1

2021-10-24 15:05:17

by Sven Eckelmann

[permalink] [raw]
Subject: Re: [PATCH] net: batman-adv: fix error handling

On Sunday, 24 October 2021 15:13:56 CEST Pavel Skripkin wrote:
> Syzbot reported ODEBUG warning in batadv_nc_mesh_free(). The problem was
> in wrong error handling in batadv_mesh_init().
>
> Before this patch batadv_mesh_init() was calling batadv_mesh_free() in case
> of any batadv_*_init() calls failure. This approach may work well, when
> there is some kind of indicator, which can tell which parts of batadv are
> initialized; but there isn't any.
>
> All written above lead to cleaning up uninitialized fields. Even if we hide
> ODEBUG warning by initializing bat_priv->nc.work, syzbot was able to hit
> GPF in batadv_nc_purge_paths(), because hash pointer in still NULL. [1]
>
> To fix these bugs we can unwind batadv_*_init() calls one by one.
> It is good approach for 2 reasons: 1) It fixes bugs on error handling
> path 2) It improves the performance, since we won't call unneeded
> batadv_*_free() functions.
>
> So, this patch makes all batadv_*_init() clean up all allocated memory
> before returning with an error to no call correspoing batadv_*_free()
> and open-codes batadv_mesh_free() with proper order to avoid touching
> uninitialized fields.
>
> Link: https://lore.kernel.org/netdev/[email protected]/ [1]
> Reported-and-tested-by: [email protected]
> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
> Signed-off-by: Pavel Skripkin <[email protected]>

Acked-by: Sven Eckelmann <[email protected]>


Kind regards,
Sven


Attachments:
signature.asc (849.00 B)
This is a digitally signed message part.

2021-10-26 08:43:09

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] net: batman-adv: fix error handling

On Sun, 24 Oct 2021 16:58:30 +0200 Sven Eckelmann wrote:
> On Sunday, 24 October 2021 15:13:56 CEST Pavel Skripkin wrote:
> > Syzbot reported ODEBUG warning in batadv_nc_mesh_free(). The problem was
> > in wrong error handling in batadv_mesh_init().
> >
> > Before this patch batadv_mesh_init() was calling batadv_mesh_free() in case
> > of any batadv_*_init() calls failure. This approach may work well, when
> > there is some kind of indicator, which can tell which parts of batadv are
> > initialized; but there isn't any.
> >
> > All written above lead to cleaning up uninitialized fields. Even if we hide
> > ODEBUG warning by initializing bat_priv->nc.work, syzbot was able to hit
> > GPF in batadv_nc_purge_paths(), because hash pointer in still NULL. [1]
> >
> > To fix these bugs we can unwind batadv_*_init() calls one by one.
> > It is good approach for 2 reasons: 1) It fixes bugs on error handling
> > path 2) It improves the performance, since we won't call unneeded
> > batadv_*_free() functions.
> >
> > So, this patch makes all batadv_*_init() clean up all allocated memory
> > before returning with an error to no call correspoing batadv_*_free()
> > and open-codes batadv_mesh_free() with proper order to avoid touching
> > uninitialized fields.
> >
> > Link: https://lore.kernel.org/netdev/[email protected]/ [1]
> > Reported-and-tested-by: [email protected]
> > Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
> > Signed-off-by: Pavel Skripkin <[email protected]>
>
> Acked-by: Sven Eckelmann <[email protected]>

FWIW I'm marking this as "Awaiting upstream" in netdev patchwork,
please LMK if you prefer for it to be applied directly.

2021-10-26 09:53:48

by Sven Eckelmann

[permalink] [raw]
Subject: Re: [PATCH] net: batman-adv: fix error handling

On Tuesday, 26 October 2021 02:49:50 CEST Jakub Kicinski wrote:
[...]
> > Acked-by: Sven Eckelmann <[email protected]>
>
> FWIW I'm marking this as "Awaiting upstream" in netdev patchwork,
> please LMK if you prefer for it to be applied directly.

Please apply this directly. Thanks

Kind regards,
Sven


Attachments:
signature.asc (849.00 B)
This is a digitally signed message part.

2021-10-26 17:13:51

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH] net: batman-adv: fix error handling

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <[email protected]>:

On Sun, 24 Oct 2021 16:13:56 +0300 you wrote:
> Syzbot reported ODEBUG warning in batadv_nc_mesh_free(). The problem was
> in wrong error handling in batadv_mesh_init().
>
> Before this patch batadv_mesh_init() was calling batadv_mesh_free() in case
> of any batadv_*_init() calls failure. This approach may work well, when
> there is some kind of indicator, which can tell which parts of batadv are
> initialized; but there isn't any.
>
> [...]

Here is the summary with links:
- net: batman-adv: fix error handling
https://git.kernel.org/netdev/net/c/6f68cd634856

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html