2020-01-17 13:47:22

by syzbot

Subject: general protection fault in can_rx_register

Hello,

syzbot found the following crash on:

HEAD commit: f5ae2ea6 Fix built-in early-load Intel microcode alignment
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1033df15e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=cfbb8fa33f49f9f3
dashboard link: https://syzkaller.appspot.com/bug?extid=c3ea30e1e2485573f953
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13204f15e00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000

The bug was bisected to:

commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
Author: Kurt Van Dijck <[email protected]>
Date: Mon Oct 8 09:48:33 2018 +0000

can: introduce CAN_REQUIRED_SIZE macro

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=129bfdb9e00000
final crash: https://syzkaller.appspot.com/x/report.txt?x=119bfdb9e00000
console output: https://syzkaller.appspot.com/x/log.txt?x=169bfdb9e00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: 9868b5d44f3d ("can: introduce CAN_REQUIRED_SIZE macro")

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9593 Comm: syz-executor302 Not tainted 5.5.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c 89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00 00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
raw_enable_filters net/can/raw.c:189 [inline]
raw_enable_allfilters net/can/raw.c:255 [inline]
raw_bind+0x326/0x1230 net/can/raw.c:428
__sys_bind+0x2bd/0x3a0 net/socket.c:1649
__do_sys_bind net/socket.c:1660 [inline]
__se_sys_bind net/socket.c:1658 [inline]
__x64_sys_bind+0x7a/0x90 net/socket.c:1658
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446ba9
Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb132f25d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00000000006dbc88 RCX: 0000000000446ba9
RDX: 0000000000000008 RSI: 0000000020000180 RDI: 0000000000000003
RBP: 00000000006dbc80 R08: 00007fb132f26700 R09: 0000000000000000
R10: 00007fb132f26700 R11: 0000000000000246 R12: 00000000006dbc8c
R13: 0000000000000000 R14: 0000000000000000 R15: 068500100000003c
Modules linked in:
---[ end trace 0dedabb13ca8e7d7 ]---
RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c
89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00
00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


2020-01-17 20:17:09

by Oliver Hartkopp

Subject: Re: general protection fault in can_rx_register

Hi Marc, Oleksij, Kurt,

On 17/01/2020 14.46, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    f5ae2ea6 Fix built-in early-load Intel microcode alignment
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1033df15e00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=cfbb8fa33f49f9f3
> dashboard link:
> https://syzkaller.appspot.com/bug?extid=c3ea30e1e2485573f953
> compiler:       clang version 10.0.0
> (https://github.com/llvm/llvm-project/
> c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13204f15e00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000
>
> The bug was bisected to:
>
> commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
> Author: Kurt Van Dijck <[email protected]>
> Date:   Mon Oct 8 09:48:33 2018 +0000
>
>     can: introduce CAN_REQUIRED_SIZE macro
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=129bfdb9e00000
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=119bfdb9e00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=169bfdb9e00000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]
> Fixes: 9868b5d44f3d ("can: introduce CAN_REQUIRED_SIZE macro")
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 9593 Comm: syz-executor302 Not tainted 5.5.0-rc6-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
> RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476

include/linux/rculist.h:528 is

struct hlist_node *first = h->first;

which would mean that 'h' must be NULL.

But the h parameter is rcv_list from
rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);

Which can not return NULL - at least when dev_rcv_lists is a proper
pointer to the dev_rcv_lists provided by can_dev_rcv_lists_find().

So either dev->ml_priv is NULL in the case of having a CAN interface
(here vxcan) or we have not allocated net->can.rx_alldev_list in
can_pernet_init() properly (which would lead to an -ENOMEM which is
reported to whom?).

Hm. I'm lost. Any ideas?

Regards,
Oliver


> Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa
> 4c 89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c
> 05 00 00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
> RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
> RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
> R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
> R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
> FS:  00007fb132f26700(0000) GS:ffff8880aec00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  raw_enable_filters net/can/raw.c:189 [inline]
>  raw_enable_allfilters net/can/raw.c:255 [inline]
>  raw_bind+0x326/0x1230 net/can/raw.c:428
>  __sys_bind+0x2bd/0x3a0 net/socket.c:1649
>  __do_sys_bind net/socket.c:1660 [inline]
>  __se_sys_bind net/socket.c:1658 [inline]
>  __x64_sys_bind+0x7a/0x90 net/socket.c:1658
>  do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x446ba9
> Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89
> f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
> f0 ff ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fb132f25d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
> RAX: ffffffffffffffda RBX: 00000000006dbc88 RCX: 0000000000446ba9
> RDX: 0000000000000008 RSI: 0000000020000180 RDI: 0000000000000003
> RBP: 00000000006dbc80 R08: 00007fb132f26700 R09: 0000000000000000
> R10: 00007fb132f26700 R11: 0000000000000246 R12: 00000000006dbc8c
> R13: 0000000000000000 R14: 0000000000000000 R15: 068500100000003c
> Modules linked in:
> ---[ end trace 0dedabb13ca8e7d7 ]---
> RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
> RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
> Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa
> 4c 89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c
> 05 00 00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
> RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
> RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
> R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
> R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
> FS:  00007fb132f26700(0000) GS:ffff8880aec00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see:
> https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches

2020-01-20 09:24:21

by Dmitry Vyukov

Subject: Re: general protection fault in can_rx_register

On Mon, Jan 20, 2020 at 10:11 AM Kurt Van Dijck
<[email protected]> wrote:
>
> If bisect was right with this:
>
> > >The bug was bisected to:
> > >
> > >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
> > >Author: Kurt Van Dijck <[email protected]>
> > >Date: Mon Oct 8 09:48:33 2018 +0000
> > >
> > > can: introduce CAN_REQUIRED_SIZE macro
>
> Then I'd start looking in malformed sockaddr_can data instead.
>
> Is this code what triggers the bug?
> > >C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000

yes

> Kind regards,
> Kurt
>
> On vr, 17 jan 2020 21:02:48 +0100, Oliver Hartkopp wrote:
> > Hi Marc, Oleksij, Kurt,
> >
> > On 17/01/2020 14.46, syzbot wrote:
> > >Hello,
> > >
> > >syzbot found the following crash on:
> > >
> > >HEAD commit: f5ae2ea6 Fix built-in early-load Intel microcode alignment
> > >git tree: upstream
> > >console output: https://syzkaller.appspot.com/x/log.txt?x=1033df15e00000
> > >kernel config: https://syzkaller.appspot.com/x/.config?x=cfbb8fa33f49f9f3
> > >dashboard link:
> > >https://syzkaller.appspot.com/bug?extid=c3ea30e1e2485573f953
> > >compiler: clang version 10.0.0
> > >(https://github.com/llvm/llvm-project/
> > >c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> > >syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13204f15e00000
> > >C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000
> > >
> > >The bug was bisected to:
> > >
> > >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
> > >Author: Kurt Van Dijck <[email protected]>
> > >Date: Mon Oct 8 09:48:33 2018 +0000
> > >
> > > can: introduce CAN_REQUIRED_SIZE macro
> > >
> > >bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=129bfdb9e00000
> > >final crash: https://syzkaller.appspot.com/x/report.txt?x=119bfdb9e00000
> > >console output: https://syzkaller.appspot.com/x/log.txt?x=169bfdb9e00000
> > >
> > >IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > >Reported-by: [email protected]
> > >Fixes: 9868b5d44f3d ("can: introduce CAN_REQUIRED_SIZE macro")
> > >
> > >kasan: CONFIG_KASAN_INLINE enabled
> > >kasan: GPF could be caused by NULL-ptr deref or user memory access
> > >general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > >CPU: 0 PID: 9593 Comm: syz-executor302 Not tainted 5.5.0-rc6-syzkaller #0
> > >Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > >Google 01/01/2011
> > >RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
> > >RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
> >
> > include/linux/rculist.h:528 is
> >
> > struct hlist_node *first = h->first;
> >
> > which would mean that 'h' must be NULL.
> >
> > But the h parameter is rcv_list from
> > rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);
> >
> > Which can not return NULL - at least when dev_rcv_lists is a proper pointer
> > to the dev_rcv_lists provided by can_dev_rcv_lists_find().
> >
> > So either dev->ml_priv is NULL in the case of having a CAN interface (here
> > vxcan) or we have not allocated net->can.rx_alldev_list in can_pernet_init()
> > properly (which would lead to an -ENOMEM which is reported to whom?).
> >
> > Hm. I'm lost. Any ideas?
> >
> > Regards,
> > Oliver
> >
> >
> > >Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c
> > >89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00
> > >00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
> > >RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
> > >RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
> > >RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
> > >RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
> > >R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
> > >R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
> > >FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000)
> > >knlGS:0000000000000000
> > >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
> > >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >Call Trace:
> > > raw_enable_filters net/can/raw.c:189 [inline]
> > > raw_enable_allfilters net/can/raw.c:255 [inline]
> > > raw_bind+0x326/0x1230 net/can/raw.c:428
> > > __sys_bind+0x2bd/0x3a0 net/socket.c:1649
> > > __do_sys_bind net/socket.c:1660 [inline]
> > > __se_sys_bind net/socket.c:1658 [inline]
> > > __x64_sys_bind+0x7a/0x90 net/socket.c:1658
> > > do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
> > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >RIP: 0033:0x446ba9
> > >Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
> > >48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> > >ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> > >RSP: 002b:00007fb132f25d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
> > >RAX: ffffffffffffffda RBX: 00000000006dbc88 RCX: 0000000000446ba9
> > >RDX: 0000000000000008 RSI: 0000000020000180 RDI: 0000000000000003
> > >RBP: 00000000006dbc80 R08: 00007fb132f26700 R09: 0000000000000000
> > >R10: 00007fb132f26700 R11: 0000000000000246 R12: 00000000006dbc8c
> > >R13: 0000000000000000 R14: 0000000000000000 R15: 068500100000003c
> > >Modules linked in:
> > >---[ end trace 0dedabb13ca8e7d7 ]---
> > >RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
> > >RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
> > >Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c
> > >89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00
> > >00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
> > >RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
> > >RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
> > >RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
> > >RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
> > >R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
> > >R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
> > >FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000)
> > >knlGS:0000000000000000
> > >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
> > >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >
> > >
> > >---
> > >This bug is generated by a bot. It may contain errors.
> > >See https://goo.gl/tpsmEJ for more information about syzbot.
> > >syzbot engineers can be reached at [email protected].
> > >
> > >syzbot will keep track of this bug report. See:
> > >https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > >For information about bisection process see:
> > >https://goo.gl/tpsmEJ#bisection
> > >syzbot can test patches for this bug, for details see:
> > >https://goo.gl/tpsmEJ#testing-patches

2020-01-20 09:32:13

by Kurt Van Dijck

Subject: Re: general protection fault in can_rx_register

If bisect was right with this:

> >The bug was bisected to:
> >
> >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
> >Author: Kurt Van Dijck <[email protected]>
> >Date:   Mon Oct 8 09:48:33 2018 +0000
> >
> >     can: introduce CAN_REQUIRED_SIZE macro

Then I'd start looking in malformed sockaddr_can data instead.

Is this code what triggers the bug?
> >C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000

Kind regards,
Kurt

On vr, 17 jan 2020 21:02:48 +0100, Oliver Hartkopp wrote:
> Hi Marc, Oleksij, Kurt,
>
> On 17/01/2020 14.46, syzbot wrote:
> >Hello,
> >
> >syzbot found the following crash on:
> >
> >HEAD commit:    f5ae2ea6 Fix built-in early-load Intel microcode alignment
> >git tree:       upstream
> >console output: https://syzkaller.appspot.com/x/log.txt?x=1033df15e00000
> >kernel config:  https://syzkaller.appspot.com/x/.config?x=cfbb8fa33f49f9f3
> >dashboard link:
> >https://syzkaller.appspot.com/bug?extid=c3ea30e1e2485573f953
> >compiler:       clang version 10.0.0
> >(https://github.com/llvm/llvm-project/
> >c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> >syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13204f15e00000
> >C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000
> >
> >The bug was bisected to:
> >
> >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
> >Author: Kurt Van Dijck <[email protected]>
> >Date:   Mon Oct 8 09:48:33 2018 +0000
> >
> >     can: introduce CAN_REQUIRED_SIZE macro
> >
> >bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=129bfdb9e00000
> >final crash:    https://syzkaller.appspot.com/x/report.txt?x=119bfdb9e00000
> >console output: https://syzkaller.appspot.com/x/log.txt?x=169bfdb9e00000
> >
> >IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >Reported-by: [email protected]
> >Fixes: 9868b5d44f3d ("can: introduce CAN_REQUIRED_SIZE macro")
> >
> >kasan: CONFIG_KASAN_INLINE enabled
> >kasan: GPF could be caused by NULL-ptr deref or user memory access
> >general protection fault: 0000 [#1] PREEMPT SMP KASAN
> >CPU: 0 PID: 9593 Comm: syz-executor302 Not tainted 5.5.0-rc6-syzkaller #0
> >Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >Google 01/01/2011
> >RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
> >RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
>
> include/linux/rculist.h:528 is
>
> struct hlist_node *first = h->first;
>
> which would mean that 'h' must be NULL.
>
> But the h parameter is rcv_list from
> rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);
>
> Which can not return NULL - at least when dev_rcv_lists is a proper pointer
> to the dev_rcv_lists provided by can_dev_rcv_lists_find().
>
> So either dev->ml_priv is NULL in the case of having a CAN interface (here
> vxcan) or we have not allocated net->can.rx_alldev_list in can_pernet_init()
> properly (which would lead to an -ENOMEM which is reported to whom?).
>
> Hm. I'm lost. Any ideas?
>
> Regards,
> Oliver
>
>
> >Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c
> >89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00
> >00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
> >RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
> >RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
> >RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
> >RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
> >R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
> >R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
> >FS:  00007fb132f26700(0000) GS:ffff8880aec00000(0000)
> >knlGS:0000000000000000
> >CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
> >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >Call Trace:
> >  raw_enable_filters net/can/raw.c:189 [inline]
> >  raw_enable_allfilters net/can/raw.c:255 [inline]
> >  raw_bind+0x326/0x1230 net/can/raw.c:428
> >  __sys_bind+0x2bd/0x3a0 net/socket.c:1649
> >  __do_sys_bind net/socket.c:1660 [inline]
> >  __se_sys_bind net/socket.c:1658 [inline]
> >  __x64_sys_bind+0x7a/0x90 net/socket.c:1658
> >  do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >RIP: 0033:0x446ba9
> >Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
> >48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> >ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> >RSP: 002b:00007fb132f25d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
> >RAX: ffffffffffffffda RBX: 00000000006dbc88 RCX: 0000000000446ba9
> >RDX: 0000000000000008 RSI: 0000000020000180 RDI: 0000000000000003
> >RBP: 00000000006dbc80 R08: 00007fb132f26700 R09: 0000000000000000
> >R10: 00007fb132f26700 R11: 0000000000000246 R12: 00000000006dbc8c
> >R13: 0000000000000000 R14: 0000000000000000 R15: 068500100000003c
> >Modules linked in:
> >---[ end trace 0dedabb13ca8e7d7 ]---
> >RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
> >RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
> >Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c
> >89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00
> >00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
> >RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
> >RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
> >RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
> >RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
> >R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
> >R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
> >FS:  00007fb132f26700(0000) GS:ffff8880aec00000(0000)
> >knlGS:0000000000000000
> >CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
> >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >
> >
> >---
> >This bug is generated by a bot. It may contain errors.
> >See https://goo.gl/tpsmEJ for more information about syzbot.
> >syzbot engineers can be reached at [email protected].
> >
> >syzbot will keep track of this bug report. See:
> >https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >For information about bisection process see:
> >https://goo.gl/tpsmEJ#bisection
> >syzbot can test patches for this bug, for details see:
> >https://goo.gl/tpsmEJ#testing-patches

2020-01-20 22:04:33

by Oliver Hartkopp

Subject: Re: general protection fault in can_rx_register

Hi all,

On 20/01/2020 10.22, Dmitry Vyukov wrote:

>> Is this code what triggers the bug?
>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000
>
> yes
>

(..)

>>>> RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
>>>> RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
>>>
>>> include/linux/rculist.h:528 is
>>>
>>> struct hlist_node *first = h->first;
>>>
>>> which would mean that 'h' must be NULL.
>>>
>>> But the h parameter is rcv_list from
>>> rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);
>>>
>>> Which can not return NULL - at least when dev_rcv_lists is a proper pointer
>>> to the dev_rcv_lists provided by can_dev_rcv_lists_find().
>>>
>>> So either dev->ml_priv is NULL in the case of having a CAN interface (here
>>> vxcan) ...

Added some code to check whether dev->ml_priv is NULL:

~/linux$ git diff
diff --git a/net/can/af_can.c b/net/can/af_can.c
index 128d37a4c2e0..6fb4ae4c359e 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
        spin_lock_bh(&net->can.rcvlists_lock);

        dev_rcv_lists = can_dev_rcv_lists_find(net, dev);
+       if (!dev_rcv_lists) {
+               pr_err("dev_rcv_lists == NULL! %p\n", dev);
+               goto out_unlock;
+       }
        rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);

        rcv->can_id = can_id;
@@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
        rcv_lists_stats->rcv_entries++;
        rcv_lists_stats->rcv_entries_max = max(rcv_lists_stats->rcv_entries_max,
                                               rcv_lists_stats->rcv_entries);
+out_unlock:
        spin_unlock_bh(&net->can.rcvlists_lock);

        return err;

And the output (after some time) is:

[ 758.505841] netlink: 'crash': attribute type 1 has an invalid length.
[ 758.508045] bond7148: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu
[ 758.532025] bond10413: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu
[ 758.532254] dev_rcv_lists == NULL! 000000006b9d257f
[ 758.547392] netlink: 'crash': attribute type 1 has an invalid length.
[ 758.549310] bond7145: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu
[ 758.550464] netlink: 'crash': attribute type 1 has an invalid length.
[ 758.552301] bond7146: (slave vxcan1): The slave device specified does not support setting the MAC address

So we can see that we get a ml_priv pointer which is NULL which should
not be possible due to this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743

Btw. the variable 'size' is set two times at the top of
alloc_candev_mqs() depending on echo_skb_max. This looks wrong.

Best regards,
Oliver

2020-01-20 22:37:41

by Oliver Hartkopp

Subject: Re: general protection fault in can_rx_register

Answering myself ...

On 20/01/2020 23.02, Oliver Hartkopp wrote:

>
> Added some code to check whether dev->ml_priv is NULL:
>
> ~/linux$ git diff
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 128d37a4c2e0..6fb4ae4c359e 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct
> net_device *dev, canid_t can_id,
>         spin_lock_bh(&net->can.rcvlists_lock);
>
>         dev_rcv_lists = can_dev_rcv_lists_find(net, dev);
> +       if (!dev_rcv_lists) {
> +               pr_err("dev_rcv_lists == NULL! %p\n", dev);
> +               goto out_unlock;
> +       }
>         rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);
>
>         rcv->can_id = can_id;
> @@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct
> net_device *dev, canid_t can_id,
>         rcv_lists_stats->rcv_entries++;
>         rcv_lists_stats->rcv_entries_max =
> max(rcv_lists_stats->rcv_entries_max,
>
> rcv_lists_stats->rcv_entries);
> +out_unlock:
>         spin_unlock_bh(&net->can.rcvlists_lock);
>
>         return err;
>
> And the output (after some time) is:
>
> [  758.505841] netlink: 'crash': attribute type 1 has an invalid length.
> [  758.508045] bond7148: (slave vxcan1): The slave device specified does
> not support setting the MAC address
> [  758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu
> [  758.532025] bond10413: (slave vxcan1): The slave device specified
> does not support setting the MAC address
> [  758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu
> [  758.532254] dev_rcv_lists == NULL! 000000006b9d257f
> [  758.547392] netlink: 'crash': attribute type 1 has an invalid length.
> [  758.549310] bond7145: (slave vxcan1): The slave device specified does
> not support setting the MAC address
> [  758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu
> [  758.550464] netlink: 'crash': attribute type 1 has an invalid length.
> [  758.552301] bond7146: (slave vxcan1): The slave device specified does
> not support setting the MAC address
>
> So we can see that we get a ml_priv pointer which is NULL which should
> not be possible due to this:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743

This reference doesn't point to the right code, as vxcan has its own
handling to assign ml_priv in vxcan.c.

> Btw. the variable 'size' is set two times at the top of
> alloc_candev_mqs() depending on echo_skb_max. This looks wrong.

No, it looks right. I just did not see through the ALIGN() macro at first sight.

But it is still unclear why dev->ml_priv is not set correctly in vxcan.c, as
all the settings for .priv_size and in vxcan_setup() look fine.

Best regards,
Oliver

2020-01-21 08:31:56

by Kurt Van Dijck

Subject: Re: general protection fault in can_rx_register

On ma, 20 jan 2020 23:35:16 +0100, Oliver Hartkopp wrote:
> Answering myself ...
>
> On 20/01/2020 23.02, Oliver Hartkopp wrote:
>
> >
> >Added some code to check whether dev->ml_priv is NULL:
> >
> >~/linux$ git diff
> >diff --git a/net/can/af_can.c b/net/can/af_can.c
> >index 128d37a4c2e0..6fb4ae4c359e 100644
> >--- a/net/can/af_can.c
> >+++ b/net/can/af_can.c
> >@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct
> >net_device *dev, canid_t can_id,
> >         spin_lock_bh(&net->can.rcvlists_lock);
> >
> >         dev_rcv_lists = can_dev_rcv_lists_find(net, dev);
> >+       if (!dev_rcv_lists) {
> >+               pr_err("dev_rcv_lists == NULL! %p\n", dev);
> >+               goto out_unlock;
> >+       }
> >         rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);
> >
> >         rcv->can_id = can_id;
> >@@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device
> >*dev, canid_t can_id,
> >         rcv_lists_stats->rcv_entries++;
> >         rcv_lists_stats->rcv_entries_max =
> >max(rcv_lists_stats->rcv_entries_max,
> >
> >rcv_lists_stats->rcv_entries);
> >+out_unlock:
> >         spin_unlock_bh(&net->can.rcvlists_lock);
> >
> >         return err;
> >
> >And the output (after some time) is:
> >
> >[  758.505841] netlink: 'crash': attribute type 1 has an invalid length.
> >[  758.508045] bond7148: (slave vxcan1): The slave device specified does
> >not support setting the MAC address
> >[  758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu
> >[  758.532025] bond10413: (slave vxcan1): The slave device specified does
> >not support setting the MAC address
> >[  758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu
> >[  758.532254] dev_rcv_lists == NULL! 000000006b9d257f
> >[  758.547392] netlink: 'crash': attribute type 1 has an invalid length.
> >[  758.549310] bond7145: (slave vxcan1): The slave device specified does
> >not support setting the MAC address
> >[  758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu
> >[  758.550464] netlink: 'crash': attribute type 1 has an invalid length.
> >[  758.552301] bond7146: (slave vxcan1): The slave device specified does
> >not support setting the MAC address
> >
> >So we can see that we get a ml_priv pointer which is NULL which should not
> >be possible due to this:
> >
> >https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743
>
> This reference doesn't point to the right code, as vxcan has its own handling
> to assign ml_priv in vxcan.c.
>
> >Btw. the variable 'size' is set two times at the top of alloc_candev_mqs()
> >depending on echo_skb_max. This looks wrong.
>
> No. It looks right as I did not get behind the ALIGN() macro at first sight.
>
> But it is still open why dev->ml_priv is not set correctly in vxcan.c as all
> the settings for .priv_size and in vxcan_setup look fine.

Maybe I am completely lost:
Shouldn't can_ml_priv and vxcan_priv be similar?
Where is dev_rcv_lists in the vxcan case?

>
> Best regards,
> Oliver

2020-01-21 08:37:16

by Kurt Van Dijck

[permalink] [raw]
Subject: Re: general protection fault in can_rx_register

On Tue, 21 Jan 2020 09:30:35 +0100, Kurt Van Dijck wrote:
> On Mon, 20 Jan 2020 23:35:16 +0100, Oliver Hartkopp wrote:
> > Answering myself ...
> >
> > On 20/01/2020 23.02, Oliver Hartkopp wrote:
> >
> > >
> > >Added some code to check whether dev->ml_priv is NULL:
> > >
> > >~/linux$ git diff
> > >diff --git a/net/can/af_can.c b/net/can/af_can.c
> > >index 128d37a4c2e0..6fb4ae4c359e 100644
> > >--- a/net/can/af_can.c
> > >+++ b/net/can/af_can.c
> > >@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct
> > >net_device *dev, canid_t can_id,
> > >         spin_lock_bh(&net->can.rcvlists_lock);
> > >
> > >         dev_rcv_lists = can_dev_rcv_lists_find(net, dev);
> > >+       if (!dev_rcv_lists) {
> > >+               pr_err("dev_rcv_lists == NULL! %p\n", dev);
> > >+               goto out_unlock;
> > >+       }
> > >         rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);
> > >
> > >         rcv->can_id = can_id;
> > >@@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device
> > >*dev, canid_t can_id,
> > >         rcv_lists_stats->rcv_entries++;
> > >         rcv_lists_stats->rcv_entries_max =
> > >max(rcv_lists_stats->rcv_entries_max,
> > >
> > >rcv_lists_stats->rcv_entries);
> > >+out_unlock:
> > >         spin_unlock_bh(&net->can.rcvlists_lock);
> > >
> > >         return err;
> > >
> > >And the output (after some time) is:
> > >
> > >[  758.505841] netlink: 'crash': attribute type 1 has an invalid length.
> > >[  758.508045] bond7148: (slave vxcan1): The slave device specified does
> > >not support setting the MAC address
> > >[  758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu
> > >[  758.532025] bond10413: (slave vxcan1): The slave device specified does
> > >not support setting the MAC address
> > >[  758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu
> > >[  758.532254] dev_rcv_lists == NULL! 000000006b9d257f
> > >[  758.547392] netlink: 'crash': attribute type 1 has an invalid length.
> > >[  758.549310] bond7145: (slave vxcan1): The slave device specified does
> > >not support setting the MAC address
> > >[  758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu
> > >[  758.550464] netlink: 'crash': attribute type 1 has an invalid length.
> > >[  758.552301] bond7146: (slave vxcan1): The slave device specified does
> > >not support setting the MAC address
> > >
> > >So we can see that we get a ml_priv pointer which is NULL which should not
> > >be possible due to this:
> > >
> > >https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743
> >
> > This reference doesn't point to the right code, as vxcan has its own handling
> > to assign ml_priv in vxcan.c.
> >
> > >Btw. the variable 'size' is set two times at the top of alloc_candev_mqs()
> > >depending on echo_skb_max. This looks wrong.
> >
> > No. It looks right as I did not get behind the ALIGN() macro at first sight.
> >
> > But it is still open why dev->ml_priv is not set correctly in vxcan.c as all
> > the settings for .priv_size and in vxcan_setup look fine.
>
> Maybe I got completely lost:
> Shouldn't can_ml_priv and vxcan_priv not be similar?
> Where is the dev_rcv_lists in the vxcan case?

IMHO, net/can/af_can.c:306 is wrong in the vxcan case.

>
> >
> > Best regards,
> > Oliver

2020-01-21 18:55:36

by Kurt Van Dijck

[permalink] [raw]
Subject: Re: general protection fault in can_rx_register

On Tue, 21 Jan 2020 09:30:35 +0100, Kurt Van Dijck wrote:
> On Mon, 20 Jan 2020 23:35:16 +0100, Oliver Hartkopp wrote:
> > Answering myself ...
> >
> > On 20/01/2020 23.02, Oliver Hartkopp wrote:
> >
> > >
> > >Added some code to check whether dev->ml_priv is NULL:
> > >
> > >~/linux$ git diff
> > >diff --git a/net/can/af_can.c b/net/can/af_can.c
> > >index 128d37a4c2e0..6fb4ae4c359e 100644
> > >--- a/net/can/af_can.c
> > >+++ b/net/can/af_can.c
> > >@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct
> > >net_device *dev, canid_t can_id,
> > >         spin_lock_bh(&net->can.rcvlists_lock);
> > >
> > >         dev_rcv_lists = can_dev_rcv_lists_find(net, dev);
> > >+       if (!dev_rcv_lists) {
> > >+               pr_err("dev_rcv_lists == NULL! %p\n", dev);
> > >+               goto out_unlock;
> > >+       }
> > >         rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);
> > >
> > >         rcv->can_id = can_id;
> > >@@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device
> > >*dev, canid_t can_id,
> > >         rcv_lists_stats->rcv_entries++;
> > >         rcv_lists_stats->rcv_entries_max =
> > >max(rcv_lists_stats->rcv_entries_max,
> > >
> > >rcv_lists_stats->rcv_entries);
> > >+out_unlock:
> > >         spin_unlock_bh(&net->can.rcvlists_lock);
> > >
> > >         return err;
> > >
> > >And the output (after some time) is:
> > >
> > >[  758.505841] netlink: 'crash': attribute type 1 has an invalid length.
> > >[  758.508045] bond7148: (slave vxcan1): The slave device specified does
> > >not support setting the MAC address
> > >[  758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu
> > >[  758.532025] bond10413: (slave vxcan1): The slave device specified does
> > >not support setting the MAC address
> > >[  758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu
> > >[  758.532254] dev_rcv_lists == NULL! 000000006b9d257f
> > >[  758.547392] netlink: 'crash': attribute type 1 has an invalid length.
> > >[  758.549310] bond7145: (slave vxcan1): The slave device specified does
> > >not support setting the MAC address
> > >[  758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu
> > >[  758.550464] netlink: 'crash': attribute type 1 has an invalid length.
> > >[  758.552301] bond7146: (slave vxcan1): The slave device specified does
> > >not support setting the MAC address
> > >
> > >So we can see that we get a ml_priv pointer which is NULL which should not
> > >be possible due to this:
> > >
> > >https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743
> >
> > This reference doesn't point to the right code, as vxcan has its own handling
> > to assign ml_priv in vxcan.c.
> >
> > >Btw. the variable 'size' is set two times at the top of alloc_candev_mqs()
> > >depending on echo_skb_max. This looks wrong.
> >
> > No. It looks right as I did not get behind the ALIGN() macro at first sight.
> >
> > But it is still open why dev->ml_priv is not set correctly in vxcan.c as all
> > the settings for .priv_size and in vxcan_setup look fine.
>
> Maybe I got completely lost:
> Shouldn't can_ml_priv and vxcan_priv not be similar?
> Where is the dev_rcv_lists in the vxcan case?

I did indeed get completely lost: vxcan_priv and can_ml_priv together form
the private part. I'll keep looking.
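
To make the "together form the private part" remark concrete, here is a small
userspace sketch of one private area that holds a driver struct followed by a
mid-layer struct, with the second one located by an aligned offset. The mock
struct contents, the 32-byte alignment and the names mock_vxcan_priv,
mock_can_ml_priv, mock_priv_size() and mock_ml_priv() are assumptions for
illustration, not the kernel definitions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Assumed alignment value, chosen only for this illustration. */
#define MOCK_NETDEV_ALIGN 32

/* Mock stand-ins; the real structs have different members. */
struct mock_vxcan_priv { void *peer; };
struct mock_can_ml_priv { int rcv_entries; };

/* Size of the combined private area: padded driver part + mid-layer part. */
static size_t mock_priv_size(void)
{
    return ALIGN_UP(sizeof(struct mock_vxcan_priv), MOCK_NETDEV_ALIGN) +
           sizeof(struct mock_can_ml_priv);
}

/* Locate the mid-layer part: it starts right after the padded driver part. */
static struct mock_can_ml_priv *mock_ml_priv(void *priv)
{
    return (struct mock_can_ml_priv *)((char *)priv +
           ALIGN_UP(sizeof(struct mock_vxcan_priv), MOCK_NETDEV_ALIGN));
}
```

In this layout a NULL mid-layer pointer cannot come from the arithmetic itself,
only from the assignment never happening or from looking at a different device.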
>
> >
> > Best regards,
> > Oliver

2020-01-21 19:30:51

by Oliver Hartkopp

[permalink] [raw]
Subject: Re: general protection fault in can_rx_register

Hi Kurt,

On 21/01/2020 19.54, Kurt Van Dijck wrote:
> On Tue, 21 Jan 2020 09:30:35 +0100, Kurt Van Dijck wrote:
>> On Mon, 20 Jan 2020 23:35:16 +0100, Oliver Hartkopp wrote:


>>> But it is still open why dev->ml_priv is not set correctly in vxcan.c as all
>>> the settings for .priv_size and in vxcan_setup look fine.
>>
>> Maybe I got completely lost:
>> Shouldn't can_ml_priv and vxcan_priv not be similar?
>> Where is the dev_rcv_lists in the vxcan case?
>
> I indeed got completely lost. vxcan_priv & can_ml_priv form together the
> private part. I continue looking

I added some more debug output:

@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id,
         spin_lock_bh(&net->can.rcvlists_lock);

         dev_rcv_lists = can_dev_rcv_lists_find(net, dev);
+       if (!dev_rcv_lists) {
+               pr_err("dev_rcv_lists == NULL! %p (%s)\n", dev, dev->name);
+               goto out_unlock;
+       }
         rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);

         rcv->can_id = can_id;


and the output becomes:

[ 1814.644087] bond5130: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 1814.644106] bond5130: (slave vxcan1): Error -22 calling dev_set_mtu
[ 1814.648867] bond5128: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu
[ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128)
[ 1814.696420] bond5129: (slave vxcan1): The slave device specified does not support setting the MAC address
[ 1814.696438] bond5129: (slave vxcan1): Error -22 calling dev_set_mtu

So it's not the vxcan1 netdev that causes the issue but (sporadically!!)
the bonding netdev.

Interestingly, the bonding device bond5128 apparently passes the

        if (dev && dev->type != ARPHRD_CAN)
                return -ENODEV;

test.

?!?
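
One way to read this observation: a device-type check alone does not prove that
the per-device receive lists exist. The sketch below is a userspace mock, not a
proposed patch; the struct layout, MOCK_ARPHRD_CAN and mock_rx_register() are
hypothetical names introduced for illustration. It rejects a device whose type
looks right but whose ml_priv is NULL.

```c
#include <errno.h>
#include <stddef.h>

/* Mock constant standing in for ARPHRD_CAN in this sketch. */
#define MOCK_ARPHRD_CAN 280

/* Mock stand-in for the two net_device fields discussed in the thread. */
struct mock_net_device {
    unsigned short type;
    void *ml_priv;
};

/* Hypothetical registration entry point: returns 0 if the device is
 * acceptable, -ENODEV otherwise. dev == NULL means "all devices" here,
 * mirroring the "dev &&" form of the quoted test. */
static int mock_rx_register(const struct mock_net_device *dev)
{
    /* The type check quoted above. */
    if (dev && dev->type != MOCK_ARPHRD_CAN)
        return -ENODEV;

    /* Extra check (assumption): the type matches but no mid-layer
     * private data is attached, so there are no receive lists to use. */
    if (dev && !dev->ml_priv)
        return -ENODEV;

    return 0;
}
```

A check like this would only hide the symptom; it does not explain how a
bonding device ends up with the CAN type in the first place.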

Regards,
Oliver

2020-01-21 19:48:43

by Kurt Van Dijck

[permalink] [raw]
Subject: Re: general protection fault in can_rx_register

On Tue, 21 Jan 2020 20:28:51 +0100, Oliver Hartkopp wrote:
> Hi Kurt,
>
> On 21/01/2020 19.54, Kurt Van Dijck wrote:
> >On Tue, 21 Jan 2020 09:30:35 +0100, Kurt Van Dijck wrote:
> >>On Mon, 20 Jan 2020 23:35:16 +0100, Oliver Hartkopp wrote:
>
>
> >>>But it is still open why dev->ml_priv is not set correctly in vxcan.c as all
> >>>the settings for .priv_size and in vxcan_setup look fine.
> >>
> >>Maybe I got completely lost:
> >>Shouldn't can_ml_priv and vxcan_priv not be similar?
> >>Where is the dev_rcv_lists in the vxcan case?
> >
> >I indeed got completely lost. vxcan_priv & can_ml_priv form together the
> >private part. I continue looking
>
> I added some more debug output:
>
> @@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct net_device
> *dev, canid_t can_id,
> spin_lock_bh(&net->can.rcvlists_lock);
>
> dev_rcv_lists = can_dev_rcv_lists_find(net, dev);
> + if (!dev_rcv_lists) {
> + pr_err("dev_rcv_lists == NULL! %p (%s)\n", dev, dev->name);
> + goto out_unlock;
> + }
> rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists);
>
> rcv->can_id = can_id;
>
>
> and the output becomes:
>
> [ 1814.644087] bond5130: (slave vxcan1): The slave device specified does not
> support setting the MAC address
> [ 1814.644106] bond5130: (slave vxcan1): Error -22 calling dev_set_mtu
> [ 1814.648867] bond5128: (slave vxcan1): The slave device specified does not
> support setting the MAC address
> [ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu
> [ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128)
> [ 1814.696420] bond5129: (slave vxcan1): The slave device specified does not
> support setting the MAC address
> [ 1814.696438] bond5129: (slave vxcan1): Error -22 calling dev_set_mtu
>
> So it's not the vxcan1 netdev that causes the issue but (sporadically!!) the
> bonding netdev.
>
> Interesting enough that the bonding device bond5128 obviously passes the
>
> if (dev && dev->type != ARPHRD_CAN)
> return -ENODEV;
> test.
>
> ?!?

Did you consider the hypothesis I sent you (at 20h22 tonight)?
I don't fully understand all the locking around networking, but your
observation supports my theory of a race condition.

>
> Regards,
> Oliver