2018-01-04 19:36:54

by Tom Herbert

Subject: Re: BUG: free active (active state 0) object type: work_struct hint: strp_work

On Thu, Jan 4, 2018 at 4:10 AM, syzbot
<[email protected]> wrote:
> Hello,
>
> syzkaller hit the following crash on
> 6bb8824732f69de0f233ae6b1a8158e149627b38
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> Unfortunately, I don't have any reproducer for this bug yet.
>
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> Use struct sctp_assoc_value instead
> sctp: [Deprecated]: syz-executor4 (pid 12483) Use of int in maxseg socket
> option.
> Use struct sctp_assoc_value instead
> ------------[ cut here ]------------
> ODEBUG: free active (active state 0) object type: work_struct hint:
> strp_work+0x0/0xf0 net/strparser/strparser.c:381
> WARNING: CPU: 1 PID: 3502 at lib/debugobjects.c:291
> debug_print_object+0x166/0x220 lib/debugobjects.c:288
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 1 PID: 3502 Comm: kworker/u4:4 Not tainted 4.15.0-rc5+ #170
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: kkcmd kcm_tx_work
> Call Trace:
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x194/0x257 lib/dump_stack.c:53
> panic+0x1e4/0x41c kernel/panic.c:183
> __warn+0x1dc/0x200 kernel/panic.c:547
> report_bug+0x211/0x2d0 lib/bug.c:184
> fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
> fixup_bug arch/x86/kernel/traps.c:247 [inline]
> do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
> invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1061
> RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
> RSP: 0018:ffff8801c0ee7068 EFLAGS: 00010086
> RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff8159bc3e
> RDX: 0000000000000000 RSI: 1ffff100381dcdc8 RDI: ffff8801db317dd0
> RBP: ffff8801c0ee70a8 R08: 0000000000000000 R09: 1ffff100381dcd9a
> R10: ffffed00381dce3c R11: ffffffff86137ad8 R12: 0000000000000001
> R13: ffffffff86113480 R14: ffffffff8560dc40 R15: ffffffff8146e5f0
> __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
> debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
> kmem_cache_free+0x253/0x2a0 mm/slab.c:3745

I believe we just need to defer kmem_cache_free to call_rcu.
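
As a rough illustration of the deferral idea, here is a minimal userspace
sketch. It is not kernel code and not the actual KCM patch. In the kernel
the analogous pieces would be a struct rcu_head embedded in the psock,
call_rcu() to queue the callback, and a callback that calls
kmem_cache_free(). Every name below (deferred_head, defer_call,
run_deferred, psock_model, psock_release) is hypothetical and exists only
for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only: a stand-in for the kernel's struct rcu_head. */
struct deferred_head {
	struct deferred_head *next;
	void (*func)(struct deferred_head *head);
};

/* Callbacks queued and waiting for the modelled "grace period". */
static struct deferred_head *pending;
/* Counts how many objects have actually been freed. */
static int frees_done;

/* Stand-in for call_rcu(): queue the callback instead of running it now. */
static void defer_call(struct deferred_head *head,
		       void (*func)(struct deferred_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for the end of a grace period: run every queued callback.
 * Returns the number of callbacks that ran. */
static int run_deferred(void)
{
	int n = 0;

	while (pending) {
		struct deferred_head *head = pending;

		pending = head->next;	/* unlink before the callback frees it */
		head->func(head);
		n++;
	}
	return n;
}

/* Hypothetical object with an embedded deferral head, loosely modelled
 * on a psock that embeds other state such as a work item. */
struct psock_model {
	int work_pending;
	struct deferred_head rcu;
};

/* Deferred free callback: recover the enclosing object, then free it. */
static void psock_free_cb(struct deferred_head *head)
{
	struct psock_model *p = (struct psock_model *)
		((char *)head - offsetof(struct psock_model, rcu));

	free(p);
	frees_done++;
}

static struct psock_model *psock_alloc(void)
{
	return calloc(1, sizeof(struct psock_model));
}

/* Instead of freeing immediately, hand the object to the deferral queue. */
static void psock_release(struct psock_model *p)
{
	defer_call(&p->rcu, psock_free_cb);
}
```

In this model the object stays valid between psock_release() and
run_deferred(), which is the property the deferral is meant to provide.
Deferral only postpones the free. It does not by itself cancel or wait
for a work item that is still pending on the object.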

Tom

> unreserve_psock+0x5a1/0x780 net/kcm/kcmsock.c:547
> kcm_write_msgs+0xbae/0x1b80 net/kcm/kcmsock.c:590
> kcm_tx_work+0x2e/0x190 net/kcm/kcmsock.c:731
> process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
> worker_thread+0x223/0x1990 kernel/workqueue.c:2246
> kthread+0x33c/0x400 kernel/kthread.c:238
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:515
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.15.0-rc5+ #170 Not tainted
> ------------------------------------------------------
> kworker/u4:4/3502 is trying to acquire lock:
> ((console_sem).lock){-.-.}, at: [<0000000091214b42>] down_trylock+0x13/0x70
> kernel/locking/semaphore.c:136
>
> but task is already holding lock:
> (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> __debug_check_no_obj_freed lib/debugobjects.c:736 [inline]
> (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> debug_check_no_obj_freed+0x1e9/0xf1f lib/debugobjects.c:774
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&obj_hash[i].lock){-.-.}:
> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
> __debug_object_init+0x109/0x1040 lib/debugobjects.c:343
> debug_object_init+0x17/0x20 lib/debugobjects.c:391
> debug_hrtimer_init kernel/time/hrtimer.c:396 [inline]
> debug_init kernel/time/hrtimer.c:441 [inline]
> hrtimer_init+0x8c/0x410 kernel/time/hrtimer.c:1122
> init_dl_task_timer+0x1b/0x50 kernel/sched/deadline.c:1023
> __sched_fork+0x2c4/0xb70 kernel/sched/core.c:2188
> init_idle+0x75/0x820 kernel/sched/core.c:5279
> sched_init+0xb19/0xc43 kernel/sched/core.c:5976
> start_kernel+0x452/0x819 init/main.c:582
> x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:378
> x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:359
> secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:237
>
> -> #2 (&rq->lock){-.-.}:
> __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
> _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
> rq_lock kernel/sched/sched.h:1766 [inline]
> task_fork_fair+0x7a/0x690 kernel/sched/fair.c:9449
> sched_fork+0x435/0xc00 kernel/sched/core.c:2404
> copy_process.part.38+0x174b/0x4b20 kernel/fork.c:1722
> copy_process kernel/fork.c:1565 [inline]
> _do_fork+0x1f7/0xfe0 kernel/fork.c:2044
> kernel_thread+0x34/0x40 kernel/fork.c:2106
> rest_init+0x22/0xf0 init/main.c:401
> start_kernel+0x7f1/0x819 init/main.c:713
> x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:378
> x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:359
> secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:237
>
> -> #1 (&p->pi_lock){-.-.}:
> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
> try_to_wake_up+0xbc/0x1600 kernel/sched/core.c:1988
> wake_up_process+0x10/0x20 kernel/sched/core.c:2151
> __up.isra.0+0x1cc/0x2c0 kernel/locking/semaphore.c:262
> up+0x13b/0x1d0 kernel/locking/semaphore.c:187
> __up_console_sem+0xb2/0x1a0 kernel/printk/printk.c:245
> console_unlock+0x538/0xd80 kernel/printk/printk.c:2248
> do_con_write+0x106e/0x1f70 drivers/tty/vt/vt.c:2433
> con_write+0x25/0xb0 drivers/tty/vt/vt.c:2782
> do_output_char+0x4d9/0x7a0 drivers/tty/n_tty.c:431
> process_output drivers/tty/n_tty.c:498 [inline]
> n_tty_write+0x68d/0xec0 drivers/tty/n_tty.c:2314
> do_tty_write drivers/tty/tty_io.c:949 [inline]
> tty_write+0x3fa/0x840 drivers/tty/tty_io.c:1033
> __vfs_write+0xef/0x970 fs/read_write.c:480
> vfs_write+0x189/0x510 fs/read_write.c:544
> SYSC_write fs/read_write.c:589 [inline]
> SyS_write+0xef/0x220 fs/read_write.c:581
> entry_SYSCALL_64_fastpath+0x1f/0x96
>
> -> #0 ((console_sem).lock){-.-.}:
> lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
> down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
> __down_trylock_console_sem+0xa2/0x1e0 kernel/printk/printk.c:228
> console_trylock+0x15/0x100 kernel/printk/printk.c:2065
> vprintk_emit+0x49b/0x590 kernel/printk/printk.c:1756
> vprintk_default+0x28/0x30 kernel/printk/printk.c:1796
> vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
> printk+0xaa/0xca kernel/printk/printk.c:1829
> __warn_printk+0x90/0xf0 kernel/panic.c:599
> debug_print_object+0x166/0x220 lib/debugobjects.c:288
> __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
> debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
> kmem_cache_free+0x253/0x2a0 mm/slab.c:3745
> unreserve_psock+0x5a1/0x780 net/kcm/kcmsock.c:547
> kcm_write_msgs+0xbae/0x1b80 net/kcm/kcmsock.c:590
> kcm_tx_work+0x2e/0x190 net/kcm/kcmsock.c:731
> process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
> worker_thread+0x223/0x1990 kernel/workqueue.c:2246
> kthread+0x33c/0x400 kernel/kthread.c:238
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:515
>
> other info that might help us debug this:
>
> Chain exists of:
> (console_sem).lock --> &rq->lock --> &obj_hash[i].lock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&obj_hash[i].lock);
> lock(&rq->lock);
> lock(&obj_hash[i].lock);
> lock((console_sem).lock);
>
> *** DEADLOCK ***
>
> 5 locks held by kworker/u4:4/3502:
> #0: ((wq_completion)"%s""kkcmd"){+.+.}, at: [<0000000030be6056>]
> process_one_work+0xaaf/0x1b10 kernel/workqueue.c:2083
> #1: ((work_completion)(&kcm->tx_work)){+.+.}, at: [<0000000019ffb03c>]
> process_one_work+0xb01/0x1b10 kernel/workqueue.c:2087
> #2: (sk_lock-AF_KCM){+.+.}, at: [<0000000077d44615>] lock_sock
> include/net/sock.h:1462 [inline]
> #2: (sk_lock-AF_KCM){+.+.}, at: [<0000000077d44615>]
> kcm_tx_work+0x26/0x190 net/kcm/kcmsock.c:726
> #3: (&(&mux->lock)->rlock){+...}, at: [<00000000c908a2e7>] spin_lock_bh
> include/linux/spinlock.h:315 [inline]
> #3: (&(&mux->lock)->rlock){+...}, at: [<00000000c908a2e7>]
> unreserve_psock+0x9e/0x780 net/kcm/kcmsock.c:521
> #4: (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> __debug_check_no_obj_freed lib/debugobjects.c:736 [inline]
> #4: (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> debug_check_no_obj_freed+0x1e9/0xf1f lib/debugobjects.c:774
>
> stack backtrace:
> CPU: 1 PID: 3502 Comm: kworker/u4:4 Not tainted 4.15.0-rc5+ #170
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: kkcmd kcm_tx_work
> Call Trace:
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x194/0x257 lib/dump_stack.c:53
> print_circular_bug.isra.37+0x2cd/0x2dc kernel/locking/lockdep.c:1218
> check_prev_add kernel/locking/lockdep.c:1858 [inline]
> check_prevs_add kernel/locking/lockdep.c:1971 [inline]
> validate_chain kernel/locking/lockdep.c:2412 [inline]
> __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3426
> lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
> down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
> __down_trylock_console_sem+0xa2/0x1e0 kernel/printk/printk.c:228
> console_trylock+0x15/0x100 kernel/printk/printk.c:2065
> vprintk_emit+0x49b/0x590 kernel/printk/printk.c:1756
> vprintk_default+0x28/0x30 kernel/printk/printk.c:1796
> vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
> printk+0xaa/0xca kernel/printk/printk.c:1829
> __warn_printk+0x90/0xf0 kernel/panic.c:599
> debug_print_object+0x166/0x220 lib/debugobjects.c:288
> __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
> debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
> kmem_cache_free+0x253/0x2a0 mm/slab.c:3745
> unreserve_psock+0x5a1/0x780 net/kcm/kcmsock.c:547
> kcm_write_msgs+0xbae/0x1b80 net/kcm/kcmsock.c:590
> kcm_tx_work+0x2e/0x190 net/kcm/kcmsock.c:731
> process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
> worker_thread+0x223/0x1990 kernel/workqueue.c:2246
> kthread+0x33c/0x400 kernel/kthread.c:238
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:515
> Shutting down cpus with NMI
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to [email protected].
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.


2018-02-13 20:16:49

by Dmitry Vyukov

Subject: Re: BUG: free active (active state 0) object type: work_struct hint: strp_work

On Thu, Jan 4, 2018 at 8:36 PM, Tom Herbert <[email protected]> wrote:
> On Thu, Jan 4, 2018 at 4:10 AM, syzbot
> <[email protected]> wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 6bb8824732f69de0f233ae6b1a8158e149627b38
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>> Unfortunately, I don't have any reproducer for this bug yet.
>>
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: [email protected]
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>> Use struct sctp_assoc_value instead
>> sctp: [Deprecated]: syz-executor4 (pid 12483) Use of int in maxseg socket
>> option.
>> Use struct sctp_assoc_value instead
>> ------------[ cut here ]------------
>> ODEBUG: free active (active state 0) object type: work_struct hint:
>> strp_work+0x0/0xf0 net/strparser/strparser.c:381
>> WARNING: CPU: 1 PID: 3502 at lib/debugobjects.c:291
>> debug_print_object+0x166/0x220 lib/debugobjects.c:288
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 1 PID: 3502 Comm: kworker/u4:4 Not tainted 4.15.0-rc5+ #170
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Workqueue: kkcmd kcm_tx_work
>> Call Trace:
>> __dump_stack lib/dump_stack.c:17 [inline]
>> dump_stack+0x194/0x257 lib/dump_stack.c:53
>> panic+0x1e4/0x41c kernel/panic.c:183
>> __warn+0x1dc/0x200 kernel/panic.c:547
>> report_bug+0x211/0x2d0 lib/bug.c:184
>> fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>> fixup_bug arch/x86/kernel/traps.c:247 [inline]
>> do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>> invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1061
>> RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
>> RSP: 0018:ffff8801c0ee7068 EFLAGS: 00010086
>> RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff8159bc3e
>> RDX: 0000000000000000 RSI: 1ffff100381dcdc8 RDI: ffff8801db317dd0
>> RBP: ffff8801c0ee70a8 R08: 0000000000000000 R09: 1ffff100381dcd9a
>> R10: ffffed00381dce3c R11: ffffffff86137ad8 R12: 0000000000000001
>> R13: ffffffff86113480 R14: ffffffff8560dc40 R15: ffffffff8146e5f0
>> __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
>> debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
>> kmem_cache_free+0x253/0x2a0 mm/slab.c:3745
>
> I believe we just need to defer kmem_cache_free to call_rcu.


Hi Tom,

Was this ever submitted? I don't see any such change in net/kcm/kcmsock.c.

2018-02-14 17:44:22

by Tom Herbert

Subject: Re: BUG: free active (active state 0) object type: work_struct hint: strp_work

On Tue, Feb 13, 2018 at 12:15 PM, Dmitry Vyukov <[email protected]> wrote:
>
> On Thu, Jan 4, 2018 at 8:36 PM, Tom Herbert <[email protected]> wrote:
> > On Thu, Jan 4, 2018 at 4:10 AM, syzbot
> > <[email protected]> wrote:
> >> Hello,
> >>
> >> syzkaller hit the following crash on
> >> 6bb8824732f69de0f233ae6b1a8158e149627b38
> >> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> >> compiler: gcc (GCC) 7.1.1 20170620
> >> .config is attached
> >> Raw console output is attached.
> >> Unfortunately, I don't have any reproducer for this bug yet.
> >>
> >>
> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> Reported-by: [email protected]
> >> It will help syzbot understand when the bug is fixed. See footer for
> >> details.
> >> If you forward the report, please keep this part and the footer.
> >>
> >> Use struct sctp_assoc_value instead
> >> sctp: [Deprecated]: syz-executor4 (pid 12483) Use of int in maxseg socket
> >> option.
> >> Use struct sctp_assoc_value instead
> >> ------------[ cut here ]------------
> >> ODEBUG: free active (active state 0) object type: work_struct hint:
> >> strp_work+0x0/0xf0 net/strparser/strparser.c:381
> >> WARNING: CPU: 1 PID: 3502 at lib/debugobjects.c:291
> >> debug_print_object+0x166/0x220 lib/debugobjects.c:288
> >> Kernel panic - not syncing: panic_on_warn set ...
> >>
> >> CPU: 1 PID: 3502 Comm: kworker/u4:4 Not tainted 4.15.0-rc5+ #170
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> Google 01/01/2011
> >> Workqueue: kkcmd kcm_tx_work
> >> Call Trace:
> >> __dump_stack lib/dump_stack.c:17 [inline]
> >> dump_stack+0x194/0x257 lib/dump_stack.c:53
> >> panic+0x1e4/0x41c kernel/panic.c:183
> >> __warn+0x1dc/0x200 kernel/panic.c:547
> >> report_bug+0x211/0x2d0 lib/bug.c:184
> >> fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
> >> fixup_bug arch/x86/kernel/traps.c:247 [inline]
> >> do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
> >> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
> >> invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1061
> >> RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
> >> RSP: 0018:ffff8801c0ee7068 EFLAGS: 00010086
> >> RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff8159bc3e
> >> RDX: 0000000000000000 RSI: 1ffff100381dcdc8 RDI: ffff8801db317dd0
> >> RBP: ffff8801c0ee70a8 R08: 0000000000000000 R09: 1ffff100381dcd9a
> >> R10: ffffed00381dce3c R11: ffffffff86137ad8 R12: 0000000000000001
> >> R13: ffffffff86113480 R14: ffffffff8560dc40 R15: ffffffff8146e5f0
> >> __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
> >> debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
> >> kmem_cache_free+0x253/0x2a0 mm/slab.c:3745
> >
> > I believe we just need to defer kmem_cache_free to call_rcu.
>
>
> Hi Tom,
>
> Was this ever submitted? I don't see any such change in net/kcm/kcmsock.c.


Hi Dmitry,

I am looking at it. I am not yet convinced that call_rcu is the right fix.

Tom