MIME-Version: 1.0
In-Reply-To: <001a114aca7419fa410561f23992@google.com>
References: <001a114aca7419fa410561f23992@google.com>
From: Tom Herbert <tom@quantonium.net>
Date: Thu, 4 Jan 2018 11:36:49 -0800
Message-ID: <CAPDqMeo+CbhE6LpXi7DQ1zp-jNXue8ETdcJ9zjLGT+r=HM+y5Q@mail.gmail.com>
Subject: Re: BUG: free active (active state 0) object type: work_struct hint: strp_work
To: syzbot <syzbot+3c6c745b0d2f341bbf50@syzkaller.appspotmail.com>
Cc: "David S . Miller" <davem@davemloft.net>,
        Eric Biggers <ebiggers@google.com>,
        John Fastabend <john.fastabend@gmail.com>,
        linux-kernel@vger.kernel.org,
        Linux Kernel Network Developers <netdev@vger.kernel.org>,
        syzkaller-bugs@googlegroups.com, xiyou.wangcong@gmail.com
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org

On Thu, Jan 4, 2018 at 4:10 AM, syzbot
<syzbot+3c6c745b0d2f341bbf50@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzkaller hit the following crash on
> 6bb8824732f69de0f233ae6b1a8158e149627b38
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> Unfortunately, I don't have any reproducer for this bug yet.
>
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+3c6c745b0d2f341bbf50@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> Use struct sctp_assoc_value instead
> sctp: [Deprecated]: syz-executor4 (pid 12483) Use of int in maxseg socket
> option.
> Use struct sctp_assoc_value instead
> ------------[ cut here ]------------
> ODEBUG: free active (active state 0) object type: work_struct hint:
> strp_work+0x0/0xf0 net/strparser/strparser.c:381
> WARNING: CPU: 1 PID: 3502 at lib/debugobjects.c:291
> debug_print_object+0x166/0x220 lib/debugobjects.c:288
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 1 PID: 3502 Comm: kworker/u4:4 Not tainted 4.15.0-rc5+ #170
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: kkcmd kcm_tx_work
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1061
> RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
> RSP: 0018:ffff8801c0ee7068 EFLAGS: 00010086
> RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff8159bc3e
> RDX: 0000000000000000 RSI: 1ffff100381dcdc8 RDI: ffff8801db317dd0
> RBP: ffff8801c0ee70a8 R08: 0000000000000000 R09: 1ffff100381dcd9a
> R10: ffffed00381dce3c R11: ffffffff86137ad8 R12: 0000000000000001
> R13: ffffffff86113480 R14: ffffffff8560dc40 R15: ffffffff8146e5f0
>  __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
>  debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
>  kmem_cache_free+0x253/0x2a0 mm/slab.c:3745

I believe we just need to defer the kmem_cache_free() call in
unreserve_psock() to an RCU callback via call_rcu().
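
For illustration only, here is a minimal userspace sketch of the
deferred-free shape. It is not the kcm patch and does not use the kernel
API: rcu_head_model, call_rcu_model(), run_grace_period_model() and
psock_model are hypothetical stand-ins that only mimic how call_rcu()
queues a callback to run later instead of freeing the object immediately.
Whether this deferral is enough to let strp_work finish before the free
is an assumption that would have to be checked against the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct rcu_head. */
struct rcu_head_model {
	struct rcu_head_model *next;
	void (*func)(struct rcu_head_model *head);
};

/* Callbacks queued until the modelled grace period ends. */
static struct rcu_head_model *pending_callbacks;

/* Stand-in for call_rcu(): queue the callback instead of running it now. */
static void call_rcu_model(struct rcu_head_model *head,
			   void (*func)(struct rcu_head_model *head))
{
	head->func = func;
	head->next = pending_callbacks;
	pending_callbacks = head;
}

/*
 * Stand-in for the end of a grace period: run every queued callback
 * and return how many were run.
 */
static int run_grace_period_model(void)
{
	struct rcu_head_model *head = pending_callbacks;
	int ran = 0;

	pending_callbacks = NULL;
	while (head) {
		/* Read next first: the callback frees the enclosing object. */
		struct rcu_head_model *next = head->next;

		head->func(head);
		head = next;
		ran++;
	}
	return ran;
}

/* Hypothetical object that embeds the callback head, like a psock would. */
struct psock_model {
	int index;
	struct rcu_head_model rcu;
};

/* Number of objects actually freed so far. */
static int psocks_freed;

/* Deferred callback: recover the enclosing object and free it. */
static void psock_free_cb(struct rcu_head_model *head)
{
	struct psock_model *psock = (struct psock_model *)
		((char *)head - offsetof(struct psock_model, rcu));

	free(psock);
	psocks_freed++;
}

static struct psock_model *psock_alloc_model(void)
{
	return calloc(1, sizeof(struct psock_model));
}

/* Release path: queue the free instead of freeing immediately. */
static void psock_release_model(struct psock_model *psock)
{
	assert(psock != NULL);
	call_rcu_model(&psock->rcu, psock_free_cb);
}
```

In this model the object remains allocated after psock_release_model()
returns and is freed only when run_grace_period_model() is called.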

Tom

>  unreserve_psock+0x5a1/0x780 net/kcm/kcmsock.c:547
>  kcm_write_msgs+0xbae/0x1b80 net/kcm/kcmsock.c:590
>  kcm_tx_work+0x2e/0x190 net/kcm/kcmsock.c:731
>  process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:515
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.15.0-rc5+ #170 Not tainted
> ------------------------------------------------------
> kworker/u4:4/3502 is trying to acquire lock:
>  ((console_sem).lock){-.-.}, at: [<0000000091214b42>] down_trylock+0x13/0x70
> kernel/locking/semaphore.c:136
>
> but task is already holding lock:
>  (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> __debug_check_no_obj_freed lib/debugobjects.c:736 [inline]
>  (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> debug_check_no_obj_freed+0x1e9/0xf1f lib/debugobjects.c:774
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&obj_hash[i].lock){-.-.}:
>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>        __debug_object_init+0x109/0x1040 lib/debugobjects.c:343
>        debug_object_init+0x17/0x20 lib/debugobjects.c:391
>        debug_hrtimer_init kernel/time/hrtimer.c:396 [inline]
>        debug_init kernel/time/hrtimer.c:441 [inline]
>        hrtimer_init+0x8c/0x410 kernel/time/hrtimer.c:1122
>        init_dl_task_timer+0x1b/0x50 kernel/sched/deadline.c:1023
>        __sched_fork+0x2c4/0xb70 kernel/sched/core.c:2188
>        init_idle+0x75/0x820 kernel/sched/core.c:5279
>        sched_init+0xb19/0xc43 kernel/sched/core.c:5976
>        start_kernel+0x452/0x819 init/main.c:582
>        x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:378
>        x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:359
>        secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:237
>
> -> #2 (&rq->lock){-.-.}:
>        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>        _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
>        rq_lock kernel/sched/sched.h:1766 [inline]
>        task_fork_fair+0x7a/0x690 kernel/sched/fair.c:9449
>        sched_fork+0x435/0xc00 kernel/sched/core.c:2404
>        copy_process.part.38+0x174b/0x4b20 kernel/fork.c:1722
>        copy_process kernel/fork.c:1565 [inline]
>        _do_fork+0x1f7/0xfe0 kernel/fork.c:2044
>        kernel_thread+0x34/0x40 kernel/fork.c:2106
>        rest_init+0x22/0xf0 init/main.c:401
>        start_kernel+0x7f1/0x819 init/main.c:713
>        x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:378
>        x86_64_start_kernel+0x77/0x7a arch/x86/kernel/head64.c:359
>        secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:237
>
> -> #1 (&p->pi_lock){-.-.}:
>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>        try_to_wake_up+0xbc/0x1600 kernel/sched/core.c:1988
>        wake_up_process+0x10/0x20 kernel/sched/core.c:2151
>        __up.isra.0+0x1cc/0x2c0 kernel/locking/semaphore.c:262
>        up+0x13b/0x1d0 kernel/locking/semaphore.c:187
>        __up_console_sem+0xb2/0x1a0 kernel/printk/printk.c:245
>        console_unlock+0x538/0xd80 kernel/printk/printk.c:2248
>        do_con_write+0x106e/0x1f70 drivers/tty/vt/vt.c:2433
>        con_write+0x25/0xb0 drivers/tty/vt/vt.c:2782
>        do_output_char+0x4d9/0x7a0 drivers/tty/n_tty.c:431
>        process_output drivers/tty/n_tty.c:498 [inline]
>        n_tty_write+0x68d/0xec0 drivers/tty/n_tty.c:2314
>        do_tty_write drivers/tty/tty_io.c:949 [inline]
>        tty_write+0x3fa/0x840 drivers/tty/tty_io.c:1033
>        __vfs_write+0xef/0x970 fs/read_write.c:480
>        vfs_write+0x189/0x510 fs/read_write.c:544
>        SYSC_write fs/read_write.c:589 [inline]
>        SyS_write+0xef/0x220 fs/read_write.c:581
>        entry_SYSCALL_64_fastpath+0x1f/0x96
>
> -> #0 ((console_sem).lock){-.-.}:
>        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>        down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
>        __down_trylock_console_sem+0xa2/0x1e0 kernel/printk/printk.c:228
>        console_trylock+0x15/0x100 kernel/printk/printk.c:2065
>        vprintk_emit+0x49b/0x590 kernel/printk/printk.c:1756
>        vprintk_default+0x28/0x30 kernel/printk/printk.c:1796
>        vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
>        printk+0xaa/0xca kernel/printk/printk.c:1829
>        __warn_printk+0x90/0xf0 kernel/panic.c:599
>        debug_print_object+0x166/0x220 lib/debugobjects.c:288
>        __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
>        debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
>        kmem_cache_free+0x253/0x2a0 mm/slab.c:3745
>        unreserve_psock+0x5a1/0x780 net/kcm/kcmsock.c:547
>        kcm_write_msgs+0xbae/0x1b80 net/kcm/kcmsock.c:590
>        kcm_tx_work+0x2e/0x190 net/kcm/kcmsock.c:731
>        process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
>        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>        kthread+0x33c/0x400 kernel/kthread.c:238
>        ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:515
>
> other info that might help us debug this:
>
> Chain exists of:
>   (console_sem).lock --> &rq->lock --> &obj_hash[i].lock
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&obj_hash[i].lock);
>                                lock(&rq->lock);
>                                lock(&obj_hash[i].lock);
>   lock((console_sem).lock);
>
>  *** DEADLOCK ***
>
> 5 locks held by kworker/u4:4/3502:
>  #0:  ((wq_completion)"%s""kkcmd"){+.+.}, at: [<0000000030be6056>]
> process_one_work+0xaaf/0x1b10 kernel/workqueue.c:2083
>  #1:  ((work_completion)(&kcm->tx_work)){+.+.}, at: [<0000000019ffb03c>]
> process_one_work+0xb01/0x1b10 kernel/workqueue.c:2087
>  #2:  (sk_lock-AF_KCM){+.+.}, at: [<0000000077d44615>] lock_sock
> include/net/sock.h:1462 [inline]
>  #2:  (sk_lock-AF_KCM){+.+.}, at: [<0000000077d44615>]
> kcm_tx_work+0x26/0x190 net/kcm/kcmsock.c:726
>  #3:  (&(&mux->lock)->rlock){+...}, at: [<00000000c908a2e7>] spin_lock_bh
> include/linux/spinlock.h:315 [inline]
>  #3:  (&(&mux->lock)->rlock){+...}, at: [<00000000c908a2e7>]
> unreserve_psock+0x9e/0x780 net/kcm/kcmsock.c:521
>  #4:  (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> __debug_check_no_obj_freed lib/debugobjects.c:736 [inline]
>  #4:  (&obj_hash[i].lock){-.-.}, at: [<00000000da143489>]
> debug_check_no_obj_freed+0x1e9/0xf1f lib/debugobjects.c:774
>
> stack backtrace:
> CPU: 1 PID: 3502 Comm: kworker/u4:4 Not tainted 4.15.0-rc5+ #170
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: kkcmd kcm_tx_work
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_circular_bug.isra.37+0x2cd/0x2dc kernel/locking/lockdep.c:1218
>  check_prev_add kernel/locking/lockdep.c:1858 [inline]
>  check_prevs_add kernel/locking/lockdep.c:1971 [inline]
>  validate_chain kernel/locking/lockdep.c:2412 [inline]
>  __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3426
>  lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
>  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>  _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>  down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
>  __down_trylock_console_sem+0xa2/0x1e0 kernel/printk/printk.c:228
>  console_trylock+0x15/0x100 kernel/printk/printk.c:2065
>  vprintk_emit+0x49b/0x590 kernel/printk/printk.c:1756
>  vprintk_default+0x28/0x30 kernel/printk/printk.c:1796
>  vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
>  printk+0xaa/0xca kernel/printk/printk.c:1829
>  __warn_printk+0x90/0xf0 kernel/panic.c:599
>  debug_print_object+0x166/0x220 lib/debugobjects.c:288
>  __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
>  debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
>  kmem_cache_free+0x253/0x2a0 mm/slab.c:3745
>  unreserve_psock+0x5a1/0x780 net/kcm/kcmsock.c:547
>  kcm_write_msgs+0xbae/0x1b80 net/kcm/kcmsock.c:590
>  kcm_tx_work+0x2e/0x190 net/kcm/kcmsock.c:731
>  process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:515
> Shutting down cpus with NMI
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.