2017-11-05 08:53:34

by Dmitry Vyukov

Subject: Re: possible deadlock in flush_work (2)

On Sun, Nov 5, 2017 at 11:41 AM, syzbot
<bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.com>
wrote:
> Hello,
>
> syzkaller hit the following crash on
> 0f611fb6dcc0d6d91b4e1fec911321f434a3b858
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
>
> xs_tcp_setup_socket: connect returned unhandled error -113
> xs_tcp_setup_socket: connect returned unhandled error -113
> xs_tcp_setup_socket: connect returned unhandled error -113
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc5-mm1+ #20 Not tainted
> ------------------------------------------------------
> kworker/0:3/3400 is trying to acquire lock:
> ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] start_flush_work
> kernel/workqueue.c:2850 [inline]
> ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] flush_work+0x55a/0x8a0
> kernel/workqueue.c:2882
>
> but task is already holding lock:
> ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
> process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 ((&task->u.tk_work)){+.+.}:
> process_one_work+0xba2/0x1bc0 kernel/workqueue.c:2088
> worker_thread+0x223/0x1990 kernel/workqueue.c:2246
> kthread+0x38b/0x470 kernel/kthread.c:242
> ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>
> -> #0 ("xprtiod"){+.+.}:
> lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
> start_flush_work kernel/workqueue.c:2851 [inline]
> flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
> __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
> cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
> xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
> xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
> kref_put include/linux/kref.h:70 [inline]
> xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
> rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
> rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
> rpc_release_task net/sunrpc/sched.c:1059 [inline]
> __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
> rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
> process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
> worker_thread+0x223/0x1990 kernel/workqueue.c:2246
> kthread+0x38b/0x470 kernel/kthread.c:242
> ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>         CPU0                    CPU1
>         ----                    ----
>    lock((&task->u.tk_work));
>                                 lock("xprtiod");
>                                 lock((&task->u.tk_work));
>    lock("xprtiod");
>
> *** DEADLOCK ***
>
> 2 locks held by kworker/0:3/3400:
> #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] __write_once_size
> include/linux/compiler.h:305 [inline]
> #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic64_set
> arch/x86/include/asm/atomic64_64.h:33 [inline]
> #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic_long_set
> include/asm-generic/atomic-long.h:56 [inline]
> #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] set_work_data
> kernel/workqueue.c:618 [inline]
> #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
> set_work_pool_and_clear_pending kernel/workqueue.c:645 [inline]
> #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
> process_one_work+0xadf/0x1bc0 kernel/workqueue.c:2083
> #1: ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
> process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>
> stack backtrace:
> CPU: 0 PID: 3400 Comm: kworker/0:3 Not tainted 4.14.0-rc5-mm1+ #20
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: rpciod rpc_async_schedule
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x194/0x257 lib/dump_stack.c:52
> print_circular_bug.isra.41+0x342/0x36a kernel/locking/lockdep.c:1258
> check_prev_add kernel/locking/lockdep.c:1901 [inline]
> check_prevs_add kernel/locking/lockdep.c:2018 [inline]
> validate_chain kernel/locking/lockdep.c:2460 [inline]
> __lock_acquire+0x2f55/0x3d50 kernel/locking/lockdep.c:3487
> lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
> start_flush_work kernel/workqueue.c:2851 [inline]
> flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
> __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
> cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
> xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
> xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
> kref_put include/linux/kref.h:70 [inline]
> xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
> rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
> rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
> rpc_release_task net/sunrpc/sched.c:1059 [inline]
> __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
> rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
> process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
> worker_thread+0x223/0x1990 kernel/workqueue.c:2246
> kthread+0x38b/0x470 kernel/kthread.c:242
> ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431


+sunrpc maintainers
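
[Editorial note: for readers less familiar with how lockdep models workqueues, the report above boils down to a work item that, while running on one workqueue ("rpciod"), synchronously waits via cancel_work_sync()/flush_work() for a work item owned by another workqueue ("xprtiod"). The sketch below is a minimal, hypothetical kernel-module illustration of that pattern only; the demo_* names and workqueues are invented, this is not the SUNRPC code path, and whether lockdep actually fires depends on workqueue flags and kernel version.]

/*
 * Minimal, hypothetical sketch of the cross-workqueue wait pattern that
 * the lockdep splat above is flagging. demo_* names are illustrative.
 */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_rpciod;	/* stands in for "rpciod"  */
static struct workqueue_struct *demo_xprtiod;	/* stands in for "xprtiod" */

/* Work that normally runs on demo_xprtiod (e.g. a reconnect handler). */
static void demo_xprt_work_fn(struct work_struct *work) { }
static DECLARE_WORK(demo_xprt_work, demo_xprt_work_fn);

/* Work that runs on demo_rpciod and tears the "transport" down inline. */
static void demo_rpc_work_fn(struct work_struct *work)
{
	/*
	 * Synchronously waiting here for a work item that executes on
	 * demo_xprtiod records a dependency from the currently running
	 * work item to the other queue's lockdep map; combined with the
	 * ordering taken whenever that queue runs its own items, the
	 * dependency graph can become circular, which is what the report
	 * above shows for rpciod/xprtiod.
	 */
	cancel_work_sync(&demo_xprt_work);
}
static DECLARE_WORK(demo_rpc_work, demo_rpc_work_fn);

static int __init demo_init(void)
{
	demo_rpciod  = alloc_workqueue("demo_rpciod", WQ_MEM_RECLAIM, 0);
	demo_xprtiod = alloc_workqueue("demo_xprtiod", WQ_MEM_RECLAIM, 0);
	if (!demo_rpciod || !demo_xprtiod) {
		if (demo_rpciod)
			destroy_workqueue(demo_rpciod);
		if (demo_xprtiod)
			destroy_workqueue(demo_xprtiod);
		return -ENOMEM;
	}

	queue_work(demo_xprtiod, &demo_xprt_work);
	queue_work(demo_rpciod, &demo_rpc_work);
	return 0;
}

static void __exit demo_exit(void)
{
	/* Draining demo_rpciod first also cancels demo_xprt_work. */
	destroy_workqueue(demo_rpciod);
	destroy_workqueue(demo_xprtiod);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

Nothing in the sketch is specific to SUNRPC; any pair of workqueues with this wait-inside-a-work-item pattern can produce the same lockdep graph.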


> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to [email protected].
> Please credit me with: Reported-by: syzbot <[email protected]>
>
> syzbot will keep track of this bug report.
> Once a fix for this bug is committed, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line.


2017-11-05 16:00:28

by Trond Myklebust

Subject: Re: possible deadlock in flush_work (2)

On Sun, 2017-11-05 at 11:53 +0300, Dmitry Vyukov wrote:
> On Sun, Nov 5, 2017 at 11:41 AM, syzbot
> <bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.com>
> wrote:
> > Hello,
> >
> > syzkaller hit the following crash on
> > 0f611fb6dcc0d6d91b4e1fec911321f434a3b858
> > git://git.cmpxchg.org/linux-mmots.git/master
> >
> > [... full lockdep report quoted in the first message above ...]
>
>
> +sunrpc maintainers

A fix for this has already been merged. Please retest with an up to
date kernel.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2017-11-06 10:34:48

by Dmitry Vyukov

Subject: Re: possible deadlock in flush_work (2)

On Sun, Nov 5, 2017 at 5:00 PM, Trond Myklebust <[email protected]> wrote:
>
>
> On Sun, 2017-11-05 at 11:53 +0300, Dmitry Vyukov wrote:
>> On Sun, Nov 5, 2017 at 11:41 AM, syzbot
>> <bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.c
>> om>
>> wrote:
>> > Hello,
>> >
>> > syzkaller hit the following crash on
>> > 0f611fb6dcc0d6d91b4e1fec911321f434a3b858
>> > git://git.cmpxchg.org/linux-mmots.git/master
>> > compiler: gcc (GCC) 7.1.1 20170620
>> > .config is attached
>> > Raw console output is attached.
>> >
>> > xs_tcp_setup_socket: connect returned unhandled error -113
>> > xs_tcp_setup_socket: connect returned unhandled error -113
>> > xs_tcp_setup_socket: connect returned unhandled error -113
>> >
>> > ======================================================
>> > WARNING: possible circular locking dependency detected
>> > 4.14.0-rc5-mm1+ #20 Not tainted
>> > ------------------------------------------------------
>> > kworker/0:3/3400 is trying to acquire lock:
>> > ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] start_flush_work
>> > kernel/workqueue.c:2850 [inline]
>> > ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] flush_work+0x55a/0x8a0
>> > kernel/workqueue.c:2882
>> >
>> > but task is already holding lock:
>> > ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
>> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>> >
>> > which lock already depends on the new lock.
>> >
>> >
>> > the existing dependency chain (in reverse order) is:
>> >
>> > -> #1 ((&task->u.tk_work)){+.+.}:
>> > process_one_work+0xba2/0x1bc0 kernel/workqueue.c:2088
>> > worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>> > kthread+0x38b/0x470 kernel/kthread.c:242
>> > ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>> >
>> > -> #0 ("xprtiod"){+.+.}:
>> > lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>> > start_flush_work kernel/workqueue.c:2851 [inline]
>> > flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>> > __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>> > cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>> > xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>> > xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>> > kref_put include/linux/kref.h:70 [inline]
>> > xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>> > rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>> > rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>> > rpc_release_task net/sunrpc/sched.c:1059 [inline]
>> > __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>> > rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>> > process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>> > worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>> > kthread+0x38b/0x470 kernel/kthread.c:242
>> > ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>> >
>> > other info that might help us debug this:
>> >
>> > Possible unsafe locking scenario:
>> >
>> >         CPU0                    CPU1
>> >         ----                    ----
>> >    lock((&task->u.tk_work));
>> >                                 lock("xprtiod");
>> >                                 lock((&task->u.tk_work));
>> >    lock("xprtiod");
>> >
>> > *** DEADLOCK ***
>> >
>> > 2 locks held by kworker/0:3/3400:
>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] __write_once_size
>> > include/linux/compiler.h:305 [inline]
>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic64_set
>> > arch/x86/include/asm/atomic64_64.h:33 [inline]
>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic_long_set
>> > include/asm-generic/atomic-long.h:56 [inline]
>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] set_work_data
>> > kernel/workqueue.c:618 [inline]
>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
>> > set_work_pool_and_clear_pending kernel/workqueue.c:645 [inline]
>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
>> > process_one_work+0xadf/0x1bc0 kernel/workqueue.c:2083
>> > #1: ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
>> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>> >
>> > stack backtrace:
>> > CPU: 0 PID: 3400 Comm: kworker/0:3 Not tainted 4.14.0-rc5-mm1+ #20
>> > Hardware name: Google Google Compute Engine/Google Compute Engine,
>> > BIOS
>> > Google 01/01/2011
>> > Workqueue: rpciod rpc_async_schedule
>> > Call Trace:
>> > __dump_stack lib/dump_stack.c:16 [inline]
>> > dump_stack+0x194/0x257 lib/dump_stack.c:52
>> > print_circular_bug.isra.41+0x342/0x36a
>> > kernel/locking/lockdep.c:1258
>> > check_prev_add kernel/locking/lockdep.c:1901 [inline]
>> > check_prevs_add kernel/locking/lockdep.c:2018 [inline]
>> > validate_chain kernel/locking/lockdep.c:2460 [inline]
>> > __lock_acquire+0x2f55/0x3d50 kernel/locking/lockdep.c:3487
>> > lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>> > start_flush_work kernel/workqueue.c:2851 [inline]
>> > flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>> > __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>> > cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>> > xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>> > xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>> > kref_put include/linux/kref.h:70 [inline]
>> > xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>> > rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>> > rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>> > rpc_release_task net/sunrpc/sched.c:1059 [inline]
>> > __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>> > rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>> > process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>> > worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>> > kthread+0x38b/0x470 kernel/kthread.c:242
>> > ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>
>>
>> +sunrpc maintainers
>
> A fix for this has already been merged. Please retest with an up to
> date kernel.

Hi,

What's the fix? Please specify it in the following form:

> syzbot will keep track of this bug report.
> Once a fix for this bug is committed, please reply to this email with:
> #syz fix: exact-commit-title
> Note: all commands must start from beginning of the line.

The bot tests HEAD of multiple branches and it needs to know what
fixes what, because this still happens on other branches. Once the bot
knows the fix, it will track when the fix reaches all tested branches.

Thanks

2018-02-13 18:55:42

by Dmitry Vyukov

Subject: Re: possible deadlock in flush_work (2)

On Mon, Nov 6, 2017 at 11:34 AM, Dmitry Vyukov <[email protected]> wrote:
> On Sun, Nov 5, 2017 at 5:00 PM, Trond Myklebust <[email protected]> wrote:
>>
>>
>> On Sun, 2017-11-05 at 11:53 +0300, Dmitry Vyukov wrote:
>>> On Sun, Nov 5, 2017 at 11:41 AM, syzbot
>>> <bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.c
>>> om>
>>> wrote:
>>> > Hello,
>>> >
>>> > syzkaller hit the following crash on
>>> > 0f611fb6dcc0d6d91b4e1fec911321f434a3b858
>>> > git://git.cmpxchg.org/linux-mmots.git/master
>>> > compiler: gcc (GCC) 7.1.1 20170620
>>> > .config is attached
>>> > Raw console output is attached.
>>> >
>>> > xs_tcp_setup_socket: connect returned unhandled error -113
>>> > xs_tcp_setup_socket: connect returned unhandled error -113
>>> > xs_tcp_setup_socket: connect returned unhandled error -113
>>> >
>>> > ======================================================
>>> > WARNING: possible circular locking dependency detected
>>> > 4.14.0-rc5-mm1+ #20 Not tainted
>>> > ------------------------------------------------------
>>> > kworker/0:3/3400 is trying to acquire lock:
>>> > ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] start_flush_work
>>> > kernel/workqueue.c:2850 [inline]
>>> > ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] flush_work+0x55a/0x8a0
>>> > kernel/workqueue.c:2882
>>> >
>>> > but task is already holding lock:
>>> > ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
>>> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>>> >
>>> > which lock already depends on the new lock.
>>> >
>>> >
>>> > the existing dependency chain (in reverse order) is:
>>> >
>>> > -> #1 ((&task->u.tk_work)){+.+.}:
>>> > process_one_work+0xba2/0x1bc0 kernel/workqueue.c:2088
>>> > worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>>> > kthread+0x38b/0x470 kernel/kthread.c:242
>>> > ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>> >
>>> > -> #0 ("xprtiod"){+.+.}:
>>> > lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>>> > start_flush_work kernel/workqueue.c:2851 [inline]
>>> > flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>>> > __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>>> > cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>>> > xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>>> > xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>>> > kref_put include/linux/kref.h:70 [inline]
>>> > xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>>> > rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>>> > rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>>> > rpc_release_task net/sunrpc/sched.c:1059 [inline]
>>> > __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>>> > rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>>> > process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>>> > worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>>> > kthread+0x38b/0x470 kernel/kthread.c:242
>>> > ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>> >
>>> > other info that might help us debug this:
>>> >
>>> > Possible unsafe locking scenario:
>>> >
>>> >         CPU0                    CPU1
>>> >         ----                    ----
>>> >    lock((&task->u.tk_work));
>>> >                                 lock("xprtiod");
>>> >                                 lock((&task->u.tk_work));
>>> >    lock("xprtiod");
>>> >
>>> > *** DEADLOCK ***
>>> >
>>> > 2 locks held by kworker/0:3/3400:
>>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] __write_once_size
>>> > include/linux/compiler.h:305 [inline]
>>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic64_set
>>> > arch/x86/include/asm/atomic64_64.h:33 [inline]
>>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic_long_set
>>> > include/asm-generic/atomic-long.h:56 [inline]
>>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] set_work_data
>>> > kernel/workqueue.c:618 [inline]
>>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
>>> > set_work_pool_and_clear_pending kernel/workqueue.c:645 [inline]
>>> > #0: ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
>>> > process_one_work+0xadf/0x1bc0 kernel/workqueue.c:2083
>>> > #1: ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
>>> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>>> >
>>> > stack backtrace:
>>> > CPU: 0 PID: 3400 Comm: kworker/0:3 Not tainted 4.14.0-rc5-mm1+ #20
>>> > Hardware name: Google Google Compute Engine/Google Compute Engine,
>>> > BIOS
>>> > Google 01/01/2011
>>> > Workqueue: rpciod rpc_async_schedule
>>> > Call Trace:
>>> > __dump_stack lib/dump_stack.c:16 [inline]
>>> > dump_stack+0x194/0x257 lib/dump_stack.c:52
>>> > print_circular_bug.isra.41+0x342/0x36a
>>> > kernel/locking/lockdep.c:1258
>>> > check_prev_add kernel/locking/lockdep.c:1901 [inline]
>>> > check_prevs_add kernel/locking/lockdep.c:2018 [inline]
>>> > validate_chain kernel/locking/lockdep.c:2460 [inline]
>>> > __lock_acquire+0x2f55/0x3d50 kernel/locking/lockdep.c:3487
>>> > lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>>> > start_flush_work kernel/workqueue.c:2851 [inline]
>>> > flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>>> > __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>>> > cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>>> > xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>>> > xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>>> > kref_put include/linux/kref.h:70 [inline]
>>> > xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>>> > rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>>> > rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>>> > rpc_release_task net/sunrpc/sched.c:1059 [inline]
>>> > __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>>> > rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>>> > process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>>> > worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>>> > kthread+0x38b/0x470 kernel/kthread.c:242
>>> > ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>>
>>>
>>> +sunrpc maintainers
>>
>> A fix for this has already been merged. Please retest with an up to
>> date kernel.
>
> Hi,
>
> What's the fix? Please specify it in the following form:
>
>> syzbot will keep track of this bug report.
>> Once a fix for this bug is committed, please reply to this email with:
>> #syz fix: exact-commit-title
>> Note: all commands must start from beginning of the line.
>
> The bot tests HEAD of multiple branches and it needs to know what
> fixes what, because this still happens on other branches. Once the bot
> knows the fix, it will track when the fix reaches all tested branches.

Seems to be this one:

#syz fix: SUNRPC: Destroy transport from the system workqueue
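
[Editorial note: the commit title above suggests the approach is to defer transport teardown to the system workqueue, so that the synchronous cancel of xprtiod work no longer runs from inside an rpciod work item. The following is only a hedged sketch of that deferral pattern; the demo_* types and helpers are invented for illustration and are not the actual SUNRPC implementation referenced by the commit.]

/*
 * Hypothetical sketch of "destroy from the system workqueue".
 * demo_* names are illustrative only.
 */
#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_xprt {
	struct kref kref;
	struct work_struct destroy_work;
	struct work_struct io_work;	/* assumed queued on an I/O workqueue */
};

static void demo_xprt_destroy_cb(struct work_struct *work)
{
	struct demo_xprt *xprt = container_of(work, struct demo_xprt,
					      destroy_work);

	/*
	 * This callback runs on the system workqueue, so waiting for the
	 * I/O work item no longer happens from inside an rpciod work item
	 * and the rpciod -> xprtiod style cycle goes away.
	 */
	cancel_work_sync(&xprt->io_work);
	kfree(xprt);
}

static void demo_xprt_destroy_kref(struct kref *kref)
{
	struct demo_xprt *xprt = container_of(kref, struct demo_xprt, kref);

	/* Defer teardown instead of doing it in the caller's context. */
	INIT_WORK(&xprt->destroy_work, demo_xprt_destroy_cb);
	schedule_work(&xprt->destroy_work);
}

/* Drop a reference; the final put schedules the deferred destruction. */
static inline void demo_xprt_put(struct demo_xprt *xprt)
{
	kref_put(&xprt->kref, demo_xprt_destroy_kref);
}

The key point is the deferral itself: the final kref_put() no longer performs the blocking cancel in its own context, it hands the teardown to schedule_work().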