From: Dmitry Vyukov
Date: Fri, 2 Nov 2018 20:19:38 +0100
Subject: Re: INFO: task hung in grab_super
To: Tetsuo Handa
Cc: Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov, v9fs-developer@lists.sourceforge.net, syzbot, linux-fsdevel, LKML, syzkaller-bugs, Al Viro

On Wed, Jul 18, 2018 at 4:17 PM, Tetsuo Handa wrote:
> On 2018/07/18 23:11, Dmitry Vyukov wrote:
>> On Wed, Jul 18, 2018 at 3:35 PM, Tetsuo Handa wrote:
>>>>>> This seems to be related to 9p.
>>>>>> After rerunning the log I got:
>>>>>>
>>>>>> root@syzkaller:~# ps afxu | grep syz
>>>>>> root 18253 0.0 0.0 0 0 ttyS0 Zl 10:16 0:00 \_
>>>>>> [syz-executor]
>>>>>> root@syzkaller:~# cat /proc/18253/task/*/stack
>>>>>> [<0>] p9_client_rpc+0x3a2/0x1400
>>>>>> [<0>] p9_client_flush+0x134/0x2a0
>>>>>> [<0>] p9_client_rpc+0x122c/0x1400
>>>>>> [<0>] p9_client_create+0xc56/0x16af
>>>>>> [<0>] v9fs_session_init+0x21a/0x1a80
>>>>>> [<0>] v9fs_mount+0x7c/0x900
>>>>>> [<0>] mount_fs+0xae/0x328
>>>>>> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
>>>>>> [<0>] do_mount+0x581/0x30e0
>>>>>> [<0>] ksys_mount+0x12d/0x140
>>>>>> [<0>] __x64_sys_mount+0xbe/0x150
>>>>>> [<0>] do_syscall_64+0x1b9/0x820
>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>> [<0>] 0xffffffffffffffff
>>>>>>
>>>>>> There is a bunch of hangs in 9p, so let's do:
>>>>>>
>>>>>> #syz dup: INFO: task hung in flush_work
>>>>>>
>>>>> Then, is dumping all threads when khungtaskd fires a candidate
>>>>> for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?
>>>>
>>>> Perhaps would be useful. But maybe only tasks that are blocked for
>>>> more than timeout/2? and/or unkillable tasks? killable tasks are not a
>>>> problem.
>>>
>>> TASK_KILLABLE waiters are not reported by khungtaskd, are they?
>>>
>>>         /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
>>>         if (t->state == TASK_UNINTERRUPTIBLE)
>>>                 check_hung_task(t, timeout);
>>>
>>> And TASK_KILLABLE waiters can become a problem because
>>>
>>>>
>>>> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
>>>> killed along with the whole process?
>>>>
>>>
>>> wait_event_killable() would return -ERESTARTSYS if got SIGKILL.
>>> But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
>>> it ignores SIGKILL by retrying the loop...
>>>
>>> again:
>>>         err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
>>>         if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
>>>                 sigpending = 1;
>>>                 clear_thread_flag(TIF_SIGPENDING);
>>>                 goto again;
>>>         }
>>>
>>> I wish they don't ignore SIGKILL (by e.g. offloading operations to a kernel thread).
>>
>>
>> I guess that's the problem, right? SIGKILL-ed task must not ignore
>> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
>> in 9p.
>
> Did you check /proc/18253/task/*/stack after manually sending SIGKILL?

Yes:

root@syzkaller:~# ps afxu | grep syz
root 18253 0.0 0.0 0 0 ttyS0 Zl 10:16 0:00 \_ [syz-executor]
root@syzkaller:~# cat /proc/18253/task/*/stack
[<0>] p9_client_rpc+0x3a2/0x1400
[<0>] p9_client_flush+0x134/0x2a0
[<0>] p9_client_rpc+0x122c/0x1400
[<0>] p9_client_create+0xc56/0x16af
[<0>] v9fs_session_init+0x21a/0x1a80
[<0>] v9fs_mount+0x7c/0x900
[<0>] mount_fs+0xae/0x328
[<0>] vfs_kern_mount.part.34+0xdc/0x4e0
[<0>] do_mount+0x581/0x30e0
[<0>] ksys_mount+0x12d/0x140
[<0>] __x64_sys_mount+0xbe/0x150
[<0>] do_syscall_64+0x1b9/0x820
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff

> I mean, who (i.e. you or syzkaller programs) is sending a signal (not limited
> to SIGKILL but any signal) that makes TASK_KILLABLE waiters to wake up?

Both. syzkaller always SIGKILLs the test process after a timeout and
expects it to go away. I also sent SIGKILL manually after that, but it
did not make any difference.
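For illustration only, the interaction discussed above can be modeled in userspace C. This is not kernel code: struct toy_task, toy_tflush_wait() and khungtaskd_checks() are hypothetical names, the state values are copied from a v4.18-era include/linux/sched.h (only TASK_KILLABLE == TASK_WAKEKILL | TASK_UNINTERRUPTIBLE matters), and the 9p server is assumed never to answer the flush. Under those assumptions, a SIGKILL-ed task inside the nested TFLUSH wait (the p9_client_rpc -> p9_client_flush -> p9_client_rpc chain in the stacks above) clears its pending flag, goes back to sleep in TASK_KILLABLE, and is then skipped by khungtaskd's "==" test, so it is neither killed nor reported.

```c
#include <stdbool.h>

/* Task state bits, values as in a v4.18-era include/linux/sched.h.
 * Only the relation TASK_KILLABLE == TASK_WAKEKILL | TASK_UNINTERRUPTIBLE
 * matters for this model. */
#define TASK_RUNNING         0x0000
#define TASK_UNINTERRUPTIBLE 0x0002
#define TASK_WAKEKILL        0x0100
#define TASK_KILLABLE        (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical toy task (not struct task_struct):
 * sigpending    stands in for TIF_SIGPENDING with SIGKILL queued,
 * reply_arrived stands in for req->status >= REQ_STATUS_RCVD. */
struct toy_task {
	unsigned int state;
	bool sigpending;
	bool reply_arrived;
};

enum toy_outcome { GOT_REPLY, KILLED, ASLEEP_FOREVER };

/* khungtaskd's test: "==" means TASK_KILLABLE sleepers are never checked. */
bool khungtaskd_checks(unsigned int state)
{
	return state == TASK_UNINTERRUPTIBLE;
}

/* Toy model of the quoted retry loop. Where the real code would sleep in
 * wait_event_killable(), this returns ASLEEP_FOREVER, because in this
 * model nothing ever sets reply_arrived (server assumed silent). */
enum toy_outcome toy_tflush_wait(struct toy_task *t, bool connected,
				 bool is_tflush)
{
	for (;;) { /* the "again:" label */
		if (t->reply_arrived) { /* wait condition already true */
			t->state = TASK_RUNNING;
			return GOT_REPLY;
		}
		if (t->sigpending) { /* wait returns -ERESTARTSYS */
			if (connected && is_tflush) {
				/* clear_thread_flag(TIF_SIGPENDING); goto again; */
				t->sigpending = false;
				continue;
			}
			t->state = TASK_RUNNING;
			return KILLED;
		}
		/* No signal visible any more: sleep in TASK_KILLABLE. */
		t->state = TASK_KILLABLE;
		return ASLEEP_FOREVER;
	}
}
```

For example, a task with sigpending set and no reply comes back from toy_tflush_wait(&t, true, true) as ASLEEP_FOREVER with t.state == TASK_KILLABLE, and khungtaskd_checks(t.state) is false. With is_tflush false the same call returns KILLED, which in this simplified model corresponds to the outer, non-flush wait giving up on SIGKILL; that is consistent with the stacks above, where the outer p9_client_rpc has already moved on to p9_client_flush.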