Message-Id: <201806050027.w550RfJl010157@www262.sakura.ne.jp>
Subject: Re: INFO: task hung in blk_queue_enter
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Jens Axboe
Cc: Bart.VanAssche@wdc.com, dvyukov@google.com, linux-kernel@vger.kernel.org,
    linux-block@vger.kernel.org, jthumshirn@suse.de,
    alan.christopher.jenkins@gmail.com,
    syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com,
    martin.petersen@oracle.com, dan.j.williams@intel.com, hch@lst.de,
    oleksandr@natalenko.name, ming.lei@redhat.com, martin@lichtvoll.de,
    hare@suse.com, syzkaller-bugs@googlegroups.com,
    ross.zwisler@linux.intel.com, keith.busch@intel.com,
    linux-ext4@vger.kernel.org
Date: Tue, 05 Jun 2018 09:27:41 +0900
References: <25708e84-6f35-04c3-a2e4-6854f0ed9e78@I-love.SAKURA.ne.jp>

Jens Axboe wrote:
> On 6/1/18 4:10 AM, Tetsuo Handa wrote:
> > Tetsuo Handa wrote:
> >> Since sum of percpu_count did not change after percpu_ref_kill(), this is
> >> not a race condition while folding percpu counter values into atomic counter
> >> value. That is, for some reason, someone who is responsible for calling
> >> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is
> >> unable to call percpu_ref_put().
> >> But I don't know how to find someone who is failing to call percpu_ref_put()...
> >
> > I found the someone. It was already there in the backtrace...
> >
>
> Ahh, nicely spotted! One idea would be the one below. For this case,
> we're recursing, so we can either do a non-block queue enter, or we
> can just do a live enter.
>

While "block: don't use blocking queue entered for recursive bio submits"
was already applied, syzbot is still reporting a hung task with the same
signature but a different trace.

https://syzkaller.appspot.com/text?tag=CrashLog&x=1432cedf800000
----------------------------------------
[ 492.512243] INFO: task syz-executor1:20263 blocked for more than 120 seconds.
[ 492.519604]       Not tainted 4.17.0+ #83
[ 492.523793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 492.531787] syz-executor1   D23384 20263   4574 0x00000004
[ 492.537443] Call Trace:
[ 492.540041]  __schedule+0x801/0x1e30
[ 492.580958]  schedule+0xef/0x430
[ 492.610154]  blk_queue_enter+0x8da/0xdf0
[ 492.716327]  generic_make_request+0x651/0x1790
[ 492.765680]  submit_bio+0xba/0x460
[ 492.793198]  submit_bio_wait+0x134/0x1e0
[ 492.801891]  blkdev_issue_flush+0x204/0x300
[ 492.806236]  blkdev_fsync+0x93/0xd0
[ 492.813620]  vfs_fsync_range+0x140/0x220
[ 492.817702]  vfs_fsync+0x29/0x30
[ 492.821081]  __loop_update_dio+0x4de/0x6a0
[ 492.825341]  lo_ioctl+0xd28/0x2190
[ 492.833442]  blkdev_ioctl+0x9b6/0x2020
[ 492.872146]  block_ioctl+0xee/0x130
[ 492.880139]  do_vfs_ioctl+0x1cf/0x16a0
[ 492.927550]  ksys_ioctl+0xa9/0xd0
[ 492.931036]  __x64_sys_ioctl+0x73/0xb0
[ 492.934952]  do_syscall_64+0x1b1/0x800
[ 492.963624]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 493.212768] 1 lock held by syz-executor1/20263:
[ 493.217448]  #0: 00000000956bf5a3 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2190
----------------------------------------

Is it OK to call [__]loop_update_dio() between blk_mq_freeze_queue() and
blk_mq_unfreeze_queue()? The vfs_fsync() issued from __loop_update_dio()
calls blk_queue_enter() after blk_mq_freeze_queue() has already started
blocking blk_queue_enter() by calling atomic_inc_return() and
percpu_ref_kill().
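
To make the suspected cycle concrete, below is a minimal single-threaded
userspace model (not kernel code; freeze_queue()/queue_enter() are made-up
stand-ins for blk_mq_freeze_queue()/blk_queue_enter(), and it assumes the
flush bio from vfs_fsync() lands on the same queue the lo_ioctl() path
froze):

----------------------------------------
/*
 * Userspace model of the suspected hang (NOT kernel code).
 * blk_mq_freeze_queue() bumps q->mq_freeze_depth and kills
 * q->q_usage_counter; blk_queue_enter() then sleeps until the depth
 * drops back to zero.  If the freezing task itself issues I/O to the
 * frozen queue before unfreezing, it waits on itself.
 */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int mq_freeze_depth;	/* models q->mq_freeze_depth */

static void freeze_queue(void)		/* models blk_mq_freeze_queue() */
{
	atomic_fetch_add(&mq_freeze_depth, 1);
	/* the kernel also does percpu_ref_kill(&q->q_usage_counter) here */
}

static void unfreeze_queue(void)	/* models blk_mq_unfreeze_queue() */
{
	atomic_fetch_sub(&mq_freeze_depth, 1);
	/* the kernel reinitializes the ref and wakes q->mq_freeze_wq */
}

static void queue_enter(void)		/* models blk_queue_enter() */
{
	/* kernel: wait on q->mq_freeze_wq until the depth reaches zero */
	while (atomic_load(&mq_freeze_depth) != 0)
		;	/* hangs here, like the syz-executor1 task above */
	puts("entered queue");
}

int main(void)
{
	freeze_queue();		/* lo_ioctl() path freezes the queue */
	queue_enter();		/* __loop_update_dio() -> vfs_fsync() -> flush bio */
	unfreeze_queue();	/* never reached */
	return 0;
}
----------------------------------------

If that assumption holds, the wait in blk_queue_enter() can never be
satisfied, because the only task that could decrement mq_freeze_depth is
the one sleeping in it.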