Date: Tue, 5 Jun 2018 08:41:35 +0800
From: Ming Lei
To: Tetsuo Handa
Cc: Jens Axboe, Bart.VanAssche@wdc.com, dvyukov@google.com, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, jthumshirn@suse.de, alan.christopher.jenkins@gmail.com, syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com, martin.petersen@oracle.com, dan.j.williams@intel.com, hch@lst.de, oleksandr@natalenko.name, martin@lichtvoll.de, hare@suse.com, syzkaller-bugs@googlegroups.com, ross.zwisler@linux.intel.com, keith.busch@intel.com, linux-ext4@vger.kernel.org
Subject: Re: INFO: task hung in blk_queue_enter
Message-ID:
 <20180605004128.GA28826@ming.t460p>
References: <25708e84-6f35-04c3-a2e4-6854f0ed9e78@I-love.SAKURA.ne.jp> <201806050027.w550RfJl010157@www262.sakura.ne.jp>
In-Reply-To: <201806050027.w550RfJl010157@www262.sakura.ne.jp>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 05, 2018 at 09:27:41AM +0900, Tetsuo Handa wrote:
> Jens Axboe wrote:
> > On 6/1/18 4:10 AM, Tetsuo Handa wrote:
> > > Tetsuo Handa wrote:
> > >> Since sum of percpu_count did not change after percpu_ref_kill(), this is
> > >> not a race condition while folding percpu counter values into atomic counter
> > >> value. That is, for some reason, someone who is responsible for calling
> > >> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is
> > >> unable to call percpu_ref_put().
> > >> But I don't know how to find someone who is failing to call percpu_ref_put()...
> > >
> > > I found the someone. It was already there in the backtrace...
> > >
> >
> > Ahh, nicely spotted! One idea would be the one below. For this case,
> > we're recursing, so we can either do a non-block queue enter, or we
> > can just do a live enter.
> >
>
> While "block: don't use blocking queue entered for recursive bio submits" was
> already applied, syzbot is still reporting a hung task with same signature but
> different trace.
>
> https://syzkaller.appspot.com/text?tag=CrashLog&x=1432cedf800000
> ----------------------------------------
> [ 492.512243] INFO: task syz-executor1:20263 blocked for more than 120 seconds.
> [ 492.519604] Not tainted 4.17.0+ #83
> [ 492.523793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 492.531787] syz-executor1 D23384 20263 4574 0x00000004
> [ 492.537443] Call Trace:
> [ 492.540041] __schedule+0x801/0x1e30
> [ 492.580958] schedule+0xef/0x430
> [ 492.610154] blk_queue_enter+0x8da/0xdf0
> [ 492.716327] generic_make_request+0x651/0x1790
> [ 492.765680] submit_bio+0xba/0x460
> [ 492.793198] submit_bio_wait+0x134/0x1e0
> [ 492.801891] blkdev_issue_flush+0x204/0x300
> [ 492.806236] blkdev_fsync+0x93/0xd0
> [ 492.813620] vfs_fsync_range+0x140/0x220
> [ 492.817702] vfs_fsync+0x29/0x30
> [ 492.821081] __loop_update_dio+0x4de/0x6a0
> [ 492.825341] lo_ioctl+0xd28/0x2190
> [ 492.833442] blkdev_ioctl+0x9b6/0x2020
> [ 492.872146] block_ioctl+0xee/0x130
> [ 492.880139] do_vfs_ioctl+0x1cf/0x16a0
> [ 492.927550] ksys_ioctl+0xa9/0xd0
> [ 492.931036] __x64_sys_ioctl+0x73/0xb0
> [ 492.934952] do_syscall_64+0x1b1/0x800
> [ 492.963624] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 493.212768] 1 lock held by syz-executor1/20263:
> [ 493.217448] #0: 00000000956bf5a3 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2190
> ----------------------------------------
>
> Is it OK to call [__]loop_update_dio() between blk_mq_freeze_queue() and
> blk_mq_unfreeze_queue(), for vfs_fsync() from __loop_update_dio() is calling
> blk_queue_enter() after blk_mq_freeze_queue() started blocking blk_queue_enter()
> by calling atomic_inc_return() and percpu_ref_kill() ?
>

The vfs_fsync() isn't necessary in __loop_update_dio(), since both
generic_file_write_iter() and generic_file_read_iter() can handle
buffered I/O vs. direct I/O well.

I will send a patch to remove the vfs_fsync() call later.

Thanks,
Ming
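
For readers following the thread, the hang pattern in the trace above can be sketched as a small userspace toy model. This is only an illustration under stated assumptions, not kernel code: `ToyQueue`, its methods, and `update_dio_with_flush` are hypothetical names, the freeze depth counter loosely stands in for what `atomic_inc_return()` bumps in the question above, and a timeout is used so the example returns instead of hanging.

```python
import threading


class ToyQueue:
    """Userspace toy model of a queue freeze gate (hypothetical, not kernel code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._freeze_depth = 0  # nested freeze count
        self._users = 0         # callers currently "inside" the queue

    def freeze(self):
        # Block new entries, then wait for current users to drain.
        with self._cond:
            self._freeze_depth += 1
            while self._users > 0:
                self._cond.wait()

    def unfreeze(self):
        with self._cond:
            self._freeze_depth -= 1
            if self._freeze_depth == 0:
                self._cond.notify_all()

    def enter(self, timeout=None):
        # Wait until the queue is not frozen. Returns False if the
        # timeout expires while the queue is still frozen.
        with self._cond:
            entered = self._cond.wait_for(
                lambda: self._freeze_depth == 0, timeout
            )
            if entered:
                self._users += 1
            return entered

    def exit(self):
        with self._cond:
            self._users -= 1
            self._cond.notify_all()


def update_dio_with_flush(q):
    """Mimic the reported pattern: submit a flush while holding the freeze."""
    q.freeze()
    try:
        # The flush needs enter(), but this same caller holds the freeze,
        # so without the timeout this wait would never finish.
        entered = q.enter(timeout=0.1)
        if entered:
            q.exit()
        return entered
    finally:
        q.unfreeze()


if __name__ == "__main__":
    q = ToyQueue()
    print("flush entered while frozen:", update_dio_with_flush(q))
    print("enter after unfreeze:", q.enter(timeout=0.1))
    q.exit()
```

In this model the entry attempt made between `freeze()` and `unfreeze()` can only time out, which is why dropping the flush from the frozen section (rather than making it wait) avoids the self-deadlock.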