Subject: Re: INFO: task hung in blk_queue_enter
From: Tetsuo Handa
Date: Wed, 16 May 2018 22:05:06 +0900
To: syzbot, linux-block@vger.kernel.org, syzkaller-bugs@googlegroups.com, Dan Williams, Jens Axboe
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, dvyukov@google.com, Alan Jenkins, Bart Van Assche, Christoph Hellwig, Hannes Reinecke, Johannes Thumshirn, Keith Busch, "Martin K. Petersen", Martin Steigerwald, Ming Lei, Oleksandr Natalenko, Ross Zwisler
References: <0000000000009b212b056ae6dbad@google.com> <343bbbf6-64eb-879e-d19e-96aebb037d47@I-love.SAKURA.ne.jp>
In-Reply-To: <343bbbf6-64eb-879e-d19e-96aebb037d47@I-love.SAKURA.ne.jp>
X-Mailing-List: linux-kernel@vger.kernel.org

Tetsuo Handa wrote:
> I couldn't check whether freeze_depth in blk_freeze_queue_start() was 1,
> but presumably q->mq_freeze_depth > 0 because syz-executor7(PID=5010) is
> stuck at wait_event() in blk_queue_enter().
>
> Since flags == 0, preempt == false. Since stuck at wait_event(), success == false.
> Thus, atomic_read(&q->mq_freeze_depth) > 0 if blk_queue_dying(q) == false. And I
> guess blk_queue_dying(q) == false because we are just trying to freeze/unfreeze.
>

I was able to reproduce the hang using a modified reproducer, and got the
values shown below using the following debug printk() patch.

--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -950,10 +950,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		wait_event(q->mq_freeze_wq,
-			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
-			   blk_queue_dying(q));
+		while (wait_event_timeout(q->mq_freeze_wq,
+					  (atomic_read(&q->mq_freeze_depth) == 0 &&
+					   (preempt || !blk_queue_preempt_only(q))) ||
+					  blk_queue_dying(q), 10 * HZ) == 0)
+			printk("%s(%u): q->mq_freeze_depth=%d preempt=%d blk_queue_preempt_only(q)=%d blk_queue_dying(q)=%d\n",
+			       current->comm, current->pid, atomic_read(&q->mq_freeze_depth), preempt, blk_queue_preempt_only(q), blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
 	}

[   75.869126] print_req_error: I/O error, dev loop0, sector 0
[   85.983146] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0
[   96.222884] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0
[  106.463322] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0
[  116.702912] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0

One or more threads are waiting for q->mq_freeze_depth to become 0. But the
thread that incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from
blk_freeze_queue() is itself waiting at blk_mq_freeze_queue_wait(). Therefore,
the atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in
blk_queue_enter() will never be satisfied. But what does that wait_event()
want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true?
Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is
about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0
as if blk_queue_dying(q) == true?

That is, something like below:

diff --git a/block/blk-core.c b/block/blk-core.c
index 85909b4..59e2496 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		smp_rmb();
 
 		wait_event(q->mq_freeze_wq,
-			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
+			   atomic_read(&q->mq_freeze_depth) ||
+			   (preempt || !blk_queue_preempt_only(q)) ||
 			   blk_queue_dying(q));
-		if (blk_queue_dying(q))
+		if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
 			return -ENODEV;
 	}
 }
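
To make the difference between the two wait conditions concrete, here is a
minimal userspace sketch (not kernel code) that evaluates both predicates for
a snapshot of the values the debug printk() reports. struct queue_state and
the three function names are hypothetical; the fields merely stand in for
atomic_read(&q->mq_freeze_depth), preempt, blk_queue_preempt_only(q) and
blk_queue_dying(q).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the four values printed by the debug patch. */
struct queue_state {
	int freeze_depth;	/* atomic_read(&q->mq_freeze_depth) */
	bool preempt;		/* preempt */
	bool preempt_only;	/* blk_queue_preempt_only(q) */
	bool dying;		/* blk_queue_dying(q) */
};

/* Condition of the existing wait_event(): true means the waiter may proceed. */
static bool orig_wait_done(const struct queue_state *s)
{
	return (s->freeze_depth == 0 && (s->preempt || !s->preempt_only)) ||
	       s->dying;
}

/* Condition of the wait_event() as rewritten by the proposed change. */
static bool proposed_wait_done(const struct queue_state *s)
{
	return s->freeze_depth != 0 ||
	       (s->preempt || !s->preempt_only) ||
	       s->dying;
}

/*
 * Outcome of the check that follows the wait in the proposed change:
 * -ENODEV if the queue is being frozen or is dying. A return of 0 here
 * only means that the -ENODEV branch is not taken.
 */
static int proposed_result(const struct queue_state *s)
{
	return (s->freeze_depth != 0 || s->dying) ? -ENODEV : 0;
}
```

With the logged values (q->mq_freeze_depth=1, preempt=0,
blk_queue_preempt_only(q)=0, blk_queue_dying(q)=0), orig_wait_done() is false,
so the waiter keeps sleeping, whereas proposed_wait_done() is true and
proposed_result() yields -ENODEV.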