Subject: Re: INFO: task hung in blk_queue_enter
To: Bart.VanAssche@wdc.com, dvyukov@google.com
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
    jthumshirn@suse.de, alan.christopher.jenkins@gmail.com,
    syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com,
    martin.petersen@oracle.com, axboe@kernel.dk, dan.j.williams@intel.com,
    hch@lst.de, oleksandr@natalenko.name, ming.lei@redhat.com,
    martin@lichtvoll.de, hare@suse.com, syzkaller-bugs@googlegroups.com,
    ross.zwisler@linux.intel.com, keith.busch@intel.com,
    linux-ext4@vger.kernel.org
References: <43327033306c3dd2f7c3717d64ce22415b6f3451.camel@wdc.com>
 <6db16aa3a7c56b6dcca2d10b4e100a780c740081.camel@wdc.com>
 <201805220652.BFH82351.SMQFFOJOtFOVLH@I-love.SAKURA.ne.jp>
 <201805222020.FEJ82897.OFtJMFHOVLQOSF@I-love.SAKURA.ne.jp>
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Message-ID: <25708e84-6f35-04c3-a2e4-6854f0ed9e78@I-love.SAKURA.ne.jp>
Date: Fri, 1 Jun 2018 19:10:17 +0900
In-Reply-To: <201805222020.FEJ82897.OFtJMFHOVLQOSF@I-love.SAKURA.ne.jp>

Tetsuo Handa wrote:
> Since sum of percpu_count did not change after percpu_ref_kill(), this is
> not a race condition while folding percpu counter values into atomic counter
> value. That is, for some reason, someone who is responsible for calling
> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is
> unable to call percpu_ref_put().
> But I don't know how to find someone who is failing to call percpu_ref_put()...

I found the someone. It was already there in the backtrace...
----------------------------------------
[   62.065852] a.out           D    0  4414   4337 0x00000000
[   62.067677] Call Trace:
[   62.068545]  __schedule+0x40b/0x860
[   62.069726]  schedule+0x31/0x80
[   62.070796]  schedule_timeout+0x1c1/0x3c0
[   62.072159]  ? __next_timer_interrupt+0xd0/0xd0
[   62.073670]  blk_queue_enter+0x218/0x520
[   62.074985]  ? remove_wait_queue+0x70/0x70
[   62.076361]  generic_make_request+0x3d/0x540
[   62.077785]  ? __bio_clone_fast+0x6b/0x80
[   62.079147]  ? bio_clone_fast+0x2c/0x70
[   62.080456]  blk_queue_split+0x29b/0x560
[   62.081772]  ? blk_queue_split+0x29b/0x560
[   62.083162]  blk_mq_make_request+0x7c/0x430
[   62.084562]  generic_make_request+0x276/0x540
[   62.086034]  submit_bio+0x6e/0x140
[   62.087185]  ? submit_bio+0x6e/0x140
[   62.088384]  ? guard_bio_eod+0x9d/0x1d0
[   62.089681]  do_mpage_readpage+0x328/0x730
[   62.091045]  ? __add_to_page_cache_locked+0x12e/0x1a0
[   62.092726]  mpage_readpages+0x120/0x190
[   62.094034]  ? check_disk_change+0x70/0x70
[   62.095454]  ? check_disk_change+0x70/0x70
[   62.096849]  ? alloc_pages_current+0x65/0xd0
[   62.098277]  blkdev_readpages+0x18/0x20
[   62.099568]  __do_page_cache_readahead+0x298/0x360
[   62.101157]  ondemand_readahead+0x1f6/0x490
[   62.102546]  ? ondemand_readahead+0x1f6/0x490
[   62.103995]  page_cache_sync_readahead+0x29/0x40
[   62.105539]  generic_file_read_iter+0x7d0/0x9d0
[   62.107067]  ? futex_wait+0x221/0x240
[   62.108303]  ? trace_hardirqs_on+0xd/0x10
[   62.109654]  blkdev_read_iter+0x30/0x40
[   62.110954]  generic_file_splice_read+0xc5/0x140
[   62.112538]  do_splice_to+0x74/0x90
[   62.113726]  splice_direct_to_actor+0xa4/0x1f0
[   62.115209]  ? generic_pipe_buf_nosteal+0x10/0x10
[   62.116773]  do_splice_direct+0x8a/0xb0
[   62.118056]  do_sendfile+0x1aa/0x390
[   62.119255]  __x64_sys_sendfile64+0x4e/0xc0
[   62.120666]  do_syscall_64+0x6e/0x210
[   62.121909]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
----------------------------------------

The someone is blk_queue_split() from blk_mq_make_request(), which depends
on the assumption that blk_queue_enter() from the recursively called
generic_make_request() does not block due to a
percpu_ref_tryget_live(&q->q_usage_counter) failure.

----------------------------------------
generic_make_request(struct bio *bio)
{
	if (blk_queue_enter(q, flags) < 0) { /* <= percpu_ref_tryget_live() succeeds. */
		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
			bio_wouldblock_error(bio);
		else
			bio_io_error(bio);
		return ret;
	}
	(...snipped...)
	ret = q->make_request_fn(q, bio);
	(...snipped...)
	if (q)
		blk_queue_exit(q);
}
----------------------------------------

where q->make_request_fn == blk_mq_make_request, which does

----------------------------------------
blk_mq_make_request(struct request_queue *q, struct bio *bio)
{
	blk_queue_split(q, &bio);
}

blk_queue_split(struct request_queue *q, struct bio **bio)
{
	generic_make_request(*bio); /* <= percpu_ref_tryget_live() fails and waits until atomic_read(&q->mq_freeze_depth) becomes 0. */
}
----------------------------------------

and meanwhile atomic_inc_return(&q->mq_freeze_depth) and percpu_ref_kill()
are called by blk_freeze_queue_start()...

Now, it is up to you to decide how to fix this race problem.
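
For reference, here is the shape of both sides of the circular wait,
paraphrased from the source as of this report. This is a simplified sketch
(error handling, BLK_MQ_REQ_PREEMPT handling and the memory barriers are
omitted), not the exact code. First, the wait that the hung task above is
sitting in:

----------------------------------------
/*
 * Simplified sketch of blk_queue_enter(); the real function also
 * re-checks preempt-only mode under RCU and issues smp_rmb().
 */
int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	while (true) {
		if (percpu_ref_tryget_live(&q->q_usage_counter))
			return 0;	/* got a reference; the queue is live */

		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EBUSY;

		/*
		 * q_usage_counter was killed by a freeze, so sleep until the
		 * freeze completes. The task in the backtrace sleeps here
		 * forever, because it still holds a reference taken by the
		 * outer generic_make_request(), which in turn prevents the
		 * freeze from ever completing.
		 */
		wait_event(q->mq_freeze_wq,
			   atomic_read(&q->mq_freeze_depth) == 0 ||
			   blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}
}
----------------------------------------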
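
And the freeze side that it is waiting for (again a simplified sketch):

----------------------------------------
/* Simplified sketch of the freeze side. */
void blk_freeze_queue_start(struct request_queue *q)
{
	/* Only the first freezer kills the percpu ref. */
	if (atomic_inc_return(&q->mq_freeze_depth) == 1) {
		percpu_ref_kill(&q->q_usage_counter);
		if (q->mq_ops)
			blk_mq_run_hw_queues(q, false);
	}
}

void blk_mq_freeze_queue_wait(struct request_queue *q)
{
	/* Cannot return while the hung task above holds its reference. */
	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
}
----------------------------------------

blk_mq_freeze_queue_wait() cannot make progress while the hung task holds
the reference taken by the outer generic_make_request(), and that task
cannot drop the reference until mq_freeze_depth goes back to 0; each side
is waiting for the other.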