From: Linus Torvalds
Date: Wed, 26 Oct 2016 15:51:01 -0700
Subject: Re: bio linked list corruption.
To: Dave Jones, Linus Torvalds, Chris Mason, Andy Lutomirski, Jens Axboe, Al Viro, Josef Bacik, David Sterba, linux-btrfs, Linux Kernel, Dave Chinner

On Wed, Oct 26, 2016 at 3:40 PM, Dave Jones wrote:
>
> I gave it a shot too for shits & giggles.
> This falls out during boot.
>
> [ 9.278420] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:1181 blk_sq_make_request+0x465/0x4a0

Hmm. That's the

        WARN_ON_ONCE(rq->mq_ctx != ctx);

that I added to blk_mq_merge_queue_io(), and I really think that
warning is valid, and the fact that it triggers shows that something
is wrong with locking.

We just did a

        spin_lock(&ctx->lock);

and that lock is *supposed* to protect the __blk_mq_insert_request(),
but that uses rq->mq_ctx.

So if rq->mq_ctx != ctx, then we're locking the wrong context.

Jens - please explain to me why I'm wrong.

Or maybe I actually might have found the problem?
In which case please send me a patch that fixes it ;)

Dave: it might be a good idea to split that "WARN_ON_ONCE()" in
blk_mq_merge_queue_io() into two, since right now it can trigger both
for the

                blk_mq_bio_to_request(rq, bio);

path _and_ for the

                if (!blk_mq_attempt_merge(q, ctx, bio)) {
                        blk_mq_bio_to_request(rq, bio);
                        goto insert_rq;

path. If you split it into two: one before that "insert_rq:" label,
and one before the "goto insert_rq" thing, then we could see if it is
just one of the blk_mq_merge_queue_io() cases (or both) that is
broken..

                Linus
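
The two points above (the insert uses rq->mq_ctx while the caller
locked ctx, and a check per path tells you which path is broken) can be
sketched in a small userspace model. This is not kernel code and not a
proposed patch: model_ctx, model_rq, model_insert(),
model_merge_queue_io() and the WARN_* flags are all hypothetical
stand-ins, the spinlock is modeled as a plain flag, and the merge
attempt is assumed to fail so that both paths end in an insert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct model_ctx {
    bool locked;      /* models spin_lock(&ctx->lock) being held */
    int  nr_queued;   /* models the per-context request list */
};

struct model_rq {
    struct model_ctx *mq_ctx;   /* context the request belongs to */
};

/* One flag per check site, so the caller can tell the paths apart. */
enum { WARN_DIRECT_PATH = 1, WARN_MERGE_FAILED_PATH = 2 };

/*
 * Models the insert step: it queues on rq->mq_ctx, not on whatever
 * context the caller locked. Returns true only if the context it
 * touched was actually locked.
 */
static bool model_insert(struct model_rq *rq)
{
    rq->mq_ctx->nr_queued++;
    return rq->mq_ctx->locked;
}

/*
 * Models the shape of the function discussed above with the single
 * check split in two. try_merge selects the path; the merge itself is
 * assumed to fail, so both paths fall through to the insert.
 * Returns the WARN_* flag of the path that saw rq->mq_ctx != ctx, or 0.
 */
static int model_merge_queue_io(struct model_ctx *ctx, struct model_rq *rq,
                                bool try_merge, bool *insert_was_protected)
{
    int warned = 0;

    assert(ctx != NULL && rq != NULL && insert_was_protected != NULL);

    ctx->locked = true;   /* the caller locks ctx, as in the mail */

    if (!try_merge) {
        /* check site 1: direct bio-to-request path */
        if (rq->mq_ctx != ctx)
            warned |= WARN_DIRECT_PATH;
    } else {
        /* check site 2: merge failed, about to jump to the insert */
        if (rq->mq_ctx != ctx)
            warned |= WARN_MERGE_FAILED_PATH;
    }

    *insert_was_protected = model_insert(rq);
    ctx->locked = false;
    return warned;
}
```

In this model, calling model_merge_queue_io() with a request whose
mq_ctx differs from the locked ctx returns the flag for the path taken
and reports that the insert ran against an unlocked context, which is
the "locking the wrong context" case. With a matching mq_ctx it returns
0 and the insert is protected.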