Message-ID: <1533230318.12916.2.camel@HansenPartnership.com>
Subject: Re: [PATCH] blk-mq: fix blk_mq_tagset_busy_iter
From: James Bottomley
To: Jens Axboe, Ming Lei
Cc: linux-block@vger.kernel.org, Josef Bacik, Christoph Hellwig,
    Guenter Roeck, Mark Brown, Matt Hart, Johannes Thumshirn, John Garry,
    Hannes Reinecke, "Martin K. Petersen", linux-scsi@vger.kernel.org,
    linux-kernel@vger.kernel.org
Date: Thu, 02 Aug 2018 10:18:38 -0700
References: <20180802164329.11900-1-ming.lei@redhat.com>
    <1533228846.3915.17.camel@HansenPartnership.com>
    <20180802170601.GC8928@ming.t460p>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2018-08-02 at 11:08 -0600, Jens Axboe wrote:
> On 8/2/18 11:06 AM, Ming Lei wrote:
> > On Thu, Aug 02, 2018 at 09:54:06AM -0700, James Bottomley wrote:
> > > On Fri, 2018-08-03 at 00:43 +0800, Ming Lei wrote:
> > > > Commit d250bf4e776ff09d5("blk-mq: only iterate over inflight
> > > > requests
> > > > in blk_mq_tagset_busy_iter") uses 'blk_mq_rq_state(rq) ==
> > > > MQ_RQ_IN_FLIGHT' to replace 'blk_mq_request_started(req)', this
> > > > way is wrong, and causes lots of test system hang during
> > > > booting.
> > > >
> > > > Fix the issue by using blk_mq_request_started(req) inside
> > > > bt_tags_iter().
> > > >
> > > > Fixes: d250bf4e776ff09d5 ("blk-mq: only iterate over inflight
> > > > requests in blk_mq_tagset_busy_iter")
> > > > Cc: Josef Bacik
> > > > Cc: Christoph Hellwig
> > > > Cc: Guenter Roeck
> > > > Cc: Mark Brown
> > > > Cc: Matt Hart
> > > > Cc: Johannes Thumshirn
> > > > Cc: John Garry
> > > > Cc: Hannes Reinecke ,
> > > > Cc: "Martin K. Petersen" ,
> > > > Cc: James Bottomley
> > > > Cc: linux-scsi@vger.kernel.org
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Signed-off-by: Ming Lei
> > > > ---
> > > >  block/blk-mq-tag.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > > index 09b2ee6694fb..3de0836163c2 100644
> > > > --- a/block/blk-mq-tag.c
> > > > +++ b/block/blk-mq-tag.c
> > > > @@ -271,7 +271,7 @@ static bool bt_tags_iter(struct sbitmap
> > > > *bitmap,
> > > > unsigned int bitnr, void *data)
> > > >    * test and set the bit before assining ->rqs[].
> > > >    */
> > > >   rq = tags->rqs[bitnr];
> > > > - if (rq && blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
> > > > + if (rq && blk_mq_request_started(rq))
> > >
> > > So now we have dueling versions of this patch:
> > >
> > > https://marc.info/?l=linux-scsi&m=153322802207688
> > >
> > > Can we at least make sure we've root caused the problem and
> > > confirmed we've got it fixed before we start the formal patch
> > > process?  When we
> >
> > EH uses scsi_host_busy to check if the error handler needs to be
> > waken up. And blk_mq_tagset_busy_iter() is used for implementing
> > scsi_host_busy(), so causes EH not waken up, then this timed-out
> > request can't be handled.

Yes, I know what the problem is and why this patch is necessary and
that it is very likely the root cause.  However, can we confirm that it
fixes the boot hang completely before we declare victory?

> > > do start the formal patch process, please give appropriate credit
> > > to the reporter(s) since this has been a royal pain for them to
> > > help us track down.
> >
> > Sure.
> >
> > Jens, could you add reported-by if you are fine with this version?
> > Or please just let me know if new version is needed, then I can add
> > it.
>
> I'll add that, would also love a tested-by from the reporter. The
> patch looks good to me, however.

Is there a reason why blk_mq_request_started() isn't a static inline?
It looks to be somewhat in the hot path.

James
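
For readers following the thread, the difference between the two checks
in the quoted diff can be illustrated with a small userspace model. This
is only a sketch under stated assumptions: the three-state lifecycle,
the type names and the helpers (rq_state, rq_started, rq_in_flight,
count_busy) are hypothetical stand-ins and not the kernel's definitions.
It assumes that "started" means "has left the idle state" and that a
request can leave the in-flight state while still needing attention,
which is the situation the thread describes for the timed-out request.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lifecycle; the names are illustrative, not the kernel's. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

struct request {
	enum rq_state state;
};

/* Broad test (assumed meaning of "started"): the request left idle. */
static inline bool rq_started(const struct request *rq)
{
	return rq->state != RQ_IDLE;
}

/* Narrow test: only requests that are in flight right now. */
static inline bool rq_in_flight(const struct request *rq)
{
	return rq->state == RQ_IN_FLIGHT;
}

/*
 * Count the populated slots that satisfy pred. This loosely mirrors a
 * busy counter built on top of a tag iterator; empty slots are skipped.
 */
static size_t count_busy(struct request *const *rqs, size_t n,
			 bool (*pred)(const struct request *))
{
	size_t busy = 0;

	for (size_t i = 0; i < n; i++)
		if (rqs[i] && pred(rqs[i]))
			busy++;
	return busy;
}
```

With one idle, one in-flight and one completed request in the table, the
broad predicate counts two busy requests and the narrow one counts one.
In this model, a busy counter built on the narrow test would not see a
request that has moved past in-flight, which matches the shape of the
failure described above. Both predicates are written as static inline
here only because they are one-line tests in a model; whether that suits
the kernel helper is the open question asked above.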