Message-ID: <1516311465.24506.2.camel@redhat.com>
Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
From: Laurence Oberman
To: Mike Snitzer, Bart Van Assche
Cc: "axboe@kernel.dk", "dm-devel@redhat.com", "hch@infradead.org", "linux-kernel@vger.kernel.org", "linux-block@vger.kernel.org", "osandov@fb.com", "ming.lei@redhat.com"
Date: Thu, 18 Jan 2018 16:37:45 -0500
In-Reply-To: <20180118212327.GB31679@redhat.com>
References: <20180118024124.8079-1-ming.lei@redhat.com> <20180118170353.GB19734@redhat.com> <1516296056.2676.23.camel@wdc.com> <20180118183039.GA20121@redhat.com> <1516301278.2676.35.camel@wdc.com> <20180118204856.GA31679@redhat.com> <1516309128.2676.38.camel@wdc.com> <20180118212327.GB31679@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2018-01-18 at 16:23 -0500, Mike Snitzer wrote:
> On Thu, Jan 18 2018 at  3:58P -0500,
> Bart Van Assche wrote:
>
> > On Thu, 2018-01-18 at 15:48 -0500, Mike Snitzer wrote:
> > > For Bart's test the underlying scsi-mq driver is what is
> > > regularly hitting this case in __blk_mq_try_issue_directly():
> > >
> > >         if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q))
> >
> > Hello Mike,
> >
> > That code path is not the code path that triggered the lockups that
> > I reported during the past days.
>
> If you're hitting blk_mq_sched_insert_request() then you most
> certainly are hitting that code path.
>
> If you aren't then what was your earlier email going on about?
> https://www.redhat.com/archives/dm-devel/2018-January/msg00372.html
>
> If you were just focusing on that as one possible reason, that isn't
> very helpful.  By this point you really should _know_ what is
> triggering the stall based on the code paths taken.  Please use
> ftrace's function_graph tracer if need be.
>
> > These lockups were all triggered by incorrect handling of
> > .queue_rq() returning BLK_STS_RESOURCE.
>
> Please be precise, dm_mq_queue_rq()'s return of BLK_STS_RESOURCE?
> "Incorrect" because it no longer runs blk_mq_delay_run_hw_queue()?
>
> Please try to do more work analyzing the test case that only you can
> easily run (due to srp_test being a PITA).  And less time lobbying
> for a change that you don't understand to _really_ be correct.
>
> We have time to get this right, please stop hyperventilating about
> "regressions".
>
> Thanks,
> Mike

Hello Bart

I have run a good few loops of 02-mq and it's stable for me on your
tree. I am not running the entire disconnect/re-connect loops,
un-mounts etc., for good reason: I have 35 LUNs, so it's very impactful
to lose them and have them come back all the time.

Anyway, I am very happy to try to reproduce this in-house so Mike and
Ming can focus on it, but I need to know if all I need to do is loop
over 02-mq over and over.

Also, please let me know what debugfs and sysfs data to capture, and I
am happy to try to help move this along.

Regards
Laurence