Subject: Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
From: Jens Axboe
To: Ming Lei
Cc: Bart Van Assche, "snitzer@redhat.com", "dm-devel@redhat.com", "hch@infradead.org", "linux-kernel@vger.kernel.org", "linux-block@vger.kernel.org", "osandov@fb.com"
Date: Fri, 19 Jan 2018 09:27:46 -0700
Message-ID: <1f072086-533e-4b75-d0e3-9e621b2120d8@kernel.dk>
In-Reply-To: <20180119162618.GD14827@ming.t460p>

On 1/19/18 9:26 AM, Ming Lei wrote:
> On Fri, Jan 19, 2018 at 09:19:24AM -0700, Jens Axboe wrote:
>> On 1/19/18 9:05 AM, Ming Lei wrote:
>>> On Fri, Jan 19, 2018 at 08:48:55AM -0700, Jens Axboe wrote:
>>>> On 1/19/18 8:40 AM, Ming Lei wrote:
>>>>>>>> Where does the dm STS_RESOURCE error usually come from - what's exact
>>>>>>>> resource are we running out of?
>>>>>>>
>>>>>>> It is from blk_get_request(underlying queue), see
>>>>>>> multipath_clone_and_map().
>>>>>>
>>>>>> That's what I thought. So for a low queue depth underlying queue, it's
>>>>>> quite possible that this situation can happen. Two potential solutions
>>>>>> I see:
>>>>>>
>>>>>> 1) As described earlier in this thread, having a mechanism for being
>>>>>>    notified when the scarce resource becomes available. It would not
>>>>>>    be hard to tap into the existing sbitmap wait queue for that.
>>>>>>
>>>>>> 2) Have dm set BLK_MQ_F_BLOCKING and just sleep on the resource
>>>>>>    allocation. I haven't read the dm code to know if this is a
>>>>>>    possibility or not.
>>>>>>
>>>>>> I'd probably prefer #1. It's a classic case of trying to get the
>>>>>> request, and if it fails, add ourselves to the sbitmap tag wait
>>>>>> queue head, retry, and bail if that also fails. Connecting the
>>>>>> scarce resource and the consumer is the only way to really fix
>>>>>> this, without bogus arbitrary delays.
>>>>>
>>>>> Right, as I have replied to Bart, using mod_delayed_work_on() with
>>>>> returning BLK_STS_NO_DEV_RESOURCE(or sort of name) for the scarce
>>>>> resource should fix this issue.
>>>>
>>>> It'll fix the forever stall, but it won't really fix it, as we'll slow
>>>> down the dm device by some random amount.
>>>>
>>>> A simple test case would be to have a null_blk device with a queue depth
>>>> of one, and dm on top of that. Start a fio job that runs two jobs: one
>>>> that does IO to the underlying device, and one that does IO to the dm
>>>> device. If the job on the dm device runs substantially slower than the
>>>> one to the underlying device, then the problem isn't really fixed.
>>>
>>> I remembered that I tried this test on scsi-debug & dm-mpath over scsi-debug,
>>> seems not observed this issue, could you explain a bit why IO over dm-mpath
>>> may be slower? Because both two IO contexts call same get_request(), and
>>> in theory dm-mpath should be a bit quicker since it uses direct issue for
>>> underlying queue, without io scheduler involved.
>>
>> Because if you lose the race for getting the request, you'll have some
>> arbitrary delay before trying again, potentially. Compared to the direct
>
> But the restart still works, one request is completed, then the queue
> is return immediately because we use mod_delayed_work_on(0), so looks
> no such issue.

There are no pending requests for this case, nothing to restart the queue.
When you fail that blk_get_request(), you are idle, nothing is pending.

-- 
Jens Axboe
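
The stall described in this message, and option 1 from the quoted discussion (try to get the request, add yourself to the wait queue, retry, bail if that also fails), can be illustrated with a toy model. This is only a sketch in plain Python, not kernel code: `TagPool`, `UpperQueue`, and `simulate` are hypothetical names invented for illustration and do not correspond to the blk-mq or sbitmap API. The pool stands in for the underlying queue's scarce requests, and the upper queue stands in for the stacked device.

```python
class TagPool:
    """Toy stand-in for the underlying queue's scarce requests (hypothetical)."""

    def __init__(self, depth):
        self.free = depth
        self.waiters = []  # callbacks to run when a tag is freed

    def try_get(self):
        if self.free > 0:
            self.free -= 1
            return True
        return False

    def add_waiter(self, callback):
        if callback not in self.waiters:
            self.waiters.append(callback)

    def remove_waiter(self, callback):
        if callback in self.waiters:
            self.waiters.remove(callback)

    def put(self):
        # Free one tag, then wake everyone who registered interest.
        self.free += 1
        waiters, self.waiters = self.waiters, []
        for wake in waiters:
            wake()


class UpperQueue:
    """Toy stand-in for the stacked (dm-like) queue (hypothetical)."""

    def __init__(self, pool, use_waiter):
        self.pool = pool
        self.use_waiter = use_waiter
        self.pending = 0      # queued but not yet dispatched
        self.dispatched = 0

    def submit(self):
        self.pending += 1
        self.run()

    def run(self):
        while self.pending:
            if not self.pool.try_get():
                if not self.use_waiter:
                    # Nothing of ours is in flight, so no completion
                    # will ever come back to restart this queue.
                    return
                # Register on the resource, then retry once to close
                # the race with a concurrent put(); bail if it fails.
                self.pool.add_waiter(self.run)
                if not self.pool.try_get():
                    return
                self.pool.remove_waiter(self.run)
            self.pending -= 1
            self.dispatched += 1


def simulate(use_waiter):
    pool = TagPool(depth=1)
    upper = UpperQueue(pool, use_waiter)
    assert pool.try_get()  # a direct user of the underlying device takes the only tag
    upper.submit()         # the stacked queue loses the race while idle
    pool.put()             # the direct user's IO completes and frees the tag
    return upper.dispatched
```

In this model `simulate(False)` returns 0, because the stacked queue was idle when it failed and nothing restarts it, while `simulate(True)` returns 1, because freeing the tag wakes the registered waiter. A delayed rerun would sit between these two cases: it avoids the permanent stall but adds an arbitrary wait.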