Date: Mon, 4 May 2015 13:20:12 -0700
From: Shaohua Li
To: Jens Axboe
Subject: Re: [PATCH] blk-mq: don't lose requests if a stopped queue restarts
Message-ID: <20150504202012.GA3491199@devbig257.prn2.facebook.com>
References: <430595847fe98f99e1c8ea79b018003d9315f905.1430613052.git.shli@fb.com>
 <5547C5BF.2030703@fb.com>
 <20150504195137.GB3365187@devbig257.prn2.facebook.com>
 <5547CEFA.1000109@fb.com>
In-Reply-To: <5547CEFA.1000109@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 04, 2015 at 01:56:42PM -0600, Jens Axboe wrote:
> On 05/04/2015 01:51 PM, Shaohua Li wrote:
> >On Mon, May 04, 2015 at 01:17:19PM -0600, Jens Axboe wrote:
> >>On 05/02/2015 06:31 PM, Shaohua Li wrote:
> >>>Normally, if the driver is too busy to dispatch a request, the logic is
> >>>as follows:
> >>>
> >>>block layer:                          driver:
> >>>	__blk_mq_run_hw_queue
> >>>a.                                    blk_mq_stop_hw_queue
> >>>b. rq added to ctx->dispatch
> >>>
> >>>later:
> >>>1. blk_mq_start_hw_queue
> >>>2. __blk_mq_run_hw_queue
> >>>
> >>>But it's possible that steps 1-2 run between a and b. Since rq isn't in
> >>>ctx->dispatch yet, step 2 will not run rq, and the rq can get lost if
> >>>no subsequent requests come in.
> >>
> >>Good catch! But the patch introduces a potentially never-ending loop
> >>in __blk_mq_run_hw_queue(). Not sure how we can fully close it, but
> >>it might be better to punt the re-run to the worker after adding the
> >>requests back. That would turn a potential busy loop (until requests
> >>complete) into something with nicer behavior, at least. Ala:
> >>
> >>	if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state))
> >>		kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
> >>				&hctx->run_work, 0);
> >
> >My first version of the patch was like this, but I changed my mind later.
> >The assumption is that the driver will stop the queue if it's too busy to
> >dispatch a request. If the driver is buggy, we will have the endless loop
> >here. Should we assume drivers will not do the right thing?
> 
> There's really no contract that says the driver MUST stop the queue
> for busy. It could, legitimately, decide to just always run the
> queue when requests complete.
> 
> It might be better to simply force this behavior. If we get a BUSY,
> stop the queue from __blk_mq_run_hw_queue(). And if the bit isn't
> still set on re-add, then we know we need to re-run it. I think that
> would be a cleaner API, less fragile, and harder to get wrong. The
> downside is that now this stop happens implicitly by the core, and
> the driver must now have an asymmetric queue start when it frees the
> limited resource that caused the BUSY return. Either that, or we
> define a second set of start/stop bits, one used exclusively by the
> driver and one used exclusively by blk-mq. Then blk-mq could restart
> the queue on completion of a request, since it would then know that
> blk-mq was the one that stopped it.
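
To make that driver-side contract concrete, a minimal, hypothetical sketch of
the stop-on-BUSY / restart-on-completion pattern under discussion. The my_*
names and the free_slots counter are made up for illustration; only the
blk_mq_* calls are the real (4.x-era) API:

#include <linux/blk-mq.h>
#include <linux/blkdev.h>
#include <linux/atomic.h>

struct my_dev {
	struct request_queue *queue;
	atomic_t free_slots;		/* the "limited resource" */
};

static int my_queue_rq(struct blk_mq_hw_ctx *hctx,
		       const struct blk_mq_queue_data *bd)
{
	struct my_dev *dev = hctx->queue->queuedata;

	if (atomic_dec_if_positive(&dev->free_slots) < 0) {
		/* a. out of resources: the driver stops the queue ... */
		blk_mq_stop_hw_queue(hctx);
		/* ... and the core parks the request (step b) */
		return BLK_MQ_RQ_QUEUE_BUSY;
	}

	blk_mq_start_request(bd->rq);
	/* ... hand bd->rq to the hardware here ... */
	return BLK_MQ_RQ_QUEUE_OK;
}

/* completion path, e.g. called from the irq handler */
static void my_complete_rq(struct my_dev *dev, struct request *rq, int error)
{
	blk_mq_end_request(rq, error);
	atomic_inc(&dev->free_slots);
	/* the asymmetric restart: rerun queues stopped on BUSY (steps 1-2) */
	blk_mq_start_stopped_hw_queues(dev->queue, true);
}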
Agree. I'll make the rerun async for now and leave the above as a future
improvement.

>From 3e767da0e9f1044659c605120e09726ffd1aeab0 Mon Sep 17 00:00:00 2001
Message-Id: <3e767da0e9f1044659c605120e09726ffd1aeab0.1430770649.git.shli@fb.com>
From: Shaohua Li
Date: Fri, 1 May 2015 16:39:39 -0700
Subject: [PATCH] blk-mq: don't lose requests if a stopped queue restarts

Normally, if the driver is too busy to dispatch a request, the logic is as
follows:

block layer:                          driver:
	__blk_mq_run_hw_queue
a.                                    blk_mq_stop_hw_queue
b. rq added to ctx->dispatch

later:
1. blk_mq_start_hw_queue
2. __blk_mq_run_hw_queue

But it's possible that steps 1-2 run between a and b. Since rq isn't in
ctx->dispatch yet, step 2 will not run rq, and the rq can get lost if no
subsequent requests come in.

Signed-off-by: Shaohua Li
---
 block/blk-mq.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ade8a2d..e1a5b9e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -855,6 +855,16 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 		spin_lock(&hctx->lock);
 		list_splice(&rq_list, &hctx->dispatch);
 		spin_unlock(&hctx->lock);
+		/*
+		 * The queue is expected to be stopped on BLK_MQ_RQ_QUEUE_BUSY,
+		 * but it's possible the queue was stopped and restarted again
+		 * before this point. A queue restart dispatches requests, and
+		 * since the requests in rq_list haven't been added to
+		 * hctx->dispatch yet, they could get lost.
+		 *
+		 * blk_mq_run_hw_queue() already checks the STOPPED bit.
+		 */
+		blk_mq_run_hw_queue(hctx, true);
 	}
 }
-- 
1.8.1
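
For context on the comment's last line: with async=true the added re-run does
not loop in place. Roughly, and only as a paraphrase of the behavior relied on
(not a verbatim copy of blk_mq_run_hw_queue() from this tree):

/* sketch as if inside block/blk-mq.c; illustrative only */
static void rerun_sketch(struct blk_mq_hw_ctx *hctx)
{
	/*
	 * Still stopped: do nothing. The requests just spliced onto
	 * hctx->dispatch will be picked up by the eventual queue restart.
	 */
	if (test_bit(BLK_MQ_S_STOPPED, &hctx->state))
		return;

	/*
	 * The queue was restarted in the window between the BUSY return and
	 * the splice: kick an async re-run on the kblockd worker instead of
	 * busy-looping in __blk_mq_run_hw_queue().
	 */
	kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
					 &hctx->run_work, 0);
}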