From: "(Exiting) Baolin Wang"
Date: Mon, 18 Nov 2019 18:04:28 +0800
Subject: Re: [PATCH v6 0/4] Add MMC software queue support
To: Baolin Wang
Cc: Arnd Bergmann, Adrian Hunter, Ulf Hansson, asutoshd@codeaurora.org,
    Orson Zhai, Lyra Zhang, Linus Walleij, Vincent Guittot, linux-mmc,
    linux-kernel@vger.kernel.org, Hannes Reinecke, linux-block
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Arnd,

On Tue, 12 Nov 2019 at 16:48, Baolin Wang wrote:
>
> On Tue, Nov 12, 2019 at 12:59 AM Arnd Bergmann wrote:
> >
> > On Mon, Nov 11, 2019 at 1:58 PM Baolin Wang wrote:
> > > On Mon, 11 Nov 2019 at 17:28, Arnd Bergmann wrote:
> > > > On Mon, Nov 11, 2019 at 8:35 AM Baolin Wang wrote:
> > > > - Removing all the context switches and workqueues from the data submission
> > > > path is also the right idea.
> > > > As you found, there is still a workqueue inside
> > > > of blk_mq that is used because it may get called from atomic context, but
> > > > the submission may get blocked in __mmc_claim_host(). This really
> > > > needs to be changed as well, but not in the way I originally suggested:
> > > > as Hannes suggested, the host interrupt handler should always use
> > > > request_threaded_irq() to have its own process context, and then pass a
> > > > flag to blk_mq to say that we never need another workqueue there.
> > >
> > > So you mean we should complete the request in the host driver irq
> > > thread context, then issue another request in this context by calling
> > > blk_mq_run_hw_queues()?
> >
> > Yes. I assumed there was already code that would always run
> > blk_mq_run_hw_queue() at I/O completion, but I can't find where
> > that happens today.
>
> OK. Right now we complete a request in the block softirq, which means the
> irq thread of the host driver should call blk_mq_complete_request() to
> complete this request (triggering the block softirq) and call
> blk_mq_run_hw_queues() to dispatch another request in this context.
>
> > As I understand, the main difference to today is that
> > __blk_mq_delay_run_hw_queue() can call into __blk_mq_run_hw_queue()
> > directly rather than using the delayed work queue once we
> > can skip the BLK_MQ_F_BLOCKING check.
>
> Right. We need to improve this as you suggested.
>
> > > > - With that change in place, calling a blocking __mmc_claim_host() is
> > > > still a problem, so there should still be a nonblocking mmc_try_claim_host()
> > > > for the submission path, leading to a BLK_STS_DEV_RESOURCE (?)
> > > > return code from mmc_mq_queue_rq(). Basically mmc_mq_queue_rq()
> > > > should always return right away, either after having queued the next I/O
> > > > or with an error, but not waiting for the device in any way.
> > >
> > > Actually, it is not only mmc_claim_host() that can block the MMC request
> > > processing; in this routine, mmc_blk_part_switch() and
> > > mmc_retune() can also block the request processing. Moreover, the part
> > > switching and tuning should be synchronous operations, and we cannot move
> > > them to a work or a thread.
> >
> > Ok, I see.
> >
> > Those would also cause requests to be sent to the device or the host
> > controller, right? Maybe we can treat them as "a non-IO request
>
> Right.
>
> > has successfully been queued to the device" events, returning
> > busy from the mmc_mq_queue_rq() function and then running
> > the queue again when they complete?
>
> Yes, seems reasonable to me.
>
> > > > - For the packed requests, there is apparently a very simple way to implement
> > > > that without a software queue: mmc_mq_queue_rq() is allowed to look at
> > > > and dequeue all requests that are currently part of the request_queue,
> > > > so it should take out as many as it wants to submit at once and send
> > > > them all down to the driver together, avoiding the need for any further
> > > > round-trips to blk_mq or maintaining a queue in mmc.
> > >
> > > You mean we can dispatch a request directly from
> > > elevator->type->ops.dispatch_request()? But we still need some helper
> > > functions to check whether these requests can be packed (the packing
> > > condition), and we need to invent new APIs to start a packed request (or
> > > use the cqe interfaces, which means we still need to implement some cqe
> > > callbacks).
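
(Coming back to the threaded-irq idea above: just to make sure I
understand the flow you and Hannes are describing, I think it would
look roughly like the sketch below. This is only an illustration, not
tested, and the my_* names are made up:)

#include <linux/interrupt.h>
#include <linux/blk-mq.h>

struct my_mmc_host {
        int irq;
        /* ... */
};

/* Hypothetical helpers, implemented elsewhere in the driver. */
bool my_mmc_irq_pending(struct my_mmc_host *host);
void my_mmc_ack_irq(struct my_mmc_host *host);
struct request *my_mmc_take_completed_req(struct my_mmc_host *host);

static irqreturn_t my_mmc_hardirq(int irq, void *dev_id)
{
        struct my_mmc_host *host = dev_id;

        if (!my_mmc_irq_pending(host))
                return IRQ_NONE;

        my_mmc_ack_irq(host);
        return IRQ_WAKE_THREAD;         /* defer the rest to the irq thread */
}

static irqreturn_t my_mmc_irq_thread(int irq, void *dev_id)
{
        struct my_mmc_host *host = dev_id;
        struct request *req = my_mmc_take_completed_req(host);
        struct request_queue *q = req->q;

        /* Complete the finished request (this kicks the block softirq)... */
        blk_mq_complete_request(req);

        /* ...and dispatch the next request from this process context. */
        blk_mq_run_hw_queues(q, false);

        return IRQ_HANDLED;
}

static int my_mmc_setup_irq(struct my_mmc_host *host)
{
        /* The irq thread gives the completion/dispatch path its own context. */
        return request_threaded_irq(host->irq, my_mmc_hardirq,
                                    my_mmc_irq_thread, IRQF_ONESHOT,
                                    "my-mmc", host);
}

If that is the idea, then both the completion and the dispatch of the
next request happen in the irq thread, which is what my concern 2)
below is about.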
> >
> > I don't know how the dispatch_request() function fits in there;
> > what Hannes told me is that in ->queue_rq() you can always
> > look at the following requests that are already queued up
> > and take the next ones off the list. Looking at bd->last
> > tells you if there are additional requests. If there are, you can
> > look at the next one from blk_mq_hw_ctx (not sure how, but
> > it should not be hard to find).
> >
> > I also see that there is a commit_rqs() callback that may
> > go along with queue_rq(); implementing that one could make
> > this easier as well.
>
> Yes, we can use queue_rq()/commit_rqs() and bd->last (though bd->last may
> not work well right now, see [1]), but as we discussed before, for packed
> requests we still need some new interfaces (for example, an interface
> used to start a packed request and an interface used to complete a
> packed request). In the end we reached a consensus that we should reuse
> the CQE interfaces instead of inventing new ones.
>
> [1] https://lore.kernel.org/patchwork/patch/1102897/
>
> > > > - The DMA management (bounce buffer, map, unmap) that is currently
> > > > done in mmc_blk_mq_issue_rq() should ideally be done in the
> > > > init_request()/exit_request() (?) callbacks from mmc_mq_ops so this
> > > > can be done asynchronously, out of the critical timing path for the
> > > > submission. With this, there won't be any need for a software queue.
> > >
> > > This is not true. blk-mq allocates a number of static request
> > > objects (usually the number of static requests equals the hardware
> > > queue depth), saved in struct blk_mq_tags. So init_request() is
> > > used to initialize the static requests when allocating them, and
> > > exit_request() is called to free the static requests when freeing
> > > the 'struct blk_mq_tags', for example when the queue is dead. So
> > > we cannot move the DMA management into init_request()/exit_request().
> >
> > Ok, I must have misremembered which callback that is then, but I guess
> > there is some other place to do it.
>
> I checked 'struct blk_mq_ops', and I did not find an op that can be
> used to do the DMA management. I also checked the UFS driver; it does
> the DMA mapping in queue_rq() as well (scsi_queue_rq() --->
> ufshcd_queuecommand() ---> ufshcd_map_sg()). Maybe I missed something?
>
> Moreover, like I said above, for packed requests we still need to
> implement something (like the software queue) based on the CQE
> interfaces to help handle them.

After some investigation and offline discussion with you, I still have
some concerns about your suggestion.

1) blk-mq does not currently supply an op to prepare a request, which
could be used to do the DMA management asynchronously. Yes, we can
introduce new ops for blk-mq, but there is still some remaining
preparation in mmc_mq_queue_rq(), like the mmc part switch. With a
software queue, we can prepare a request completely after issuing the
previous one.

2) I wonder if it is appropriate to use the threaded irq context to
dispatch the next request; we would still introduce a context switch
there. Today we complete a request in the hard irq handler and kick the
softirq to do the time-consuming operations, like DMA unmapping, and we
can start the next request in the hard irq handler without a context
switch. Moreover, if we remove BLK_MQ_F_BLOCKING in the future as you
suggested, then we can remove all the context switches. And I think we
can dispatch the next request in the softirq context (the CQE already
does this).

3) For packed request support, I did not see an example of a block
driver dispatching a request from the I/O scheduler in queue_rq(), and
there are no APIs in blk-mq for that. We also do not know where we
would dispatch a request from in queue_rq(): from the I/O scheduler?
From the ctx? Or from the hctx->dispatch list? And if this request
cannot be passed to the host now, how do we handle it? That seems like
a lot of complicated things. Moreover, we still need some interfaces
for the packed request handling; from the previous discussion, we still
need something like the MMC software queue based on the CQE to help
handle packed requests.
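
(For reference, my understanding of the queue_rq()/commit_rqs() batching
is roughly the sketch below. Again, this is purely illustrative, with
made-up my_* helpers; it shows the batching itself but not the packing
condition checks or the start/complete interfaces that I think we would
still need on top of it:)

#include <linux/blk-mq.h>

struct my_mmc_host;

/* Hypothetical helpers, implemented elsewhere in the driver. */
void my_mmc_add_to_packed_list(struct my_mmc_host *host, struct request *req);
void my_mmc_issue_packed(struct my_mmc_host *host);

static blk_status_t my_mmc_queue_rq(struct blk_mq_hw_ctx *hctx,
                                    const struct blk_mq_queue_data *bd)
{
        struct my_mmc_host *host = hctx->queue->queuedata;
        struct request *req = bd->rq;

        blk_mq_start_request(req);

        /* Collect the request into the current batch. */
        my_mmc_add_to_packed_list(host, req);

        /* bd->last means no further requests will follow in this batch. */
        if (bd->last)
                my_mmc_issue_packed(host);

        return BLK_STS_OK;
}

static void my_mmc_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
        struct my_mmc_host *host = hctx->queue->queuedata;

        /* Called when requests were batched but the last one had !bd->last. */
        my_mmc_issue_packed(host);
}

static const struct blk_mq_ops my_mmc_mq_ops = {
        .queue_rq       = my_mmc_queue_rq,
        .commit_rqs     = my_mmc_commit_rqs,
};

Even with something like this, the packed request itself still needs to
be started and completed through CQE-like interfaces, which is why I
think the software queue is still the easier way to get there.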
So I think I still need to introduce the MMC software queue: on the one
hand, it can really improve the performance according to the fio data
and avoid long latency; on the other hand, we can extend it to support
packed requests easily in the future.

Thanks. (Anyway, I will still post the V7 to address Adrian's comments
and to see if we can get a consensus there.)

-- 
Baolin Wang
Best Regards