From: Arnd Bergmann
To: Jens Axboe
Cc: Linus Walleij, Ulf Hansson, Paolo Valente, Christoph Hellwig, Bart Van Assche, Jan Kara, Tejun Heo, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Mark Brown, Hannes Reinecke, Grant Likely, James Bottomley, Bartlomiej Zolnierkiewicz
Subject: Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler
Date: Fri, 28 Oct 2016 18:05:35 +0200
Message-ID: <2423164.8ioLhGisxr@wuerfel>
In-Reply-To: <3d0b38bb-537d-94ff-574f-587bad949fdd@kernel.dk>
References: <1477474082-2846-1-git-send-email-paolo.valente@linaro.org> <3d0b38bb-537d-94ff-574f-587bad949fdd@kernel.dk>
On Friday, October 28, 2016 9:30:07 AM CEST Jens Axboe wrote:
> On 10/28/2016 03:32 AM, Linus Walleij wrote:
> > The patch to enable MQ looks like this:
> > https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/commit/?h=mmc-mq&id=8f79b527e2e854071d8da019451da68d4753f71d
>
> BTW, another viable "hack" for the depth issue would be to expose more
> than one hardware queue. It's meant to map to a distinct submission
> region in the hardware, but there's nothing stopping the driver from
> using it differently. Might not be cleaner than just increasing the
> queue depth on a single queue, though.
>
> That still won't solve the issue of lying about it and causing IO
> scheduler confusion, of course.
>
> Also, 4.8 and newer have support for BLK_MQ_F_BLOCKING, if you need to
> block in ->queue_rq(). That could eliminate the need to offload to a
> kthread manually.

I think the main reason for the kthread is that on ARM and other
architectures, the DMA mapping operations are fairly slow (because of
cache flushes or bounce buffering), and we want to minimize the time
between subsequent requests being handled by the hardware. This is not
unique to MMC in any way; MMC just happens to be common on ARM, and it
is limited by its lack of hardware command queuing.
It would be nice to do a similar trick for SCSI disks, especially USB
mass storage and maybe also SATA, which are the next most common
storage devices on non-coherent ARM systems (though SATA nowadays
usually comes with NCQ, so it is less of an issue there).

It may be reasonable to tie this in with the I/O scheduler: if you
don't have a scheduler, access to the device is probably fairly direct
and you want to avoid any complexity in the kernel; but if preparing a
request is expensive and the hardware has no queuing, you probably
also want to use a scheduler.

We should probably also try to understand how this could work out for
USB mass storage, if there is a solution at all, and then implement it
for MMC in a way that would work on both. I don't think the USB core
can currently split the dma_map_sg() operation from the USB command
submission, so this may require some deeper surgery there.

	Arnd