Subject: Re: loop block-mq conversion scalability issues
From: "Justin M. Forbes"
To: Ming Lei
Cc: linux-kernel, tom.leiming@gmail.com
Date: Fri, 24 Apr 2015 16:46:02 -0500
Message-ID: <1429911962.26534.13.camel@fedoraproject.org>
In-Reply-To: <20150424105908.47de489c@tom-T450>
References: <1429823050.26534.9.camel@redhat.com> <20150424105908.47de489c@tom-T450>

On Fri, 2015-04-24 at 10:59 +0800, Ming Lei wrote:
> Hi Justin,
> 
> Thanks for the report.
> 
> On Thu, 23 Apr 2015 16:04:10 -0500
> "Justin M. Forbes" wrote:
> 
> > The block-mq conversion for loop in 4.0 kernels is showing us an
> > interesting scalability problem with live CDs (ro, squashfs). It was
> > noticed when testing the Fedora beta that the more CPUs a liveCD image
> > was given, the slower it would boot. A 4 core qemu instance or bare
> > metal instance took more than twice as long to boot compared to a
> > single CPU instance. After investigating, this traced directly to the
> > block-mq conversion; reverting these 4 patches restores performance.
> > More details are available at
> > https://bugzilla.redhat.com/show_bug.cgi?id=1210857
> > I don't think that reverting the patches is the ideal solution, so I
> > am looking for other options. Since you know this code a bit better
> > than I do, I thought I would run it by you while I am looking as well.
> 
> I can understand the issue because the default @max_active for
> alloc_workqueue() is quite big (512), which may cause too many
> context switches, so loop I/O performance gets decreased.
> 
> Actually I have written a kernel dio/aio based patch for decreasing
> both CPU and memory utilization without sacrificing I/O performance,
> and I will try to improve and push the patch during this cycle and hope
> it can be merged (the kernel/aio.c change is dropped, and only the fs
> change in fs/direct-io.c is needed).
> 
> But the following change should help for your case, could you test it?
> 
> ---
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index c6b3726..b1cb41d 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1831,7 +1831,7 @@ static int __init loop_init(void)
>  	}
>  
>  	loop_wq = alloc_workqueue("kloopd",
> -			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 0);
> +			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 32);
>  	if (!loop_wq) {
>  		err = -ENOMEM;
>  		goto misc_out;
> 

Patch tested; it made things worse (I gave up after 5 minutes and boot
still seemed hung). I also tried values of 1, 16, 64, and 128.
Everything below 128 was much worse than the current situation. Setting
it at 128 seemed about the same as booting without the patch. I can do
some more testing over the weekend, but I don't think this is the
correct solution. I would be interested in testing your dio/aio patches
as well, though.
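For reference, each of those numbers was tried by rebuilding with the
same alloc_workqueue() call and only the last argument changed; a rough
sketch of what was built each time (the wrapper name below is just for
illustration, it is not how loop_init() is actually structured):

/* Sketch only: the final argument caps in-flight work items for the
 * "kloopd" workqueue. 0 picks the workqueue default; the patch above
 * uses 32; the other values tried were 1, 16, 64, and 128. */
static struct workqueue_struct *loop_wq;

static int __init loop_alloc_wq(int max_active)
{
	loop_wq = alloc_workqueue("kloopd",
			WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND,
			max_active);
	if (!loop_wq)
		return -ENOMEM;
	return 0;
}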
Thanks,
Justin