Date: Fri, 18 Feb 2011 14:54:55 -0500
From: Vivek Goyal
To: Chad Talbott
Cc: jaxboe@fusionio.com, guijianfeng@cn.fujitsu.com, mrubin@google.com,
	teravest@google.com, jmoyer@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Avoid preferential treatment of groups that aren't backlogged
Message-ID: <20110218195454.GI26654@redhat.com>
References: <20110210013211.21573.69260.stgit@neat.mtv.corp.google.com>
	<20110210020946.GA27040@redhat.com>
	<20110210035738.GC27040@redhat.com>
	<20110211181533.GG8773@redhat.com>
In-Reply-To: <20110211181533.GG8773@redhat.com>

On Fri, Feb 11, 2011 at 01:15:33PM -0500, Vivek Goyal wrote:
> On Thu, Feb 10, 2011 at 04:36:25PM -0800, Chad Talbott wrote:
> > On Thu, Feb 10, 2011 at 10:57 AM, Chad Talbott wrote:
> > > On Wed, Feb 9, 2011 at 7:57 PM, Vivek Goyal wrote:
> > >> If you ran different random readers in different groups of different
> > >> weights with group_isolation=1, then there is a case of having service
> > >> differentiation. In that case we will idle for 8ms on each group before
> > >> we expire the group. So in these test cases, are the low weight groups
> > >> not submitting IO within 8ms? Putting a random reader in a separate
> > >> group with think time > 8ms is, I think, going to hurt a lot, because
> > >> for every single IO dispatched the group is going to wait for 8ms
> > >> before it is expired.
> > >
> > > You're right about the behavior of group_idle. We have more
> > > experience with earlier kernels (before group_idle). With this patch
> > > we are able to achieve isolation without group_idle even with these
> > > large ratios. (Without group_idle the random reader workloads will
> > > get marked seeky, and idling is disabled. Without group_idle, we have
> > > to remember the vdisktime to get isolation.)
> > >
> > >> Can you run blktrace and verify what's happening?
> > >
> > > I can run a blktrace, and I think it will show what you expect.
> >
> > So, I ran the following two tests and took a blktrace.
> >
> > 950 rdrand, 50 rdrand.delay10
> > weight 950 random reader with low think time vs weight 50 random
> > reader with 10ms think time
> >
> > 950 rdrand, 50 rdrand.delay50  # 50ms think time
> > weight 950 random reader with low think time vs weight 50 random
> > reader with 50ms think time
> >
> > I find that we are still idling for these random readers, even the one
> > with 50ms think time. group_idle is 0 according to blktrace.
> >
> > With this patch, both of these cases have correct isolation. Without
> > this patch, the small weight reader is able to get more than its
> > share.
> >
> > I think that idling for a random reader with a 50ms think time is
> > likely a bug, but a separate issue.
>
> Thanks for checking this out. I agree that for a low weight random
> reader/writer with high think time, we need to remember the vdisktime;
> otherwise it will show up as a fresh new candidate and get more done.
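
To make that concrete, the idea is roughly the following. This is just
a sketch, not the actual patch; vdisktime and min_vdisktime are real
CFQ concepts, but the struct, field and function names below are made
up for illustration.

#include <stdint.h>

struct io_group {
	uint64_t vdisktime;       /* virtual time consumed, scaled by weight */
	uint64_t saved_vdisktime; /* remembered when the group goes idle */
};

/* The group ran out of backlogged queues and leaves the service tree. */
static void group_went_idle(struct io_group *g)
{
	g->saved_vdisktime = g->vdisktime;
}

/* The group is backlogged again and re-enters the service tree. */
static void group_requeue(struct io_group *g, uint64_t min_vdisktime)
{
	/*
	 * Resume from the remembered vdisktime, but never behind the
	 * tree's minimum. A long-idle group still starts close to
	 * min_vdisktime, while a group that was recently served keeps
	 * its higher vdisktime and therefore gets a smaller share.
	 */
	g->vdisktime = g->saved_vdisktime > min_vdisktime ?
		       g->saved_vdisktime : min_vdisktime;
}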
>
> Having said that, one can say that a random reader/writer doing a small
> amount of IO should be able to get the job done really fast, and the
> ones hogging the disk for a long time should get higher vdisktime.
>
> So with this scheme, a random reader/writer will have to be of higher
> weight to get the job done fast. A low weight reader/writer will still
> get a higher vdisktime and get a lesser share. I think it is reasonable.
>
> And yes, even with group_idle=0, if we are idling on a 50ms think time
> random reader, it sounds like a bug.

Thinking more about it, I think it must be happening because random IO
goes on the sync-noidle tree of the group, and there we idle on the
whole tree. I think if you set slice_idle=0 along with group_idle=0,
that idling should go away.

Thanks
Vivek
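
P.S. To illustrate that last point: the following is only my sketch of
the decision, not the real cfq-iosched.c code (the workload names echo
CFQ's sync vs. sync-noidle split, everything else is made up), but it
shows why group_idle=0 alone is not enough for a random reader.

#include <stdbool.h>
#include <stdint.h>

enum wl_type { SYNC_WORKLOAD, SYNC_NOIDLE_WORKLOAD, ASYNC_WORKLOAD };

struct sched_state {
	uint64_t slice_idle;      /* per-tree idle window, 0 disables */
	uint64_t group_idle;      /* per-group idle window, 0 disables */
	enum wl_type workload;    /* tree the active queue sits on */
	bool last_queue_in_group; /* no other backlogged queue in group */
};

/* Arm an idle timer instead of expiring the queue immediately? */
static bool should_idle(const struct sched_state *s)
{
	/*
	 * Random (seeky) readers are parked on the sync-noidle tree,
	 * and we idle on that tree as a whole whenever slice_idle is
	 * non-zero, independent of group_idle. That is why the 50ms
	 * think time reader still sees idling until slice_idle=0 too.
	 */
	if (s->workload == SYNC_NOIDLE_WORKLOAD && s->slice_idle)
		return true;

	/* group_idle only covers the last backlogged queue in a group. */
	if (s->last_queue_in_group && s->group_idle)
		return true;

	return false;
}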