Date: Tue, 5 May 2015 12:31:12 -0400
From: Tejun Heo
To: Peter Zijlstra
Cc: Zefan Li, Mike Galbraith, Ingo Molnar, LKML, Cgroups
Subject: Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Message-ID: <20150505163112.GU1971@htj.duckdns.org>
References: <1430709236.3129.42.camel@gmail.com>
 <5546F80B.3070802@huawei.com>
 <1430716247.3129.44.camel@gmail.com>
 <1430717964.3129.62.camel@gmail.com>
 <554737AE.5040402@huawei.com>
 <20150504123738.GZ21418@twins.programming.kicks-ass.net>
 <55483EF7.7070905@huawei.com>
 <20150505141049.GN21418@twins.programming.kicks-ass.net>
 <20150505141838.GR1971@htj.duckdns.org>
 <20150505151949.GQ21418@twins.programming.kicks-ass.net>
In-Reply-To: <20150505151949.GQ21418@twins.programming.kicks-ass.net>

Hello, Peter.

On Tue, May 05, 2015 at 05:19:49PM +0200, Peter Zijlstra wrote:
> > I don't think we can kludge this. For all other resources, we're
> > defining the limits that can't be crossed so nesting them w/ -1 by
> > default is fine. RR slices are different in that we're really slicing
> > up and guaranteeing a portion of something finite, so unlimited by
> > default thing doesn't really work here.
>
> Note that you _could_ do the same thing with IO bandwidth; esp. with
> these modern no-seek-penalty devices this could make sense.

Yeah, maybe.
It currently is too unpredictable to do that (at least from the OS
side w/ all the layering) but that is a possibility.

> > The problem is that this is tied to the normal cpu controller. Users
> > who don't have any intention of mucking with RT scheduling end up
> > being dragged into it. Given the strict nature of RR slicing, I
> > don't even think it's actually useful to make the slicing
> > hierarchical. From cgroup's POV, it'd be best if RR slicing can be
> > detached.
>
> Like in the other mail; hierarchy still makes perfect sense for the
> container case.

We'd still need an on-demand arbitration mechanism across containers
no matter what we do, which might as well take care of everything. But
please see below.

> > > The whole RR/FIFO thing is so enormously broken (by definition; this
> > > truly is unfixable) that you simply _cannot_ automate it.
> >
> > Yeah, exactly.
>
> I don't think you're quite agreeing to the same reasons I am. My main
> objection to the whole SCHED_RR/FIFO thing as defined by POSIX is that
> it does not in fact allow the OS to do what an OS _should_ do, namely
> resource arbitration and control.
>
> The whole rt-cgroup controller tries to somewhat contain that, but
> fundamentally once you use RR/FIFO you've given up your system to
> userspace control -- which btw is why it's usually limited to root.
>
> SCHED_DEADLINE avoids all these problems, at the cost of a more complex
> setup.
>
> But the fact that both need fixed portions of a limited total does not
> in fact mean they're broken.

But that does make them pretty different from others. What bothers me
the most about RR slices right now is that they're tightly coupled
with the rest of the cpu controller while having a very different set
of characteristics. Maybe this is something mandated by the underlying
structure and we have to live with it, but it definitely isn't an
ideal situation.
What I don't want to happen is controllers failing migrations
willy-nilly for random reasons leaving users baffled, which we've
actually been doing unfortunately. Maybe we need to deal with this
fixed resource arbitration as a separate class and allow them to fail
migration w/ -EBUSY.

Thanks.

--
tejun