Message-ID: <55550BA5.4040905@ezchip.com>
Date: Thu, 14 May 2015 16:55:01 -0400
From: Chris Metcalf
To: Andy Lutomirski
CC: Ingo Molnar, Peter Zijlstra, Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Andrew Morton, Rik van Riel, Tejun Heo, Thomas Gleixner, Frederic Weisbecker, Christoph Lameter, "linux-doc@vger.kernel.org", Linux API, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 4/6] nohz: support PR_DATAPLANE_QUIESCE
References: <1431107927-13998-1-git-send-email-cmetcalf@ezchip.com> <1431107927-13998-5-git-send-email-cmetcalf@ezchip.com> <20150512093349.GH21418@twins.programming.kicks-ass.net> <20150512095030.GD11477@gmail.com> <20150512103805.GJ21418@twins.programming.kicks-ass.net> <20150512125200.GB17244@gmail.com> <20150513175150.GL6776@linux.vnet.ibm.com>
In-Reply-To: <20150513175150.GL6776@linux.vnet.ibm.com>

On 05/12/2015 08:52 AM, Ingo Molnar wrote:
> What I suggested is that it might make sense to offer a system call,
> for example a sched_setparam() variant, that makes such guarantees.
>
> Say if user-space does:
>
>     ret = sched_setscheduler(0, BIND_ISOLATED, &isolation_params);
>
> ... then we would get the task moved to an isolated domain and get a 0
> return code if the kernel is able to do all that and if the current
> uid/namespace/etc. has the required permissions and such.

Unfortunately I don't know nearly as much about the scheduler and
scheduler policies as I might, since I mostly focused on making the
scheduler stay out of the way. :-)

This does seem like another way to set a policy bit on a process.
I assume you could only validly issue this call on a nohz_full core,
and that you're not assuming it migrates the task to such a core?

You suggested that BIND_ISOLATED would not replace the usual scheduler
policies, but perhaps SCHED_ISOLATED as a full replacement would make
sense - it would make it an error to have any other schedulable task
on that core. I guess that brings it around to whether the
"cpu_isolated" task just loses when another task is scheduled on the
core with it (the current approach I'm proposing) or if it ends up
truly owning the core and other processes can be denied the right to
run there: which in that case clearly does get us into the area of
requiring privileges to set up, as Andy pointed out later.

This would leave the notion of "strict" as proposed elsewhere as a
separate thing, but presumably it could still be a prctl() as
originally proposed.

I admit I don't know enough to say whether this sounds like a better
approach than just using a prctl() to set the cpu_isolated state. My
instinct is that it's cleanest to avoid requiring permissions to do
this, and to simply enable the quiescing semantics the process
requested when it happens to be alone on a core. If so, it's somewhat
orthogonal to the actual scheduler policy in force, so best not to
conflate it with the notion of scheduler code at all via
sched_setscheduler()?

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
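
For illustration only, the two ownership models contrasted above (the isolated task "just loses" when another task shows up, versus the isolated task truly owning the core) can be sketched as a toy user-space model in C. Nothing here is kernel code: the names toy_core, ISOLATION_BEST_EFFORT, ISOLATION_EXCLUSIVE and the helper functions are all hypothetical, and the choice of -EBUSY and -EPERM as error codes is an assumption made for the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum isolation_mode {
	ISOLATION_BEST_EFFORT,	/* isolated task "just loses" if another task arrives */
	ISOLATION_EXCLUSIVE	/* isolated task owns the core; others are refused */
};

struct toy_core {
	enum isolation_mode mode;
	int nr_tasks;		/* runnable tasks currently on this core */
	bool has_isolated_task;
};

/* Quiescing semantics apply only while the isolated task is alone. */
static bool toy_core_is_quiesced(const struct toy_core *core)
{
	return core->has_isolated_task && core->nr_tasks == 1;
}

/* Place an isolated task on the core under the requested model. */
static int toy_core_bind_isolated(struct toy_core *core,
				  enum isolation_mode mode)
{
	if (core->has_isolated_task)
		return -EBUSY;
	/* Exclusive ownership is an error if anything else is runnable here. */
	if (mode == ISOLATION_EXCLUSIVE && core->nr_tasks > 0)
		return -EBUSY;
	core->mode = mode;
	core->has_isolated_task = true;
	core->nr_tasks++;
	return 0;
}

/* Try to schedule some other, ordinary task on the core. */
static int toy_core_add_task(struct toy_core *core)
{
	/* Exclusive model: other tasks are denied the right to run here. */
	if (core->has_isolated_task && core->mode == ISOLATION_EXCLUSIVE)
		return -EPERM;
	/* Best-effort model: the task is admitted and quiescing is lost. */
	core->nr_tasks++;
	return 0;
}
```

In the best-effort model the newcomer is admitted and the core simply stops being quiesced, so no privilege check is needed to request isolation. In the exclusive model the newcomer is refused, which is the case that raises the question of who is allowed to claim a core.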