Date: Mon, 14 Jun 2021 16:03:27 +0100
From: Qais Yousef
To: Quentin Perret
Cc: mingo@redhat.com, peterz@infradead.org, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rickyiu@google.com, wvw@google.com,
 patrick.bellasi@matbug.net, xuewen.yan94@gmail.com,
 linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v2 3/3] sched: Make uclamp changes depend on CAP_SYS_NICE
Message-ID: <20210614150327.3humrvztv3fxurvk@e107158-lin.cambridge.arm.com>
References: <20210610151306.1789549-1-qperret@google.com>
 <20210610151306.1789549-4-qperret@google.com>
 <20210611124820.ksydlg4ncw2xowd3@e107158-lin.cambridge.arm.com>
 <20210611132653.o5iljqtmr2hcvtsl@e107158-lin.cambridge.arm.com>
 <20210611141737.spzlmuh7ml266c5a@e107158-lin.cambridge.arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/11/21 14:43, Quentin Perret wrote:
> On Friday 11 Jun 2021 at 15:17:37 (+0100), Qais Yousef wrote:
> > On 06/11/21 13:49, Quentin Perret wrote:
> > > Thinking about it a bit more, a more involved option would be to
> > > have this patch as is, but to also introduce a new RLIMIT_UCLAMP on
> > > top of it. The semantics could be:
> > >
> > >   - if the clamp requested by the non-privileged task is lower than
> > >     its existing clamp, then allow;
> > >   - otherwise, if the requested clamp is less than UCLAMP_RLIMIT,
> > >     then allow;
> > >   - otherwise, deny.
> > >
> > > The same principle would apply to both uclamp.min and uclamp.max,
> > > and UCLAMP_RLIMIT would default to 0.
> > >
> > > Thoughts?
> >
> > That could work. But then I'd prefer your patch to go in as-is.
> > I don't think uclamp can do with this extra complexity in how it is
> > used.
>
> Sorry, I'm not sure what you mean here?

Hmm. I first understood this as a new flag to the sched_setattr()
syscall, but now I get it. You want to use the getrlimit()/setrlimit()/
prlimit() API to impose a restriction. My comment was about this being
a syscall extension, which it isn't, so please ignore it.

> > We basically want to specify whether we want to be paranoid about the
> > uclamp CAP or not. In my view that is simple, and I can't see why it
> > would be a big deal to have a procfs entry to define the level of
> > paranoia the system wants to impose. If it is a big deal though
> > (I would love to hear the arguments);
>
> Not saying it's a big deal, but I think there are a few arguments in
> favor of using rlimit instead of a sysfs knob.
> It allows for a much finer grain configuration -- constraints can be
> set per-task as well as system wide if needed, and it is the standard
> way of limiting resources that tasks can ask for.

Is it system wide or per user?

> > requiring apps that want to self regulate to have CAP_SYS_NICE is
> > a better approach.
>
> Rlimit wouldn't require that though, which is also nice, as
> CAP_SYS_NICE grants you a lot more power than just clamps ...

Now I better understand your suggestion. It seems a viable option,
I agree, but I still need to digest it. The devil is in the details :)

Shouldn't the default be RLIM_INFINITY, ie: no limit? And we will need
to add two limits, RLIMIT_UCLAMP_MIN/MAX, right?

We have the following hierarchy now:

	1. System wide (/proc/sys/kernel/sched_util_clamp_min/max)
	2. Cgroup
	3. Per-task

In that order of priority, where 1 limits/overrides 2 and 3, and 2
limits/overrides 3. Where do you see the RLIMIT fitting in this
hierarchy? It should sit between 2 and 3, right? Cgroup settings
should still win even if the user/process has been limited?

If the framework decides a user can't request any boost at all (can't
increase its uclamp_min above 0), then IIUC setting the hard limit of
RLIMIT_UCLAMP_MIN to 0 would achieve that, right?

Since the framework and the task itself would go through the same
sched_setattr() call, how would the framework circumvent this limit?
IIUC it has to raise RLIMIT_UCLAMP_MIN first, then perform
sched_setattr() to request the boost value, right? Would this overhead
be acceptable? It looks considerable to me.

Also, will prlimit() allow you to go outside what was set for the user
via setrlimit()? Reading the man pages it seems to override, so that
should be fine.

For 1 (system wide) limits, sched_setattr() requests are accepted, but
the effective uclamp is *capped by* the system wide limit. Were you
thinking RLIMIT_UCLAMP* would behave similarly?
If they do, we have consistent behavior with how the current system
wide limits work; but this would break your use case, because a task
could still change its requested uclamp value, albeit with the
effective value being limited:

	RLIMIT_UCLAMP_MIN=512
	p->uclamp[UCLAMP_MIN] = 800	// this request is allowed, but
					// effective UCLAMP_MIN = 512

If not, then consider:

	RLIMIT_UCLAMP_MIN=no limit
	p->uclamp[UCLAMP_MIN] = 800	// task changed its uclamp_min to 800
	RLIMIT_UCLAMP_MIN=512		// limit was lowered for task/user

What will happen to p->uclamp[UCLAMP_MIN] in this case? Will it be
lowered to match the new limit? That would be inconsistent with the
current system wide limits we already have.

Sorry, too many questions; I was mainly thinking out loud. I need to
spend more time digging into the details of how RLIMITs are imposed to
understand how this could be a good fit. I already see some friction
points that need more thinking.

Thanks

--
Qais Yousef