Subject: Re: [PATCH v3 0/3] Introduce per-task latency_nice for scheduler hints
From: Parth Shah
To: chris hyser, vincent.guittot@linaro.org, patrick.bellasi@matbug.net,
    valentin.schneider@arm.com, dhaval.giani@oracle.com,
    dietmar.eggemann@arm.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
    qais.yousef@arm.com, pavel@ucw.cz, qperret@qperret.net,
    David.Laight@ACULAB.COM, pjt@google.com, tj@kernel.org
References: <20200116120230.16759-1-parth@linux.ibm.com>
    <8ed0f40c-eeb4-c487-5420-a8eb185b5cdd@linux.ibm.com>
    <971909ed-d4e0-6afa-d20b-365ede5a195e@linux.ibm.com>
    <8e984496-e89b-d96c-d84e-2be7f0958ea4@oracle.com>
    <1e216d18-7ec0-4a0d-e124-b730d6e03e6f@oracle.com>
Date: Thu, 20 Feb 2020 14:20:05 +0530
Message-Id: <7429e0ae-41ff-e9c4-dd65-3ef1919f5f50@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2/20/20 2:04 PM, Parth Shah wrote:
>
>
> On 2/19/20 11:53 PM, chris hyser wrote:
>>
>>
>> On 2/19/20 9:15 AM, chris hyser wrote:
>>>
>>>
>>> On 2/19/20 5:09 AM, Parth Shah wrote:
>>>> Hi Chris,
>>>>
>>>> On 2/19/20 4:30 AM, chris hyser wrote:
>>>>> On 2/17/20 3:57 AM, Parth Shah wrote:
>>>>>>
>>>>>>
>>>>>> On 1/16/20 5:32 PM, Parth Shah wrote:
>>>>>>> This is the 3rd revision of the patch set to introduce
>>>>>>> latency_{nice/tolerance} as a per task attribute.
>>>>>>>
>>>>>>> The previous version can be found at:
>>>>>>> v1: https://lkml.org/lkml/2019/11/25/151
>>>>>>> v2: https://lkml.org/lkml/2019/12/8/10
>>>>>>>
>>>>>>> Changes in this revision are:
>>>>>>> v2 -> v3:
>>>>>>> - This series changes the longer attribute name to "latency_nice" as per
>>>>>>>     the comment from Dietmar Eggemann
>>>>>>>     https://lkml.org/lkml/2019/12/5/394
>>>>>>> v1 -> v2:
>>>>>>> - Addressed comments from Qais Yousef
>>>>>>> - As per suggestion from Dietmar, moved content from newly created
>>>>>>>     include/linux/sched/latency_tolerance.h to kernel/sched/sched.h
>>>>>>> - Extend sched_setattr() to support latency_tolerance in tools
>>>>>>>     headers UAPI
>>>>>>>
>>>>>>>
>>>>>>> Introduction:
>>>>>>> ==============
>>>>>>> This patch series introduces a new per-task attribute latency_nice to
>>>>>>> provide the scheduler hints about the latency requirements of the
>>>>>>> task [1].
>>>>>>>
>>>>>>> Latency_nice is a ranged attribute of a task with the value ranging
>>>>>>> from [-20, 19] both inclusive which makes it align with the task nice
>>>>>>> value.
>>>>>>>
>>>>>>> The value should provide scheduler hints about the relative latency
>>>>>>> requirements of tasks, meaning the task with "latency_nice = -20"
>>>>>>> should have lower latency requirements than compared to those tasks with
>>>>>>> higher values. Similarly a task with "latency_nice = 19" can have higher
>>>>>>> latency and hence such tasks may not care much about latency.
>>>>>>>
>>>>>>> The default value is set to 0. The usecases discussed below can use this
>>>>>>> range of [-20, 19] for latency_nice for the specific purpose. This
>>>>>>> patch does not implement any use cases for such attribute so that any
>>>>>>> change in naming or range does not affect much to the other (future)
>>>>>>> patches using this. The actual use of latency_nice during task wakeup
>>>>>>> and load-balancing is yet to be coded for each of those usecases.
>>>>>>>
>>>>>>> As per my view, this defined attribute can be used in following ways for a
>>>>>>> some of the usecases:
>>>>>>> 1 Reduce search scan time for select_idle_cpu():
>>>>>>> - Reduce search scans for finding idle CPU for a waking task with lower
>>>>>>>     latency_nice values.
>>>>>>>
>>>>>>> 2 TurboSched:
>>>>>>> - Classify the tasks with higher latency_nice values as a small
>>>>>>>     background task given that its historic utilization is very low, for
>>>>>>>     which the scheduler can search for more number of cores to do task
>>>>>>>     packing.  A task with a latency_nice >= some_threshold (e.g, == 19)
>>>>>>>     and util <= 12.5% can be background tasks.
>>>>>>>
>>>>>>> 3 Optimize AVX512 based workload:
>>>>>>> - Bias scheduler to not put a task having (latency_nice == -20) on a
>>>>>>>     core occupying AVX512 based workload.
>>>>>>>
>>>>>>>
>>>>>>> Series Organization:
>>>>>>> ====================
>>>>>>> - Patch 1: Add new attribute latency_nice to task_struct.
>>>>>>> - Patch 2: Clone parent task's attribute to the child task on fork
>>>>>>> - Patch 3: Add support for sched_{set,get}attr syscall to modify
>>>>>>>              latency_nice of the task
>>>>>>>
>>>>>>>
>>>>>>> The patch series can be applied on tip/sched/core at the
>>>>>>> commit 804d402fb6f6 ("sched/rt: Make RT capacity-aware")
>>>>>>>
>>>>>>>
>>>>>>> References:
>>>>>>> ============
>>>>>>> [1]. Usecases for the per-task latency-nice attribute,
>>>>>>>        https://lkml.org/lkml/2019/9/30/215
>>>>>>> [2]. Task Latency-nice, "Subhra Mazumdar",
>>>>>>>        https://lkml.org/lkml/2019/8/30/829
>>>>>>> [3]. Introduce per-task latency_tolerance for scheduler hints,
>>>>>>>        https://lkml.org/lkml/2019/12/8/10
>>>>>>>
>>>>>>>
>>>>>>> Parth Shah (3):
>>>>>>>     sched: Introduce latency-nice as a per-task attribute
>>>>>>>     sched/core: Propagate parent task's latency requirements to the child
>>>>>>>       task
>>>>>>>     sched: Allow sched_{get,set}attr to change latency_nice of the task
>>>>>>>
>>>>>>>    include/linux/sched.h            |  1 +
>>>>>>>    include/uapi/linux/sched.h       |  4 +++-
>>>>>>>    include/uapi/linux/sched/types.h | 19 +++++++++++++++++++
>>>>>>>    kernel/sched/core.c              | 21 +++++++++++++++++++++
>>>>>>>    kernel/sched/sched.h             | 18 ++++++++++++++++++
>>>>>>>    tools/include/uapi/linux/sched.h |  4 +++-
>>>>>>>    6 files changed, 65 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>
>>>>>> Its been a long time and few revisions since the beginning of the
>>>>>> discussion around the latency-nice. Hence thought of asking if there is/are
>>>>>> any further work that needs to be done for adding latency-nice attribute or
>>>>>> am I missing any piece in here?
>>>>>
>>>>> All, I was asked to take a look at the original latency_nice patchset.
>>>>> First, to clarify objectives, Oracle is not interested in trading
>>>>> throughput for latency. What we found is that the DB has specific tasks
>>>>> which do very little but need to do this as absolutely quickly as possible,
>>>>> ie extreme latency sensitivity. Second, the key to latency reduction in the
>>>>> task wakeup path seems to be limiting variations of "idle cpu" search. The
>>>>> latter particularly interests me as an example of "platform size based
>>>>> latency" which I believe to be important given all the varying size VMs and
>>>>> containers.
>>>>>
>>>>> Parth, I've been using your v3 patchset as the basis of an investigation
>>>>> into the measurable effects of short-circuiting this search.
>>>>> I'm not quite ready to put anything out, but the patchset is working well.
>>>>> The only
>>>>
>>>> That's a good news as you are able to get a usecase of this patch-set.
>>>>
>>>>> feedback I have is that currently non-root can set the value negative which
>>>>> is inconsistent with 'nice' and I would think a security hole.
>>>>>
>>>>
>>>> I would assume you mean 'latency_nice' here.
>>>>
>>>> From my testing, I was not able to set values for any root owned task's
>>>> latency_nice value by the non-root user. Also, my patch-set just piggybacks
>>>> on the already existing sched_setattr syscall and hence it should not allow
>>>> non-root user to do any modifications. Can you confirm this by changing
>>>> nice (renice) value of a root task from non-root user.
>>>>
>>>> I have done the sanity check in the code and thinking where it could
>>>> possibly have gone wrong. So, can you please specify what values were you
>>>> able to set outside the [-20, 19] range?
>>>
>>> The checks prevent being outside that range. But negative numbers -20 to
>>> -1 did not need root. Let me dig some more. I verified this explicitly
>>> before sending the email so something is up.
>>
>> I went digging. This is absolutely repeatable. I checked that I do not
>> unknowingly have CAP_SYS_NICE as a user. So first, are we tying
>> latency_nice to CAP_SYS_NICE? Seems like a reasonable thing, but not sure I
>> saw this stated anywhere. Second, the only capability checked in
>> __sched_setscheduler() in the patch I have is CAP_SYS_NICE and those checks
>> will not return a -EPERM for a negative latency_tolerance (in the code, aka
>> latency_nice). Do I have the correct version of the code? Am I missing
>> something?
>
> You are right. I have not added permission checks for setting the
> latency_nice value. For the task_nice, non-root user has no permission to
> set the value lower than the current value which is not the case with the
> latency_nice.
>
> In order to align with the permission checks like task_nice, I will add the
> check similar to task_nice and send out the v4 of the series soon.
>
>
> Thanks for pointing out.
> - Parth
>

The diff below works well enough to align the permission checks with NICE.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2bfcff5623f9..ef4a397c9170 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4878,6 +4878,10 @@ static int __sched_setscheduler(struct task_struct *p,
 			return -EINVAL;
 		if (attr->sched_latency_nice < MIN_LATENCY_NICE)
 			return -EINVAL;
+		/* Use the same security checks as NICE */
+		if (attr->sched_latency_nice < p->latency_nice &&
+		    !can_nice(p, attr->sched_latency_nice))
+			return -EPERM;
 	}
 
 	if (pi)

With the above in effect, a non-root user can only increase the value, up to
+19, and once increased it cannot be decreased. For example, once a user sets
latency_nice = 19, the same user cannot then set latency_nice = 18. This is
the same behaviour as with NICE.

Are such permission checks required?

Unlike NICE, we are going to use latency_nice for scheduler hints only, so
wouldn't it make more sense to allow users to both increase and decrease the
values of the tasks they own?

- Parth
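
To make the effect of the proposed check easier to try out, here is a minimal
userspace sketch of it. It is not kernel code: struct model_task,
model_can_nice() and model_set_latency_nice() are hypothetical names invented
for this illustration, and MAX_LATENCY_NICE is assumed to be 19 to match the
[-20, 19] range. model_can_nice() imitates what can_nice() does in the kernel
as far as I understand it (compare the requested value against RLIMIT_NICE, or
allow it when the caller has CAP_SYS_NICE).

```c
#include <errno.h>
#include <stdbool.h>

#define MIN_LATENCY_NICE (-20)
#define MAX_LATENCY_NICE 19	/* assumed upper bound, from the [-20, 19] range */
#define MODEL_MAX_NICE   19	/* stand-in for the kernel's MAX_NICE */

/* Hypothetical stand-in for the few task fields this check looks at. */
struct model_task {
	int  latency_nice;	/* current latency_nice of the task */
	long rlimit_nice;	/* RLIMIT_NICE soft limit, commonly 0 */
	bool cap_sys_nice;	/* whether the caller holds CAP_SYS_NICE */
};

/* Same mapping as the kernel's nice_to_rlimit(): 19 -> 1, -20 -> 40. */
static long model_nice_to_rlimit(int nice)
{
	return MODEL_MAX_NICE - nice + 1;
}

/* Imitates can_nice(): allowed if within RLIMIT_NICE or privileged. */
static bool model_can_nice(const struct model_task *p, int nice)
{
	return model_nice_to_rlimit(nice) <= p->rlimit_nice || p->cap_sys_nice;
}

/* Range check followed by the permission check proposed in the diff. */
static int model_set_latency_nice(struct model_task *p, int val)
{
	if (val > MAX_LATENCY_NICE || val < MIN_LATENCY_NICE)
		return -EINVAL;
	/* Only lowering the value needs permission; raising is always OK. */
	if (val < p->latency_nice && !model_can_nice(p, val))
		return -EPERM;
	p->latency_nice = val;
	return 0;
}
```

With a task that starts at latency_nice = 0, has rlimit_nice = 0 and no
CAP_SYS_NICE, model_set_latency_nice(&t, 19) succeeds, and a following
model_set_latency_nice(&t, 18) returns -EPERM, which is the one-way behaviour
described above.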