Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp382136ybe; Wed, 4 Sep 2019 00:57:16 -0700 (PDT) X-Google-Smtp-Source: APXvYqwz7WzDH5n9zOEdogR7RXZwClW5k+NUCwD+YAwdYG45ymPDXTZHM4ILF0cLy/b1kN+fnME2 X-Received: by 2002:a17:90a:9409:: with SMTP id r9mr3653905pjo.10.1567583835919; Wed, 04 Sep 2019 00:57:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1567583835; cv=none; d=google.com; s=arc-20160816; b=kNs5b64keJE7e3VsCB9+kd0x01zJT8achFfy5XsbFaeS0ie5MXHSMuXztvVkhclqUR Vd0qvon3GTIXM7U5VR1jc62n4gbo6a8BrH1JHsekSwTNmlfCUqrcQQYQV5Efk9wj9B2/ U+clYvX0V+97Si16F6IlxtjQyZDDvL9MxNyc7HgGWAF2t7uMYMJQznIw3Qv73dVQXIoq ji9nVY78CwVWqoP+TO3hhc5jFB6+Z4Qkdmy2DchqizFnkoWR3Re6MgWlQ4tn/81oTAa0 17UFhV2L9pJRazOvOyIFzmrlmnoifOGuMKvDivVGyFwFwad4cIiKBBKKCnusqyvbAz7v o0bQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=bGC4vy9qwOG0kdz868DLU/bUuZ0P+dFsEcUWOjjDq5E=; b=c/17MrBGeWlx01pLWyRgmQvtW2we7hqvjg13uRuzfwDs6yc4QEKFKe6eSMJMOF5fsH IzZV9UC4KGRoIkP1n73PjfL04NZ2qKyNJMz9BHygDRqot8+8E0MyIhysvqpzXkdpuc/S d3McPFIhzOmSrlhhjrQ1rgHnYYe26IyjBp9fnY79eaRbvmunZz6SJaWuDbtk9/neWnPl edxxkhnxdo7asGw1EugcBBtNbwAmVY51Xx2mzuv1Zx02aX83/RMr4Asn/z3c2aHKc8+D iuB3xUt5+RqarLfGWj3sqKVu1xAmzpIkq5ty5C6w+WVEI7t64+D4PEPGI0HfSFV61jtt WynQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=aPD7E0fW; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 186si16691978pgc.248.2019.09.04.00.57.00; Wed, 04 Sep 2019 00:57:15 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=aPD7E0fW; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729399AbfIDHzi (ORCPT + 99 others); Wed, 4 Sep 2019 03:55:38 -0400 Received: from mail-wr1-f66.google.com ([209.85.221.66]:41400 "EHLO mail-wr1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728717AbfIDHzi (ORCPT ); Wed, 4 Sep 2019 03:55:38 -0400 Received: by mail-wr1-f66.google.com with SMTP id j16so20117809wrr.8 for ; Wed, 04 Sep 2019 00:55:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=bGC4vy9qwOG0kdz868DLU/bUuZ0P+dFsEcUWOjjDq5E=; b=aPD7E0fWFoD0Re5MUDQ9jsMFsd9e57fapMHTHSO28IvivLwfqKZgJiXS3kgMoi9G0n DeOE/d720ORx6f6gobO9KhoU+83SBQp68oKxUlcQDyYWDvi5dvmZvWQpPOp3NeNlnGgj QSGBzUYsaNdl0L2AfBjkmF0UzbUvKEzJQoR7je6fB6/7kj0pSodNiDSXQOQdtvYqaNmS 4V5POC+yOUfdexNatq1pvkyeM2N299PM8nPYsfDWr4RTLgbwJHO0g0fVhsmW66sZ+jD2 iyo6khaDxx3p8bzozlQXMmDzuwXno+eXSH+oL8yjYR1QL0KaoOSiCbZjxZdi9+h7wOw7 AEIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=bGC4vy9qwOG0kdz868DLU/bUuZ0P+dFsEcUWOjjDq5E=; b=iTpn47h10QN5UU1p11jDRLAS4wLGpV3jhia6xdOIdcefgvbeqc2ZNUUSKCUPlv6EdX PYXj0cdGFRy2PdEtLc9edorjXOT6o0YzcztY9+eOxhKtSRLXVwXuO53pLXWk+J6Ld9NW XcfXUZrWG3ld5FWFHI4MxqfL7ZV6jXn/9X8BFv15j/rddgJVw2e46tWQomAn2xE5eb+r tUxCnGtk+vQg/9gtOfzO0XuNKB+xwiFZ53Uzo9UJTmntuHr8W0htpJAiXbNkgwZTZYNv FywZmTtWYepSjnq1ynjespG3pgIccwnduiQnkPOFGq2nxfCAFdg41cvNvXUHH+aYaH0s JTvw== X-Gm-Message-State: APjAAAU7wqkhsQJQAxkbn7Sk77Yw4KoMXCWsqmC2dDNDn3y6vNSj7pAe KFBXSK2Xk69e4MEUD4XxI16iISgC X-Received: by 2002:adf:ba4a:: with SMTP id t10mr45864581wrg.325.1567583735317; Wed, 04 Sep 2019 00:55:35 -0700 (PDT) Received: from gmail.com (2E8B0CD5.catv.pool.telekom.hu. [46.139.12.213]) by smtp.gmail.com with ESMTPSA id z5sm19191386wrl.33.2019.09.04.00.55.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 Sep 2019 00:55:34 -0700 (PDT) Date: Wed, 4 Sep 2019 09:55:32 +0200 From: Ingo Molnar To: Thadeu Lima de Souza Cascardo Cc: linux-kernel@vger.kernel.org, Peter Zijlstra , Ingo Molnar , Patrick Bellasi , Thomas Gleixner , Arnaldo Carvalho de Melo , Linus Torvalds , Jiri Olsa Subject: [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code Message-ID: <20190904075532.GA26751@gmail.com> References: <20190903171645.28090-1-cascardo@canonical.com> <20190903171645.28090-2-cascardo@canonical.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190903171645.28090-2-cascardo@canonical.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Thadeu Lima de Souza Cascardo wrote: > After commit a509a7cd7974 (sched/uclamp: Extend sched_setattr() to support > utilization clamping), using sched_getattr with size 48 will return E2BIG. > > This breaks, for example, chrt. > $ chrt -p $$ > chrt: failed to get pid 26306's policy: Argument list too long > $ > > With this fix, when using the old size, the utilization clamping values will be > absent from sched_attr. When using the new size or some larger size, they will > be present. > > After the fix, chrt works again. > $ chrt -p $$ > pid 4649's current scheduling policy: SCHED_OTHER > pid 4649's current scheduling priority: 0 > $ > > The drawback with this solution is that userspace will ignore there are > non-default utilization clamps, but it's arguable whether returning E2BIG in > this case makes sense when that same userspace doesn't know about those values > anyway. a509a7cd7974 breaks the ABI when CONFIG_UCLAMP_TASK=y, so there's really no choice here but to fix it. But this is not the right fix: > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -5183,8 +5183,10 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr, > attr.sched_nice = task_nice(p); > > #ifdef CONFIG_UCLAMP_TASK > - attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value; > - attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value; > + if (size >= SCHED_ATTR_SIZE_VER1) { > + attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value; > + attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value; > + } > #endif The real bug is that the whole sched_read_attr() logic of checking non-zero bits in new ABI components is arguably broken, and pretty much any extension of the ABI will spuriously break the ABI as you just found out. That's way too fragile. Instead how about something like the patch below? It cleans up the ABI logic: - if user-attributes have the same size as kernel attributes then the logic is unchanged. - if user-attributes are larger than the kernel knows about then simply skip the extra bits, but set attr->size to the (smaller) kernel size so that tooling can (in principle) handle older kernel as well. - if user-attributes are smaller than the kernel knows about then just copy whatever user-space can accept. Also clean up the whole logic: - Simplify the code flow - there's no need for 'ret' for example. - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we always know which side we are dealing with. - Why is it called 'read' when what it does is to copy to user? This code is so far away from VFS read() semantics that the naming is actively confusing. Name it sched_attr_copy_to_user() instead, which mirrors other copy_to_user() functionality. - Move the attr->size assignment from the head of sched_getattr() to the sched_attr_copy_to_user() function. Nothing else within the kernel should care about the size of the structure. With this patch I believe the sched_getattr() syscall now nicely supports an extensible ABI in both a forward and backward compatible fashion, and will also fix the bug. As an added bonus the bogus -EFBIG return is removed as well. This needs to be fixed before v5.3 is released. NOTE: one additional open question is whether to set attr->size to the *larger* value if the kernel has a newer ABI. This would allow user-space to detect a new ABI. I've also Cc:-ed perf tooling ABI experts, which handles things in a similar fashion. (And if it doesn't or shouldn't then please chime in.) Thanks, Ingo ------ Subject: sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code From: Ingo Molnar Date: Sat Aug 24 17:41:57 CEST 2019 Reported-by: Thadeu Lima de Souza Cascardo Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar --- kernel/sched/core.c | 80 ++++++++++++++++++++++++++-------------------------- 1 file changed, 40 insertions(+), 40 deletions(-) Index: linux/kernel/sched/core.c =================================================================== --- linux.orig/kernel/sched/core.c +++ linux/kernel/sched/core.c @@ -5299,37 +5299,40 @@ out_unlock: return retval; } -static int sched_read_attr(struct sched_attr __user *uattr, - struct sched_attr *attr, - unsigned int usize) +/* + * Copy the kernel size attribute structure (which might be larger + * than what user-space knows about) to user-space. + * + * Note that all cases are valid: user-space buffer can be larger or + * smaller than the kernel-space buffer. The usual case is that both + * have the same size. + */ +static int +sched_attr_copy_to_user(struct sched_attr __user *uattr, + struct sched_attr *kattr, + unsigned int usize) { - int ret; + unsigned int ksize = sizeof(*kattr); - if (!access_ok(uattr, usize)) + if (!access_ok(uattr, max(usize, ksize))) return -EFAULT; /* - * If we're handed a smaller struct than we know of, - * ensure all the unknown bits are 0 - i.e. old - * user-space does not get uncomplete information. + * sched_getattr() ABI forwards and backwards compatibility: + * + * If usize == ksize then we just copy everything to user-space and all is good. + * + * If usize < ksize then we only copy as much as user-space has space for, + * this keeps ABI compatibility as well. We skip the rest. + * + * If usize > ksize then user-space is using a newer version of the ABI, + * which part the kernel doesn't know about. Just ignore it - tooling can + * detect the kernel's knowledge of attributes from the attr->size value + * which is set to ksize in this case. */ - if (usize < sizeof(*attr)) { - unsigned char *addr; - unsigned char *end; - - addr = (void *)attr + usize; - end = (void *)attr + sizeof(*attr); - - for (; addr < end; addr++) { - if (*addr) - return -EFBIG; - } - - attr->size = usize; - } + kattr->size = min(usize, ksize); - ret = copy_to_user(uattr, attr, attr->size); - if (ret) + if (copy_to_user(uattr, kattr, kattr->size)) return -EFAULT; return 0; @@ -5339,20 +5342,18 @@ static int sched_read_attr(struct sched_ * sys_sched_getattr - similar to sched_getparam, but with sched_attr * @pid: the pid in question. * @uattr: structure containing the extended parameters. - * @size: sizeof(attr) for fwd/bwd comp. + * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility. * @flags: for future extension. */ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr, - unsigned int, size, unsigned int, flags) + unsigned int, usize, unsigned int, flags) { - struct sched_attr attr = { - .size = sizeof(struct sched_attr), - }; + struct sched_attr kattr; struct task_struct *p; int retval; - if (!uattr || pid < 0 || size > PAGE_SIZE || - size < SCHED_ATTR_SIZE_VER0 || flags) + if (!uattr || pid < 0 || usize > PAGE_SIZE || + usize < SCHED_ATTR_SIZE_VER0 || flags) return -EINVAL; rcu_read_lock(); @@ -5365,25 +5366,24 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pi if (retval) goto out_unlock; - attr.sched_policy = p->policy; + kattr.sched_policy = p->policy; if (p->sched_reset_on_fork) - attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK; + kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK; if (task_has_dl_policy(p)) - __getparam_dl(p, &attr); + __getparam_dl(p, &kattr); else if (task_has_rt_policy(p)) - attr.sched_priority = p->rt_priority; + kattr.sched_priority = p->rt_priority; else - attr.sched_nice = task_nice(p); + kattr.sched_nice = task_nice(p); #ifdef CONFIG_UCLAMP_TASK - attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value; - attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value; + kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value; + kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value; #endif rcu_read_unlock(); - retval = sched_read_attr(uattr, &attr, size); - return retval; + return sched_attr_copy_to_user(uattr, &kattr, usize); out_unlock: rcu_read_unlock();