Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp687715ybe; Wed, 4 Sep 2019 06:20:51 -0700 (PDT) X-Google-Smtp-Source: APXvYqzv/+Mn67dblpY5nE3DwIa0sMYb+lYD32iaMtLxBtAxswimQjJxVsiGftzB5IuBI0pm5a2V X-Received: by 2002:aa7:93a8:: with SMTP id x8mr13488451pff.151.1567603250993; Wed, 04 Sep 2019 06:20:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1567603250; cv=none; d=google.com; s=arc-20160816; b=fHSr15VGr8W8OJ/poZllW/9KUiSg7AOaMZX65ETDBHFk2/rwhtMHRh3NUiMreJYqay ke6t8JlYSqqttpcWz7StLNdKOQt/V3+cWD3nWVh0XhuZFkehHwuZbBnrGrVt+eOIgnT6 AzdaTtPb88ihOUPOT/4ppl+ewh/454m338pqlw2Qw2bPyWG3g2daZRwiLN1JFarFY1OA HS5D5lSVrfqmnEix4WmT5CvDb9lxxgDydEshrBWJ1UOv7BT2tvNuqnP/lNdv2lPrmH5r Vi4wqrQCq+ZR1f+kGLAFTf1B74lKE5Amact5pz3xnWPytEPAN0CUNF5uLfTXV4yWJ39E 8ToA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=SJch+u8X1J2Bx5oP7wQ8HDDuioWa3a0Kve4BdZ80EsU=; b=r3o00W/VPEiUqkyhxcU9CZtccpyUXWEJ6HLR3GgUVx20Sp6/0QZs2YayxI+zsI4I7i yJmf42gLoxL7kGMOUoVxPAkWR1e976L9LE3MBNV4vUnNJbfOcJxiNpc2uqMGm4plhVxD Jv0mX7y9J/ARSnBHRrDP7CkqPPs187G7t2E5uh+/PBAoWY11G/8Xikp2HZsc+1lJngBh Yg+afVq8u87b01m0jLLwmfxt1J0ayvZ15WVqgToZldwypV52SU18LHO2Etl1umduE9Qy HXYRXJg+EijA0zMwD8WnLGQAWVPmHqtj/+D+MDQRhOfrVkCf3UOdh2j9sj8Xu9vCBfa/ y07w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y9si19994536pfm.25.2019.09.04.06.20.35; Wed, 04 Sep 2019 06:20:50 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730273AbfIDNTp (ORCPT + 99 others); Wed, 4 Sep 2019 09:19:45 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:53992 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727900AbfIDNTo (ORCPT ); Wed, 4 Sep 2019 09:19:44 -0400 Received: from 1.general.cascardo.us.vpn ([10.172.70.58] helo=calabresa) by youngberry.canonical.com with esmtpsa (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1i5VCF-0006HK-Oc; Wed, 04 Sep 2019 13:19:40 +0000 Date: Wed, 4 Sep 2019 10:19:34 -0300 From: Thadeu Lima de Souza Cascardo To: Ingo Molnar Cc: Dietmar Eggemann , linux-kernel@vger.kernel.org, Peter Zijlstra , Ingo Molnar , Patrick Bellasi , Thomas Gleixner , Arnaldo Carvalho de Melo , Linus Torvalds , Jiri Olsa Subject: Re: [PATCH v3] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code Message-ID: <20190904131932.GG4101@calabresa> References: <20190903171645.28090-1-cascardo@canonical.com> <20190903171645.28090-2-cascardo@canonical.com> <20190904075532.GA26751@gmail.com> <20190904084934.GA117671@gmail.com> <20190904085519.GA127156@gmail.com> <20190904103925.GA60962@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190904103925.GA60962@gmail.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 04, 2019 at 12:39:25PM +0200, Ingo Molnar wrote: > > * Dietmar Eggemann wrote: > > > > -v3 attached. Build and minimally boot tested. > > > > > > Thanks, > > > > > > Ingo > > > > > > > This patch fixes the issue (almost). > > > > LTP's sched_getattr01 passes again. But IMHO the prio 'chrt -p $$' > > should be 0 instead of -65536. > > Yeah, I forgot to zero-initialize the structure ... > > Does the attached -v4 work any better for you? > > Thanks, > > Ingo Hi, Ingo. Thanks for the patch. It works just fine for me, I have only one comment about it, below. Acked-by: Thadeu Lima de Souza Cascardo Tested-by: Thadeu Lima de Souza Cascardo I also tested LTP, chrt, and that 5.2 (before the breaking commit) and 5.3 with your patch behave the same when the user size is larger than what the kernel knows, the return is 0 and the struct size is set to the known value. There is one odd behaviour now, which is whenever size is set between VER0 and VER1, we will have a partial struct filled up, instead of getting E2BIG or EINVAL. > > ===============================> > Date: Wed, 4 Sep 2019 09:55:32 +0200 > Subject: [PATCH] sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code > > Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels: > > $ chrt -p $$ > chrt: failed to get pid 26306's policy: Argument list too long > > and he has root-caused the bug to the following commit increasing sched_attr > size and breaking sched_read_attr() into returning -EFBIG: > > a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping") > > The other, bigger bug is that the whole sched_getattr() and sched_read_attr() > logic of checking non-zero bits in new ABI components is arguably broken, > and pretty much any extension of the ABI will spuriously break the ABI. > That's way too fragile. > > Instead implement the perf syscall's extensible ABI instead, which we > already implement on the sched_setattr() side: > > - if user-attributes have the same size as kernel attributes then the > logic is unchanged. > > - if user-attributes are larger than the kernel knows about then simply > skip the extra bits, but set attr->size to the (smaller) kernel size > so that tooling can (in principle) handle older kernel as well. > > - if user-attributes are smaller than the kernel knows about then just > copy whatever user-space can accept. > > Also clean up the whole logic: > > - Simplify the code flow - there's no need for 'ret' for example. > > - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we > always know which side we are dealing with. > > - Why is it called 'read' when what it does is to copy to user? This > code is so far away from VFS read() semantics that the naming is > actively confusing. Name it sched_attr_copy_to_user() instead, which > mirrors other copy_to_user() functionality. > > - Move the attr->size assignment from the head of sched_getattr() to the > sched_attr_copy_to_user() function. Nothing else within the kernel > should care about the size of the structure. > > With these fixes the sched_getattr() syscall now nicely supports an > extensible ABI in both a forward and backward compatible fashion, and > will also fix the chrt bug. > > As an added bonus the bogus -EFBIG return is removed as well, which as > Thadeu noted should have been -E2BIG to begin with. > > Reported-by: Thadeu Lima de Souza Cascardo > Cc: Arnaldo Carvalho de Melo > Cc: Jiri Olsa > Cc: Linus Torvalds > Cc: Patrick Bellasi > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping") > Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com > Signed-off-by: Ingo Molnar > --- > kernel/sched/core.c | 82 +++++++++++++++++++++++++++-------------------------- > 1 file changed, 42 insertions(+), 40 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 010d578118d6..3c64eb28dd70 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -5105,39 +5105,44 @@ SYSCALL_DEFINE2(sched_getparam, pid_t, pid, struct sched_param __user *, param) > return retval; > } > > -static int sched_read_attr(struct sched_attr __user *uattr, > - struct sched_attr *attr, > - unsigned int usize) > +/* > + * Copy the kernel size attribute structure (which might be larger > + * than what user-space knows about) to user-space. > + * > + * Note that all cases are valid: user-space buffer can be larger or > + * smaller than the kernel-space buffer. The usual case is that both > + * have the same size. > + */ > +static int > +sched_attr_copy_to_user(struct sched_attr __user *uattr, > + struct sched_attr *kattr, > + unsigned int usize) > { > - int ret; > + unsigned int ksize = sizeof(*kattr); > > - if (!access_ok(uattr, usize)) > + if (!access_ok(uattr, ksize)) > return -EFAULT; > I believe this should be either usize or kattr->size and moved below (or just reuse min(usize,ksize)). Otherwise, an old binary that uses the old ABI (that is, VER0 as size) may get EFAULT here. This should almost never in practice. I tried, but I could hardly use an address that would fail access_ok. But, theoretically, that still breaks ABI. Regards. Cascardo. > /* > - * If we're handed a smaller struct than we know of, > - * ensure all the unknown bits are 0 - i.e. old > - * user-space does not get uncomplete information. > + * sched_getattr() ABI forwards and backwards compatibility: > + * > + * If usize == ksize then we just copy everything to user-space and all is good. > + * > + * If usize < ksize then we only copy as much as user-space has space for, > + * this keeps ABI compatibility as well. We skip the rest. > + * > + * If usize > ksize then user-space is using a newer version of the ABI, > + * which part the kernel doesn't know about. Just ignore it - tooling can > + * detect the kernel's knowledge of attributes from the attr->size value > + * which is set to ksize in this case. > */ > - if (usize < sizeof(*attr)) { > - unsigned char *addr; > - unsigned char *end; > - > - addr = (void *)attr + usize; > - end = (void *)attr + sizeof(*attr); > - > - for (; addr < end; addr++) { > - if (*addr) > - return -EFBIG; > - } > + kattr->size = min(usize, ksize); > > - attr->size = usize; > - } > - > - ret = copy_to_user(uattr, attr, attr->size); > - if (ret) > + if (copy_to_user(uattr, kattr, kattr->size)) > return -EFAULT; > > + printk("sched_attr_copy_to_user(): copied %d bytes. (ksize: %d, usize: %d)\n", kattr->size, ksize, usize); > + > return 0; > } > > @@ -5145,20 +5150,18 @@ static int sched_read_attr(struct sched_attr __user *uattr, > * sys_sched_getattr - similar to sched_getparam, but with sched_attr > * @pid: the pid in question. > * @uattr: structure containing the extended parameters. > - * @size: sizeof(attr) for fwd/bwd comp. > + * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility. > * @flags: for future extension. > */ > SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr, > - unsigned int, size, unsigned int, flags) > + unsigned int, usize, unsigned int, flags) > { > - struct sched_attr attr = { > - .size = sizeof(struct sched_attr), > - }; > + struct sched_attr kattr = { }; > struct task_struct *p; > int retval; > > - if (!uattr || pid < 0 || size > PAGE_SIZE || > - size < SCHED_ATTR_SIZE_VER0 || flags) > + if (!uattr || pid < 0 || usize > PAGE_SIZE || > + usize < SCHED_ATTR_SIZE_VER0 || flags) > return -EINVAL; > > rcu_read_lock(); > @@ -5171,25 +5174,24 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr, > if (retval) > goto out_unlock; > > - attr.sched_policy = p->policy; > + kattr.sched_policy = p->policy; > if (p->sched_reset_on_fork) > - attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK; > + kattr.sched_flags |= SCHED_FLAG_RESET_ON_FORK; > if (task_has_dl_policy(p)) > - __getparam_dl(p, &attr); > + __getparam_dl(p, &kattr); > else if (task_has_rt_policy(p)) > - attr.sched_priority = p->rt_priority; > + kattr.sched_priority = p->rt_priority; > else > - attr.sched_nice = task_nice(p); > + kattr.sched_nice = task_nice(p); > > #ifdef CONFIG_UCLAMP_TASK > - attr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value; > - attr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value; > + kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value; > + kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value; > #endif > > rcu_read_unlock(); > > - retval = sched_read_attr(uattr, &attr, size); > - return retval; > + return sched_attr_copy_to_user(uattr, &kattr, usize); > > out_unlock: > rcu_read_unlock();