From: Andy Lutomirski
Date: Mon, 2 Jul 2018 10:11:13 -0700
Subject: Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE
To: Mathieu Desnoyers
Cc: Linus Torvalds, Andrew Lutomirski, Thomas Gleixner, LKML, Linux API,
 Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Dave Watson, Paul Turner,
 Andrew Morton, Russell King, Ingo Molnar, "H.
 Peter Anvin", Andi Kleen, Christoph Lameter, Ben Maurer, Steven Rostedt,
 Josh Triplett, Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes

On Mon, Jul 2, 2018 at 7:32 AM Mathieu Desnoyers wrote:
>
> ----- On Jun 29, 2018, at 4:39 PM, Andy Lutomirski luto@amacapital.net wrote:
>
> > On Fri, Jun 29, 2018 at 12:48 PM, Mathieu Desnoyers wrote:
> >> There are two aspects I'm concerned about here:
> >>
> >> 1) security: we don't want 32-bit user-space to feed a 64-bit value over 4GB
> >> as abort_ip that may end up causing OOPSes on architectures that would
> >> lack proper validation of those values on return to userspace.
> >
> > I'm not too worried about this.  As long as you're doing it from
> > signal-delivery context (which you are AFAICT) you're fine.
>
> No, it's not just signal-delivery context. It's _also_ called from
> return to usermode loop, which can be called on return from
> interrupt/trap/syscall.
>

TIF_NOTIFY_RESUME context in the exit slowpath is fine, too.

> >
> > But I re-read the code and I think I have a really straightforward
> > solution.  Two choices:
> >
> > (1) Change instruction_pointer_set() to return an error code if the
> > address passed in is garbage in a way that could cause unexpected
> > behavior (like >=2^32 on x86_64 if regs->cs is 32-bit).  It has
> > very few callers.
>
> This would take care of my security concern wrt abort_ip, but would not
> provide consistent behavior for the other fields. Also, perhaps this
> kind of change should aim for the next merge window?

It's not about security.  The idea is that instruction_pointer_set()
should return some indication of whether it actually set the
instruction pointer to the requested value.
On x86, if you have !user_64bit_mode(regs) and you call
instruction_pointer_set() to set ip to 0xbaadc0de12345678, then you end
up with a state where we will probably execute user code at the address
0x12345678.  Conversely, if you have user_64bit_mode(regs) == true and
you set ip to 0xbaadc0de12345678, then you will end up sending a signal
to the task because 0xbaadc0de12345678 is not executable (and, in fact,
is highly likely to be noncanonical).

So I would argue that the semantics *should* be:

/*
 * Attempts to modify @regs such that the next user instruction to be
 * executed is the instruction at @addr.  instruction_pointer_set() may
 * return false to indicate that addr was invalid in the sense that the
 * next user instruction executed might be at some other address instead.
 * The most likely cause is that regs refers to a 32-bit compat context,
 * addr != (u32)addr, and the architecture might silently truncate the
 * address on the next return to user code.
 *
 * instruction_pointer_set() must only be called from a context in which
 * the architecture allows arbitrary modifications of @regs.
 *
 * Architecture implementations promise that calling
 * instruction_pointer_set() will not crash or otherwise corrupt the
 * kernel when called from a valid context, regardless of what value is
 * passed in @addr.
 */
bool instruction_pointer_set(struct pt_regs *regs, unsigned long addr);

>
> >
> > (2) Add instruction_pointer_validate() to go along with
> > instruction_pointer_set().
> >
> > That should be enough to solve the problem, right?
>
> This would only handle the "security" part of the matter, which
> is specifically related to rseq->rseq_cs->abort_ip.
>
> What is left is ensuring that we have consistent behavior for
> other fields:
>
> [ Note: we have introduced this helper macro: LINUX_FIELD_u32_u64
> which defines a field which is 64-bit for 64-bit processes, and 32-bit
> with 32 bits of padding for 32-bit processes.
> ]
>
> * rseq->rseq_cs: (userspace pointer to user-space, updated by user-space
>   with single-copy atomicity): current type: LINUX_FIELD_u32_u64,
>   cannot be changed to __u64 due to single-copy atomicity requirement,
>
> * rseq->rseq_cs->start_ip: currently a LINUX_FIELD_u32_u64,
>   could become a __u64,
>
> * rseq->rseq_cs->post_commit_ip: currently a LINUX_FIELD_u32_u64,
>   could become a __u64,
>
> * rseq->rseq_cs->abort_ip: currently a LINUX_FIELD_u32_u64,
>   could become a __u64,
>
> For abort_ip, changing the type to __u64 and using the
> instruction_pointer_validate() approach you propose would work.
>
> For start_ip and post_commit_ip, we need to decide whether we
> want to kill a 32-bit process setting the high bits or if we just
> accept and use the full __u64 content on both 32-bit and 64-bit
> kernels. Those two fields are only used for arithmetic comparison.
> Using the full __u64 content means using 64-bit arithmetic on
> 32-bit native kernels though.

Just use the 64-bit values, I think.  I see no point in killing the task.

>
> For rseq->rseq_cs, we cannot use __u64 due to single-copy atomicity
> update requirement for 32-bit processes. However, we are using this
> field in a copy_from_user(), so it will EFAULT if the high-bits are
> set by a compat 32-bit task on a 64-bit kernel. We can therefore check
> that the padding is zeroed explicitly on a native 32-bit kernel to
> provide a consistent behavior. Specifically because rseq->rseq_cs is
> checked with access_ok(), it is therefore enough to check the padding
> when __LP64__ is not defined by the preprocessor.

Agreed.

>
> But rather than trying to play games with input validation, I would
> favor an approach that would allow rseq to validate all its inputs
> straightforwardly. Introducing user_64bit_mode(struct pt_regs *)
> across all architectures would allow doing just that.

I would be okay with that, too, but I think it would have to be
user_64bit_mode(task, regs), since sane architectures would have the
task bitness somewhere other than in regs.  x86 is IMO rather weird in
this regard.  When I added user_64bit_mode(), I didn't envision its use
outside x86 arch code.

> AFAIU this could be achieved by re-introducing is_compat_task() on x86 as:
>
> #ifdef CONFIG_COMPAT
> static bool is_compat_task(void)
> {
>         return !user_64bit_mode(current_pt_regs());
> }
> #else
> static bool is_compat_task(void) { return false; }
> #endif
>
> Or am I missing something?

is_compat_task() historically literally meant "am I in a compat system
call".  It never worked consistently on x86 outside of syscall
context.

While I do have fundamental objections to having a generic concept of
"is this a compat task?" on Linux, that's not why I removed
is_compat_task().  I removed it because it didn't do what the name
suggested.  Unfortunately, while it's gone from generic code, it's
still there on non-x86 arches, and it probably still has inconsistent
semantics.  So I don't want to re-add it.

But I think that the limited solution of changing
instruction_pointer_set() really is a sufficient
architecture-dependent change to fully solve your problem.
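
For illustration only, the proposed return-value semantics of instruction_pointer_set() can be modelled in plain userspace C. This is a minimal sketch, not kernel code: struct mock_pt_regs, its user_64bit flag, and mock_instruction_pointer_set() are hypothetical stand-ins for struct pt_regs, user_64bit_mode(regs), and the real helper. It also assumes that a 64-bit context accepts any address and leaves a bad one to fault on return to user code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct pt_regs; not the real layout. */
struct mock_pt_regs {
	uint64_t ip;
	bool user_64bit;	/* models user_64bit_mode(regs) on x86 */
};

/*
 * Userspace model of the proposed semantics: return false when the task
 * would not actually resume at @addr because a 32-bit context would
 * silently truncate it.  Never fails in any other way.
 */
static bool mock_instruction_pointer_set(struct mock_pt_regs *regs,
					 uint64_t addr)
{
	assert(regs != NULL);

	if (!regs->user_64bit && addr != (uint32_t)addr) {
		/* Store what the hardware would end up using, and say so. */
		regs->ip = (uint32_t)addr;
		return false;
	}

	/* 64-bit context: assumed to take any value as-is. */
	regs->ip = addr;
	return true;
}
```

With 0xbaadc0de12345678 as the address, a caller such as rseq would see false for a 32-bit context (ip ends up as 0x12345678) and true for a 64-bit one, and could decide to kill the task or ignore the abort in the first case.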