From: Andy Lutomirski
Date: Fri, 29 Jun 2018 09:07:36 -0700
Subject: Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE
To: Linus Torvalds
Cc: Mathieu Desnoyers, Andrew Lutomirski, Thomas Gleixner, linux-kernel,
    linux-api, Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Dave Watson,
    Paul Turner, Andrew Morton, Russell King, Ingo Molnar, "H. Peter Anvin",
    Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
    Catalin Marinas, Will Deacon, Michael Kerrisk, Joel Fernandes
References: <20180628162359.9054-1-mathieu.desnoyers@efficios.com>
    <9200ED2A-AE4B-4094-81C9-E92240B4840F@amacapital.net>
    <1706339668.9644.1530281144560.JavaMail.zimbra@efficios.com>
    <729451355.9702.1530284622326.JavaMail.zimbra@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 29, 2018 at 8:27 AM, Linus Torvalds wrote:
>
>
> On Fri, Jun 29, 2018, 08:03 Mathieu Desnoyers wrote:
>>
>>
>> Considering those inconsistencies between architectures (either
>> the task gets killed, or the top bits are silently cleared), I'm
>> very much tempted to be restrictive in the inputs accepted by
>> rseq, and not rely on architectures as providing consistent
>> validation of the return IP.
>>
>> Thoughts ?
>
>
> Then you need to make it a compat system call, since clearly you and Andy
> want the 32-bit case to do something different from the 64-bit case.

I personally would like the compat and non-compat cases to do exactly
the same thing.  If abort_ip is the address (as a u64) of a valid
executable instruction and an abort happens, then that instruction
should get executed.  If abort_ip does not point to user-executable
memory, then the process should get a signal.

The problem isn't with rseq per se -- it's with the daft way that the
x86 return-to-userspace instructions work.  If I apply this patch:

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3b2490b81918..26e4ba44e87b 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -346,6 +346,8 @@ __visible void do_int80_syscall_32(struct pt_regs *regs)
 {
 	enter_from_user_mode();
 	local_irq_enable();
+	if (!user_64bit_mode(regs))
+		regs->ip |= (1UL << 32);
 	do_syscall_32_irqs_on(regs);
 }

the kernel *still works*.  But this unconditionally uses the IRET
path, and I don't even want to speculate as to what the hell happens
if we exit using SYSRETL, or LRET, or SYSEXITL, or if Intel or AMD
ever gives us a new mode that gets rid of the espfix shite.

IOW, I don't think that the x86 entry code should make a promise that
it will continue ignoring the high bits of regs->ip when
!user_64bit_mode(regs) on a 64-bit kernel.  The problem with rseq as
it stands is that this oddity gets accidentally exposed all the way to
userspace.
If we're not careful, it'll be possible for a slightly buggy
user program to goof up the code that generates the data structure
that supplies abort_ip such that the high bits are garbage (0xcccccccc
due to padding, for example), and the kernel will do exactly what the
user code requested, and we'll get regs->ip = 0xcccccccc00000000 |
(the actual intended abort_ip), and we'll pass that crap value all the
way to IRET, and IRET will truncate it back down to the correct value.
And then some CPU will add new behavior or we'll invoke SYSRETL or
whatever on some weird CPU, and the program will crash.  And we'll be
sad.

I suppose we could handle this in the entry code by coming up with a
way to reject out-of-bounds regs->ip for 32-bit tasks, but that's
going to be a bit messy and will slow down normal code that doesn't
use rseq.

Other than rseq, I don't think that there's any real issue.  The only
ways to get regs->ip >= 2^32 in a 32-bit task involve ptrace or manual
fiddling with signal contexts, and I don't expect to ever have any
real software depend on precisely what happens.
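
To make the failure mode above concrete, here is a minimal userspace
sketch of the arithmetic, not kernel code.  All names in it
(COMPAT_ADDR_LIMIT, ip_after_32bit_iret, abort_ip_in_range,
abort_ip_demo) are hypothetical and exist only for illustration; the
2^32 limit is an assumed stand-in for whatever TASK_SIZE is for a
given 32-bit task, and the sample address 0x08048123 is made up.  It
models "IRET truncates the value" as a plain mask of the low 32 bits
and shows why a range check on abort_ip would catch the garbage case
before it ever reaches the return path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit for a 32-bit task; the kernel's TASK_SIZE may differ. */
#define COMPAT_ADDR_LIMIT (UINT64_C(1) << 32)

/*
 * Model of what the IRET path effectively does for a 32-bit task
 * according to the mail above: only the low 32 bits of the saved
 * instruction pointer survive the return to user space.
 */
uint64_t ip_after_32bit_iret(uint64_t ip)
{
	return ip & UINT64_C(0xffffffff);
}

/* Hypothetical strict check: accept abort_ip only below the limit. */
bool abort_ip_in_range(uint64_t abort_ip, uint64_t limit)
{
	return abort_ip < limit;
}

/* Walks through the scenario from the mail using assertions. */
void abort_ip_demo(void)
{
	uint64_t intended = UINT64_C(0x08048123);	/* made-up address */
	uint64_t garbage = UINT64_C(0xcccccccc00000000) | intended;

	/* Today the garbage high bits are silently dropped by the mask... */
	assert(ip_after_32bit_iret(garbage) == intended);

	/* ...and so is bit 32 as set by the experimental patch. */
	assert(ip_after_32bit_iret(intended | (UINT64_C(1) << 32)) == intended);

	/* A strict check rejects the buggy value and accepts the clean one. */
	assert(!abort_ip_in_range(garbage, COMPAT_ADDR_LIMIT));
	assert(abort_ip_in_range(intended, COMPAT_ADDR_LIMIT));
}
```

Calling abort_ip_demo() from a small main() exercises all four cases.
The point of the sketch is only that the buggy program appears to work
as long as something masks the high bits, which is exactly the promise
the entry code should not have to keep.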