Received: by 2002:ac0:a581:0:0:0:0:0 with SMTP id m1-v6csp413881imm; Thu, 28 Jun 2018 23:24:38 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJh4kMOckb9R69nvUu80yieUqZ4tPLwjYyBaPnBHAPKaDRhDOHE5vtv/5CBX0aFy7upQxeo X-Received: by 2002:a17:902:8b8c:: with SMTP id ay12-v6mr13536588plb.74.1530253478838; Thu, 28 Jun 2018 23:24:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1530253478; cv=none; d=google.com; s=arc-20160816; b=HOUlvc1Wpg1wlOMm8njd2A7wvQZrdMQtQdqW2JseisqNZZPhi9q5ZvlBvsJMUOGI9J Ighb6rceRfOQngf6akSIl32LCMNycPBmHeYnQDPxS4vqL26gEglA5kiXmYB7RtEV8Oxx hA8qVbAVqtELUI043An9UnOHOr9//2z64P4g/JFPl7HAXQDlR6zl3N3q9QYfeFg9nHRe wmbfDzqqrxNla24xpTDYrDiOMhoqLueZc5l2ft7mtjxdTCpbkAqRNcna8o12WiCbBVCP k5362AYFGTFta9+KlVtx3pOPT3ZMzqdhvejQGHZGJTbLr7lyiIF8cP63nr6TI7Q3E9wP /VUw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:thread-index:thread-topic :content-transfer-encoding:mime-version:subject:references :in-reply-to:message-id:cc:to:from:date:dkim-signature:dkim-filter :arc-authentication-results; bh=YckiZvn5wBuACqA+RRhpifkE73IH1S7cDe+V0Iqhaqw=; b=HxYXwWtkqBZ5Tkvu1n/J8sjW7uW80hfEky9z6KeM82WbU6lMCFmNGtfC1S3Sp4dZqm Qrkb1uMpGbWhJLaw/gxfgTkz2tdI50if/Gqo2RCvf7WIHPVvxGa5Z0b+PXTM0UzLKy4W bM2Hy969Ad3W3cDip75pXohKGYjAH/7N04p/jTxsE0ddrdLh1atcw+LCc2bTYpBKSKBs TOa/vMhwTwhYz0029XjpHcRXOI6K97Cv8rh7j3eSUSK+lmKgCLSiGR+HwhW2qj9qfhUG E1kzx9Qekk82pp0Lu7CWgCHY/MYp7G0G5LrEH4HlOb+Ynb4QN2EZXANQueYIWrniJrpb 7pew== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@efficios.com header.s=default header.b=AtoCL+gY; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s25-v6si7596545pgd.188.2018.06.28.23.24.24; Thu, 28 Jun 2018 23:24:38 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@efficios.com header.s=default header.b=AtoCL+gY; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935805AbeF1W3W (ORCPT + 99 others); Thu, 28 Jun 2018 18:29:22 -0400 Received: from mail.efficios.com ([167.114.142.138]:55972 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932522AbeF1W3V (ORCPT ); Thu, 28 Jun 2018 18:29:21 -0400 Received: from localhost (ip6-localhost [IPv6:::1]) by mail.efficios.com (Postfix) with ESMTP id 19CE222ECB4; Thu, 28 Jun 2018 18:29:20 -0400 (EDT) Received: from mail.efficios.com ([IPv6:::1]) by localhost (mail02.efficios.com [IPv6:::1]) (amavisd-new, port 10032) with ESMTP id nBjt-Rf-398N; Thu, 28 Jun 2018 18:29:19 -0400 (EDT) Received: from localhost (ip6-localhost [IPv6:::1]) by mail.efficios.com (Postfix) with ESMTP id 55F1F22ECAD; Thu, 28 Jun 2018 18:29:19 -0400 (EDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.efficios.com 55F1F22ECAD DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=efficios.com; s=default; t=1530224959; bh=YckiZvn5wBuACqA+RRhpifkE73IH1S7cDe+V0Iqhaqw=; h=Date:From:To:Message-ID:MIME-Version; b=AtoCL+gYlUo8KkqhJyJKtvYmuVouIw3xNYrgGp8MYZlU3l5TWq9/NyFooXSW9zchs VK46h6ICcr1Mul5f/LjoCLAcU5gvl7J3a/Z2DEu5wFXPttU4+HbR28PtbJ4+hQ0IgA wN+9HXZH8fFErK7LWxPM2zleK4IWNRnYcWK0M8vqwsOIZUs4XOtFQ+SDNCwg2dRaRQ ZbwlL297JAEtX5qDk99SRs70ZIAYhrrtp2SSInmr52YowgnlJsFbjJ2rLfLECz3JLu ZDqQmbrhzXyNT06bP7rsbCLmTnG/FndhIgwLRd1lpJAD95VR8iaI3KxldjXUYtGXWk iwz63fQ0zfHTg== X-Virus-Scanned: amavisd-new at efficios.com Received: from mail.efficios.com ([IPv6:::1]) by localhost (mail02.efficios.com [IPv6:::1]) (amavisd-new, port 10026) with ESMTP id ub1kI70s2xVN; Thu, 28 Jun 2018 18:29:19 -0400 (EDT) Received: from mail02.efficios.com (mail02.efficios.com [167.114.142.138]) by mail.efficios.com (Postfix) with ESMTP id 37ABF22ECA4; Thu, 28 Jun 2018 18:29:19 -0400 (EDT) Date: Thu, 28 Jun 2018 18:29:18 -0400 (EDT) From: Mathieu Desnoyers To: Linus Torvalds Cc: Andy Lutomirski , Thomas Gleixner , linux-kernel , linux-api , Peter Zijlstra , "Paul E. McKenney" , Boqun Feng , Dave Watson , Paul Turner , Andrew Morton , Russell King , Ingo Molnar , "H. Peter Anvin" , Andi Kleen , Chris Lameter , Ben Maurer , rostedt , Josh Triplett , Catalin Marinas , Will Deacon , Michael Kerrisk , Joel Fernandes Message-ID: <1821591223.9479.1530224958939.JavaMail.zimbra@efficios.com> In-Reply-To: References: <20180628162359.9054-1-mathieu.desnoyers@efficios.com> Subject: Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [167.114.142.138] X-Mailer: Zimbra 8.8.8_GA_2096 (ZimbraWebClient - FF52 (Linux)/8.8.8_GA_1703) Thread-Topic: rseq: validate rseq_cs fields are < TASK_SIZE Thread-Index: 11XCFtdD9OAfrodbawODhQ2jv3dcYg== Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org ----- On Jun 28, 2018, at 5:22 PM, Linus Torvalds torvalds@linux-foundation.org wrote: > On Thu, Jun 28, 2018 at 1:23 PM Andy Lutomirski wrote: >> >> This is okay with me for a fix outside the merge window. Can you do a >> followup for the next merge window that fixes it better, though? In >> particular, TASK_SIZE is generally garbage. I think a better fix >> would be something like adding a new arch-overridable helper like: >> >> static inline unsigned long current_max_user_addr(void) { return TASK_SIZE; } > > We already have that. It's called "user_addr_max()". > > It's the limit we use for user accesses. Good point. I will simply replace TASK_SIZE with user_addr_max() then. > That said, I don't see why we should even check the IP. It's not like > that's done by signal handling either. The goal stated by Andy and Thomas is to design rseq so it does not need any compat syscall whatsoever. Considering that this is a new system call, we should be able to do that. One thing we want is to provide a consistent behavior when a 32-bit binary is executed on native 32-bit kernel or on 64-bit kernel in compat mode, even if userspace chooses to put garbage in the upper 32-bit of padding within 64-bit fields. Now let's look at the comparison with signals. If we look at ia32_setup_frame() for instance, it sets: regs->ip = (unsigned long) ksig->ka.sa.sa_handler; where the sa_handler pointer has been received as input as a 32-bit pointer by the compat syscall rt_sigaction through struct compat_sigaction. I agree with you that it does not appear to validate that it's below user_addr_max(), but at least it's guaranteed to never be over 32-bit. The first big difference between rseq and signals: rseq does not have a compat structure. The layout is the same for both 32-bit and 64-bit userspace (see include/uapi/linux/rseq.h). Another difference between rseq and signals is that rseq only registers the TLS struct rseq for each thread. Then, it's up to user-space to update the rseq_cs field of struct rseq to indicate that it enters a critical section. So the actual content of (struct rseq *)->rseq_cs is updated with single-copy atomicity after registration of rseq. Therefore, the current critical section pointed to by the current rseq_cs user-space pointer also changes after registration. So we cannot validate the content of the rseq_cs field, nor of the fields contained within every possible struct rseq_cs descriptor when registering rseq through sys_rseq: those need to be read when returning to user-space. Not just on return from system call, but also on return from interrupt/trap after a preemption. This is very much different from registering a sigaction, where the kernel can validate or truncate the content of sa_handler at will. Without validation of the content of e.g. rseq_cs->abort_ip (read as a 64-bit integer by a 64-bit kernel), we end up setting the return IP to that address on abort, even though user-space may have put garbage in the high bits: instruction_pointer_set(regs, (unsigned long)rseq_cs.abort_ip); without any validation or truncation whatsoever. I'm concerned that some architecture code may not deal so well without prior validation or truncation of the IP register content upper 32 bits when returning to a 32-bit compat task. This is why I'm considering comparison of abort_ip against user_addr_max() to ensure we're not provided with an incorrect user input. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com