Date: Thu, 29 Mar 2018 16:23:38 +0200
From: Peter Zijlstra
To: Mathieu Desnoyers
Cc: Thomas Gleixner, "Paul E. McKenney", Boqun Feng, Andy Lutomirski,
    Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
    Russell King, Ingo Molnar, "H. Peter Anvin", Andrew Hunter, Andi Kleen,
    Chris Lameter, Ben Maurer, rostedt, Josh Triplett, Linus Torvalds,
    Catalin Marinas, Will Deacon, Michael Kerrisk, Alexander Viro
Subject: Re: [RFC PATCH for 4.17 02/21] rseq: Introduce restartable sequences system call (v12)
Message-ID: <20180329142338.GD4043@hirez.programming.kicks-ass.net>
In-Reply-To: <87410797.545.1522331641598.JavaMail.zimbra@efficios.com>

On Thu, Mar 29, 2018 at 09:54:01AM -0400, Mathieu Desnoyers wrote:
> Let's say we disallow system calls from rseq critical sections. A few points
> arise:
>
> - We still need to allow traps (page faults, breakpoints, ...) within rseq c.s.,
>
> - We still need to allow interrupts within rseq c.s.,

Sure, but all those are different entry points, so that shouldn't be a
problem.

> - We need to decide whether we just document that syscalls within rseq c.s.
>   are not supported, or we enforce a behavior if this happens (e.g. SIGSEGV).
>   If we enforce a SIGSEGV, we'd have to figure out whether it's worth it to
>   add extra branches to the system call fast path to validate this.

Without enforcement someone will eventually do this :/

We might (maybe) get away with it being a debug option somewhere, but
even that sounds like trouble.

> - We need to carefully consider the case of system calls issued within signal
>   handlers nested on top of rseq. When RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL is
>   _not_ set, neither in the rseq c.s. descriptor nor in the TLS @flags,
>   it's pretty much straightforward: upon signal delivery, the kernel moves the
>   ip to abort, and clears the tls @rseq_cs pointer. This means that any system
>   call issued within the signal handler is not actually within the rseq c.s.
>   upon which the signal is nested.
>
>   The case I worry about is if a thread sets the RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
>   flag in its TLS @flags field (useful in a debugging scenario where we want a
>   debugger to single-step through the rseq c.s. and observe registers at each step).
>   Arguably, this is only ever used in development. However, it does allow a situation
>   where a system call executed within a signal handler can nest over a rseq c.s..
>   So if we choose to be very strict and SIGSEGV any syscall nested over rseq
>   c.s., we may very well end up killing the process for no good reason in this
>   scenario.

Yes, that needs a little thought; but when we run the signal handler,
the IP would no longer be inside the active RSEQ, right?

> - We need to decide whether all syscalls are disallowed, or if we want to pick
>   specific ones (e.g. fork()).

All.
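
The signal-handler case above can be made concrete with a small user-space
model. This is only a sketch of the behaviour described in this thread, not
kernel code: the class names, the start_ip/end_ip range fields and both
check functions are hypothetical, and the numeric addresses are made up. It
compares a strict check keyed on the TLS @rseq_cs pointer alone with one
keyed on whether the IP lies inside the critical section's range.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ModelRseqCs:
    """Hypothetical stand-in for an rseq c.s. descriptor (not the kernel struct)."""
    start_ip: int
    end_ip: int      # exclusive; assumed range representation for this model
    abort_ip: int
    no_restart_on_signal: bool = False  # models RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL


@dataclass
class ModelThread:
    ip: int
    rseq_cs: Optional[ModelRseqCs] = None    # models the TLS @rseq_cs pointer
    tls_no_restart_on_signal: bool = False   # models the flag in the TLS @flags
    resume_ip: Optional[int] = None          # ip restored when the handler returns


def ip_in_cs(t: ModelThread) -> bool:
    """True if the thread's current ip lies inside its registered c.s."""
    cs = t.rseq_cs
    return cs is not None and cs.start_ip <= t.ip < cs.end_ip


def deliver_signal(t: ModelThread, handler_ip: int) -> None:
    """Model signal delivery as described in the quoted text."""
    no_restart = t.tls_no_restart_on_signal or (
        t.rseq_cs is not None and t.rseq_cs.no_restart_on_signal
    )
    if ip_in_cs(t) and not no_restart:
        # Flag not set: move the ip to abort and clear the @rseq_cs pointer.
        t.resume_ip = t.rseq_cs.abort_ip
        t.rseq_cs = None
    else:
        # Flag set (or not in a c.s.): resume where the thread was interrupted.
        t.resume_ip = t.ip
    t.ip = handler_ip  # the handler itself runs outside the c.s. range


def strict_check_by_pointer(t: ModelThread) -> bool:
    """Hypothetical check: reject a syscall whenever @rseq_cs is non-NULL."""
    return t.rseq_cs is not None


def strict_check_by_ip(t: ModelThread) -> bool:
    """Hypothetical check: reject a syscall only if the ip is inside the c.s."""
    return ip_in_cs(t)


def run_scenario(tls_flag: bool) -> Tuple[int, bool, bool]:
    """Interrupt a thread inside a c.s., then evaluate both checks in the handler."""
    cs = ModelRseqCs(start_ip=0x1000, end_ip=0x1020, abort_ip=0x2000)
    t = ModelThread(ip=0x1008, rseq_cs=cs, tls_no_restart_on_signal=tls_flag)
    deliver_signal(t, handler_ip=0x5000)
    return t.resume_ip, strict_check_by_pointer(t), strict_check_by_ip(t)
```

In this model, run_scenario(False) leaves nothing for either check to
reject, since the pointer was cleared at delivery. With run_scenario(True)
the pointer-only check would reject a syscall made from the handler, while
the IP-range check would not, which is the distinction raised in the reply
above. Whether a real check could be keyed that way is exactly the part
that still needs thought.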