Date: Wed, 28 Mar 2018 14:29:46 +0200
From: Peter Zijlstra
To: Mathieu Desnoyers
Cc: "Paul E . McKenney", Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, "H . Peter Anvin", Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Alexander Viro
Subject: Re: [RFC PATCH for 4.17 02/21] rseq: Introduce restartable sequences system call (v12)
Message-ID: <20180328122946.GU4043@hirez.programming.kicks-ass.net>
References: <20180327160542.28457-1-mathieu.desnoyers@efficios.com> <20180327160542.28457-3-mathieu.desnoyers@efficios.com>
In-Reply-To: <20180327160542.28457-3-mathieu.desnoyers@efficios.com>

On Tue, Mar 27, 2018 at 12:05:23PM -0400, Mathieu Desnoyers wrote:
> +static int rseq_update_cpu_id(struct task_struct *t)
> +{
> +	uint32_t cpu_id = raw_smp_processor_id();

u32

> +
> +	if (__put_user(cpu_id, &t->rseq->cpu_id_start))
> +		return -EFAULT;
> +	if (__put_user(cpu_id, &t->rseq->cpu_id))
> +		return -EFAULT;
> +	trace_rseq_update(t);
> +	return 0;
> +}
> +
> +static int rseq_reset_rseq_cpu_id(struct task_struct *t)
> +{
> +	uint32_t cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED;

u32

> +
> +	/*
> +	 * Reset cpu_id_start to its initial state (0).
> +	 */
> +	if (__put_user(cpu_id_start, &t->rseq->cpu_id_start))
> +		return -EFAULT;
> +	/*
> +	 * Reset cpu_id to RSEQ_CPU_ID_UNINITIALIZED, so any user coming
> +	 * in after unregistration can figure out that rseq needs to be
> +	 * registered again.
> +	 */
> +	if (__put_user(cpu_id, &t->rseq->cpu_id))
> +		return -EFAULT;
> +	return 0;
> +}
> +
> +static int rseq_get_rseq_cs(struct task_struct *t,
> +			    unsigned long *start_ip,
> +			    unsigned long *post_commit_offset,
> +			    unsigned long *abort_ip,
> +			    uint32_t *cs_flags)
> +{
> +	struct rseq_cs __user *urseq_cs;
> +	struct rseq_cs rseq_cs;
> +	unsigned long ptr;
> +	u32 __user *usig;
> +	u32 sig;
> +	int ret;
> +
> +	ret = __get_user(ptr, &t->rseq->rseq_cs);
> +	if (ret)
> +		return ret;
> +	if (!ptr)
> +		return 0;
> +	urseq_cs = (struct rseq_cs __user *)ptr;
> +	if (copy_from_user(&rseq_cs, urseq_cs, sizeof(rseq_cs)))
> +		return -EFAULT;
> +	if (rseq_cs.version > 0)
> +		return -EINVAL;
> +
> +	/* Ensure that abort_ip is not in the critical section. */
> +	if (rseq_cs.abort_ip - rseq_cs.start_ip < rseq_cs.post_commit_offset)
> +		return -EINVAL;

The kernel will not crash if userspace messes that up, right? So why do we care to check?

> +
> +	*cs_flags = rseq_cs.flags;
> +	*start_ip = rseq_cs.start_ip;
> +	*post_commit_offset = rseq_cs.post_commit_offset;
> +	*abort_ip = rseq_cs.abort_ip;

Then this becomes a straight struct assignment.

> +
> +	usig = (u32 __user *)(rseq_cs.abort_ip - sizeof(u32));
> +	ret = get_user(sig, usig);
> +	if (ret)
> +		return ret;
> +
> +	if (current->rseq_sig != sig) {
> +		printk_ratelimited(KERN_WARNING
> +			"Possible attack attempt. Unexpected rseq signature 0x%x, expecting 0x%x (pid=%d, addr=%p).\n",
> +			sig, current->rseq_sig, current->pid, usig);
> +		return -EPERM;
> +	}

Is there any text that explains the threat model and the possible attack that this signature prevents? I failed to find any, which raises the question: why is it there?
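For readers following the descriptor checks quoted above, here is a minimal userspace sketch of the same validation order (version, abort_ip outside the critical section, then the signature). It is an illustration under stated assumptions, not the patch's code: user memory is simulated by a byte array whose offsets stand in for addresses, and struct demo_rseq_cs, demo_in_range(), demo_get_u32() and demo_validate_cs() are hypothetical names, not the kernel uapi or helpers. It shows why the single unsigned comparison "x - start < len" tests start <= x < start + len (when x < start the subtraction wraps around to a large value), and that the signature is read from the four bytes immediately before abort_ip.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical mirror of the descriptor fields used in the quoted patch.
 * This is NOT the real uapi struct rseq_cs layout.
 */
struct demo_rseq_cs {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
};

/*
 * True iff start <= x < start + len, assuming start + len does not
 * overflow: if x < start, x - start wraps to a value >= len.
 */
static bool demo_in_range(uint64_t x, uint64_t start, uint64_t len)
{
	return x - start < len;
}

/* Read 4 bytes at a simulated user address; -EFAULT if out of bounds. */
static int demo_get_u32(const uint8_t *mem, size_t mem_len,
			uint64_t addr, uint32_t *out)
{
	assert(mem && out);
	if (addr > mem_len || mem_len - addr < sizeof(*out))
		return -EFAULT;
	memcpy(out, mem + addr, sizeof(*out));
	return 0;
}

/* Same check order as the quoted rseq_get_rseq_cs(). */
static int demo_validate_cs(const struct demo_rseq_cs *cs,
			    const uint8_t *mem, size_t mem_len,
			    uint32_t expected_sig)
{
	uint32_t sig;
	int ret;

	if (cs->version > 0)
		return -EINVAL;
	/* abort_ip must not lie inside [start_ip, start_ip + post_commit_offset). */
	if (demo_in_range(cs->abort_ip, cs->start_ip, cs->post_commit_offset))
		return -EINVAL;
	/* The signature sits in the 4 bytes just before abort_ip. */
	ret = demo_get_u32(mem, mem_len, cs->abort_ip - sizeof(uint32_t), &sig);
	if (ret)
		return ret;
	if (sig != expected_sig)
		return -EPERM;
	return 0;
}
```

For example, with start_ip = 8, post_commit_offset = 16 and abort_ip = 32, validation succeeds only when the four bytes at offset 28 hold the expected signature; moving abort_ip to 12 (inside [8, 24)) yields -EINVAL.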
> +	return 0;
> +}
> +
> +static int rseq_need_restart(struct task_struct *t, uint32_t cs_flags)

u32

> +{
> +	uint32_t flags, event_mask;

u32

> +	int ret;
> +
> +	/* Get thread flags. */
> +	ret = __get_user(flags, &t->rseq->flags);
> +	if (ret)
> +		return ret;
> +
> +	/* Take critical section flags into account. */
> +	flags |= cs_flags;
> +
> +	/*
> +	 * Restart on signal can only be inhibited when restart on
> +	 * preempt and restart on migrate are inhibited too. Otherwise,
> +	 * a preempted signal handler could fail to restart the prior
> +	 * execution context on sigreturn.
> +	 */
> +	if (unlikely(flags & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL)) {
> +		if ((flags & (RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
> +			| RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT)) !=
> +			(RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
> +			| RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT))
> +			return -EINVAL;

Please put operators at the end of the previous line, not at the start of the new line when you have to break statements.

Also, that's unreadable.

#define RSEQ_CS_FLAGS (RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT | \
		       RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL  | \
		       RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE)

	if (unlikely((flags & RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL) &&
		     (flags & RSEQ_CS_FLAGS) != RSEQ_CS_FLAGS))
		return -EINVAL;

> +	}
> +
> +	/*
> +	 * Load and clear event mask atomically with respect to
> +	 * scheduler preemption.
> +	 */
> +	preempt_disable();
> +	event_mask = t->rseq_event_mask;
> +	t->rseq_event_mask = 0;
> +	preempt_enable();
> +
> +	event_mask &= ~flags;
> +	if (event_mask)
> +		return 1;
> +	return 0;

	return !!(event_mask & ~flags);

> +}
> +
> +static int clear_rseq_cs(struct task_struct *t)
> +{
> +	unsigned long ptr = 0;
> +
> +	/*
> +	 * The rseq_cs field is set to NULL on preemption or signal
> +	 * delivery on top of rseq assembly block, as well as on top
> +	 * of code outside of the rseq assembly block. This performs
> +	 * a lazy clear of the rseq_cs field.
> +	 *
> +	 * Set rseq_cs to NULL with single-copy atomicity.
> +	 */
> +	return __put_user(ptr, &t->rseq->rseq_cs);

	__put_user(0UL, &t->rseq->rseq_cs); ?

> +}
> +
> +static int rseq_ip_fixup(struct pt_regs *regs)
> +{
> +	unsigned long ip = instruction_pointer(regs), start_ip = 0,
> +		post_commit_offset = 0, abort_ip = 0;

Valid C, but yuck. Just have two 'unsigned long' lines. Also, why the = 0? The call to rseq_get_rseq_cs() below will either initialize them or fail.

> +	struct task_struct *t = current;
> +	uint32_t cs_flags = 0;

u32

> +	bool in_rseq_cs = false;
> +	int ret;
> +
> +	ret = rseq_get_rseq_cs(t, &start_ip, &post_commit_offset, &abort_ip,
> +			&cs_flags);

	ret = rseq_get_rseq_cs(t, &start_ip, &post_commit_offset, &abort_ip,
			       &cs_flags);

> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Handle potentially not being within a critical section.
> +	 * Unsigned comparison will be true when
> +	 * ip >= start_ip, and when ip < start_ip + post_commit_offset.
> +	 */
> +	if (ip - start_ip < post_commit_offset)
> +		in_rseq_cs = true;
> +
> +	/*
> +	 * If not nested over a rseq critical section, restart is
> +	 * useless. Clear the rseq_cs pointer and return.
> +	 */
> +	if (!in_rseq_cs)
> +		return clear_rseq_cs(t);

That all seems needlessly complicated; isn't:

	if (ip - start_ip >= post_commit_offset)
		return clear_rseq_cs(t);

equivalent? Nothing seems to use that variable after this.

> +	ret = rseq_need_restart(t, cs_flags);
> +	if (ret <= 0)
> +		return ret;
> +	ret = clear_rseq_cs(t);
> +	if (ret)
> +		return ret;
> +	trace_rseq_ip_fixup(ip, start_ip, post_commit_offset, abort_ip);
> +	instruction_pointer_set(regs, (unsigned long)abort_ip);
> +	return 0;
> +}
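A similar userspace sketch of the rseq_need_restart()/rseq_ip_fixup() decision flow, with the simplifications suggested above applied: one combined flags mask, !!(event_mask & ~flags), and a direct "ip - start_ip >= post_commit_offset" early return. Assumptions, all hypothetical: the DEMO_* bit values are made up, and event bits are assumed to share the positions of the NO_RESTART flags; flags and the event mask are passed in rather than read from the task and user memory, so the preempt_disable() section is not modelled; *cs_cleared stands in for clearing rseq_cs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical bit values; event bits are assumed to use the same
 * positions as the NO_RESTART flags, so "event_mask & ~flags" masks them.
 */
#define DEMO_NO_RESTART_ON_PREEMPT	(1U << 0)
#define DEMO_NO_RESTART_ON_SIGNAL	(1U << 1)
#define DEMO_NO_RESTART_ON_MIGRATE	(1U << 2)
#define DEMO_CS_FLAGS	(DEMO_NO_RESTART_ON_PREEMPT | \
			 DEMO_NO_RESTART_ON_SIGNAL | \
			 DEMO_NO_RESTART_ON_MIGRATE)

/* Returns 1 if a restart is needed, 0 if not, -EINVAL for invalid flags. */
static int demo_need_restart(uint32_t flags, uint32_t event_mask)
{
	/* NO_RESTART_ON_SIGNAL is only valid together with the other two. */
	if ((flags & DEMO_NO_RESTART_ON_SIGNAL) &&
	    (flags & DEMO_CS_FLAGS) != DEMO_CS_FLAGS)
		return -EINVAL;
	/* Any pending event not inhibited by a flag forces a restart. */
	return !!(event_mask & ~flags);
}

/*
 * Outside [start_ip, start_ip + post_commit_offset): only clear rseq_cs.
 * Inside, with no restart needed (or an error): leave *ip unchanged.
 * Inside, with a restart needed: clear rseq_cs and move *ip to abort_ip.
 * Returns 0 or a negative errno.
 */
static int demo_ip_fixup(uint64_t *ip, uint64_t start_ip,
			 uint64_t post_commit_offset, uint64_t abort_ip,
			 uint32_t flags, uint32_t event_mask, int *cs_cleared)
{
	int ret;

	assert(ip && cs_cleared);
	*cs_cleared = 0;
	/* One unsigned comparison: wraps to a huge value when *ip < start_ip. */
	if (*ip - start_ip >= post_commit_offset) {
		*cs_cleared = 1;
		return 0;
	}
	ret = demo_need_restart(flags, event_mask);
	if (ret <= 0)
		return ret;
	*cs_cleared = 1;
	*ip = abort_ip;
	return 0;
}
```

With no flags set and a migration event pending, an ip of 0x1004 inside [0x1000, 0x1010) is redirected to abort_ip; an ip of 0xff0 makes the subtraction wrap, falls outside the range, and only clears rseq_cs.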