Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp4364897ybi; Tue, 11 Jun 2019 05:35:11 -0700 (PDT) X-Google-Smtp-Source: APXvYqxqAi4rjLPtwCfL/M3qrdTdTdu5h0LQeawe9+5qJM9WcaWGIErcBkK8lC55x2Ztk+nPTnzt X-Received: by 2002:a62:198e:: with SMTP id 136mr77226089pfz.180.1560256511796; Tue, 11 Jun 2019 05:35:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1560256511; cv=none; d=google.com; s=arc-20160816; b=gej9piPzARD1ys0sUaQ3ZmGo7NWFGhpQtlSSY/xT+p1+ecwHPQeWikYOSg0/GbzVGz ZC8DAU1wHzMh4/EpYSEY1PgpmqdQQcfFnzLM85QtLf8+O75f2Bd+B0jz/TYKZyX4MtRS B/YHDzYvZASOH+JxC2OT6QUSUbL9ZSGXV+Sc43QOdc3JF76/0FK6PNI+Ep83WFx6x4At 3dJvFsOPqthSdiJnZHRLoNTjB7lS5tRLgMx4Z4PLYV8mutx3KZozkpKB4piIzJVjqUz1 ztYILDXlEqTCc+ORIK7FJ1HEeQaoZPSDbBvelj84X9jlI19tHC5sP66tRYOQZXbC7jfU XSTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=6SfjPMJfJHXhFOuS4hHJtSok6+MviQggMxwQadR+ODU=; b=EvIS9tlYOK5IEks+Xn6VT6371YfC3AB9kmgEoTP4GP+I7eYP7loYP8tajFIp37Q7ou q/RmurAxOmedqeqCrW0LJcNJ1e5sCSStmWlxJrbE3m37Cs8iYtHqLwoI19zU9fRKs/WT x9zKYT/ug2QQ2yANQBsumRtn/d3GQJyRHHZDCTRvLHrCOnV2eO2/yA0Nw5QPxW44copY 8xAKbzstGoDX9J9ByLP9/pS7F8g2y6T9QP9AJE068QlLLqCvajpi9Os9Jwm2AYUwUorw MC0YccI3kybZly0uyTmHgIe6e5ZYmEzBgOkBGjeEb1ilhfnPMigH3YifwnQHZVwt9jfE PNhQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=azxqKpBi; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 91si12268597plc.54.2019.06.11.05.34.56; Tue, 11 Jun 2019 05:35:11 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=azxqKpBi; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389455AbfFKMeh (ORCPT + 99 others); Tue, 11 Jun 2019 08:34:37 -0400 Received: from bombadil.infradead.org ([198.137.202.133]:44112 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388065AbfFKMeg (ORCPT ); Tue, 11 Jun 2019 08:34:36 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=In-Reply-To:Content-Type:MIME-Version :References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=6SfjPMJfJHXhFOuS4hHJtSok6+MviQggMxwQadR+ODU=; b=azxqKpBi8AYd3gOR0t53pd3PK UtAqx2C9K1595CyeJjb9kEyprz6opDhF950FJt9Nk+l3xvKhjfcRGL95xm6PlOix1FhONm/BKalPZ ElF/zWpYJ1J9Cs+nYI5MJIq1umjAtlYaXF0GYpE8qCm8oP3qkzpnZnaF84W9kds/be32yFaBj+eo8 QCkTJmDEyFWrbDLP3OXGvRKpjwzj4NMpLRQhUvJ88HIihRPf63MihorIHms60t8C9tyFllOoa42u2 2wIywudv8zZTiRSwwDp+vFCQd6T1a7NxdwThcxGYqkbLdR0jei+sOTS1NVxZ6hWPn+njM9u4DjQNE YNI9bgsvQ==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net) by bombadil.infradead.org with esmtpsa (Exim 4.92 #3 (Red Hat Linux)) id 1hafyW-0000Ku-8z; Tue, 11 Jun 2019 12:34:04 +0000 Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 8A4BC202173E1; Tue, 11 Jun 2019 14:34:02 +0200 (CEST) Date: Tue, 11 Jun 2019 14:34:02 +0200 From: Peter Zijlstra To: Andy Lutomirski Cc: Masami Hiramatsu , x86@kernel.org, linux-kernel@vger.kernel.org, Ard Biesheuvel , Andy Lutomirski , Steven Rostedt , Ingo Molnar , Thomas Gleixner , Linus Torvalds , Jason Baron , Jiri Kosina , David Laight , Borislav Petkov , Julia Cartwright , Jessica Yu , "H. Peter Anvin" , Nadav Amit , Rasmus Villemoes , Edward Cree , Daniel Bristot de Oliveira Subject: Re: [PATCH 08/15] x86/alternatives: Teach text_poke_bp() to emulate instructions Message-ID: <20190611123402.GH3463@hirez.programming.kicks-ass.net> References: <20190605130753.327195108@infradead.org> <20190605131945.005681046@infradead.org> <20190608004708.7646b287151cf613838ce05f@kernel.org> <20190607173427.GK3436@hirez.programming.kicks-ass.net> <3DA961AB-950B-4886-9656-C0D268D521F1@amacapital.net> <20190611080307.GN3436@hirez.programming.kicks-ass.net> <20190611120834.GG3463@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190611120834.GG3463@hirez.programming.kicks-ass.net> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jun 11, 2019 at 02:08:34PM +0200, Peter Zijlstra wrote: > On Tue, Jun 11, 2019 at 10:03:07AM +0200, Peter Zijlstra wrote: > > On Fri, Jun 07, 2019 at 11:10:19AM -0700, Andy Lutomirski wrote: > > > > I am surely missing some kprobe context, but is it really safe to use > > > this mechanism to replace more than one instruction? > > > > I'm not entirely up-to-scratch here, so Masami, please correct me if I'm > > wrong. > > > > So what happens is that arch_prepare_optimized_kprobe() <- > > copy_optimized_instructions() copies however much of the instruction > > stream is required such that we can overwrite the instruction at @addr > > with a 5 byte jump. > > > > arch_optimize_kprobe() then does the text_poke_bp() that replaces the > > instruction @addr with int3, copies the rel jump address and overwrites > > the int3 with jmp. > > > > And I'm thinking the problem is with something like: > > > > @addr: nop nop nop nop nop > > > > We copy out the nops into the trampoline, overwrite the first nop with > > an INT3, overwrite the remaining nops with the rel addr, but oops, > > another CPU can still be executing one of those NOPs, right? > > > > I'm thinking we could fix this by first writing INT3 into all relevant > > instructions, which is going to be messy, given the current code base. > > Maybe not that bad; how's something like this? > > (completely untested) > > --- > arch/x86/kernel/alternative.c | 20 ++++++++++++++++++-- > 1 file changed, 18 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c > index 0d57015114e7..8f643dabea72 100644 > --- a/arch/x86/kernel/alternative.c > +++ b/arch/x86/kernel/alternative.c > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include > > int __read_mostly alternatives_patched; > > @@ -849,6 +850,7 @@ static void do_sync_core(void *info) > > static bool bp_patching_in_progress; > static void *bp_int3_handler, *bp_int3_addr; > +static unsigned int bp_int3_length; > > int poke_int3_handler(struct pt_regs *regs) > { > @@ -867,7 +869,11 @@ int poke_int3_handler(struct pt_regs *regs) > if (likely(!bp_patching_in_progress)) > return 0; > > - if (user_mode(regs) || regs->ip != (unsigned long)bp_int3_addr) > + if (user_mode(regs)) > + return 0; > + > + if (regs->ip < (unsigned long)bp_int3_addr || > + regs->ip >= (unsigned long)bp_int3_addr + bp_int3_length) > return 0; Bugger, this isn't right. It'll jump to the beginning of the trampoline, even if it is multiple instructions in, which would lead to executing instructions twice, which would be BAD. _maybe_, depending on what the slot looks like, we could do something like: offset = regs->ip - (unsigned long)bp_int3_addr; regs->ip = bp_int3_handler + offset; That is; jump into the slot at the same offset we hit the INT3, but this is quickly getting yuck. > /* set up the specified breakpoint handler */ > @@ -900,9 +906,12 @@ NOKPROBE_SYMBOL(poke_int3_handler); > void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler) > { > unsigned char int3 = 0xcc; > + void *kaddr = addr; > + struct insn insn; > > bp_int3_handler = handler; > bp_int3_addr = (u8 *)addr + sizeof(int3); > + bp_int3_length = len - sizeof(int3); > bp_patching_in_progress = true; > > lockdep_assert_held(&text_mutex); > @@ -913,7 +922,14 @@ void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler) > */ > smp_wmb(); > > - text_poke(addr, &int3, sizeof(int3)); > + do { > + kernel_insn_init(&insn, kaddr, MAX_INSN_SIZE); > + insn_get_length(&insn); > + > + text_poke(kaddr, &int3, sizeof(int3)); > + > + kaddr += insn.length; > + } while (kaddr < addr + len); > > on_each_cpu(do_sync_core, NULL, 1); >