Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp6468230ybi; Mon, 8 Jul 2019 03:33:29 -0700 (PDT) X-Google-Smtp-Source: APXvYqydV/pOx7iwIRY17hnNHiG84/XIeL0fBykDdO3CcUHI9yGvhZA3jVAZHnJBC8/4aWkOyBjs X-Received: by 2002:a17:902:a03:: with SMTP id 3mr23357450plo.302.1562582009689; Mon, 08 Jul 2019 03:33:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1562582009; cv=none; d=google.com; s=arc-20160816; b=DUoZU2EloDL4x8bQP6ndBiTFhfaRj9+qIkYXW8RWthYH/FOj5cW5iE46IbIrlx6R3V cPj58Iw/yWBU1Z0AdE1wpvbNcmVNCLEG2BikIQdrGuN+LOKUez9JKqhnAjAXhXL5rCXt Os13weILEuT+jPizHPd6ghKUjli+SZnCoByhn7HdNz3Z6colAN3da1dELmq2Fs7ffy96 ezMlL1uc/8VUn0E6dvf9w3ulvg2lN1gy4F+a75tlOlMgK1bUsEuFcpYstvlvKQG2wC2N OwqRf5D1PrtWG2PYexnKs4I8Uc6bJgCO4Whgm9eBf1yhQtKzLBE06AiIJaCQrJQYxxza CV8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=ZtdcoTJjChZhuPwlagtEicRdXE6fEWR8USCQzYBj3As=; b=F6gLzIkVMYgcB2dewH0IuHTUMK1onm3ItFHkFimYMJfCsNjOnyVco8zuzXqS+44MOE 15/JomXnT81Y5lyB8/ssCtHsaOTXA//XZsDpkxx/X85yvgcfvgE+Cv5REEakNlBl09Ck sVPBje1hhStCuDmHdfKUkoUh0+3Hw+gO4Ts0ffBAAJCH+pJrlknYsjSbDjpOlu0YCsQq P/yvuxDfpZG84hnbxw66Ym3oRQjNj84xzE1oanXYX3TFLu8qRmWkf15VWU5k9v1yuuta fd753JRIC/UtzMCGfAM4F3FOhhORzeTNciXCOM02kclwR8KZ1qskMOpNECu2nS/WVMaC S8ZA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=KlQ9rDMj; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o7si11424116pgq.459.2019.07.08.03.33.14; Mon, 08 Jul 2019 03:33:29 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=KlQ9rDMj; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729489AbfGHHtE (ORCPT + 99 others); Mon, 8 Jul 2019 03:49:04 -0400 Received: from bombadil.infradead.org ([198.137.202.133]:39582 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725872AbfGHHtD (ORCPT ); Mon, 8 Jul 2019 03:49:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=In-Reply-To:Content-Type:MIME-Version :References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=ZtdcoTJjChZhuPwlagtEicRdXE6fEWR8USCQzYBj3As=; b=KlQ9rDMj5O2oKgRR1ZVrSXvUR XU7cMcDvPgTu4Yb6UzfhpJCk5+aWfMLToecJ23SOGlfDKIOqnrOtuz7PcZjuUBRqtfhv8VPXrS4mW FLU+w0npjYzog8aSpyh2PTF19VyckbJ8hmWZnQ9bouqs/o88uTq705C1YmRkTSGmLDHfSvgOUIukx kBFyXGgzzrlNp58xqEeoOt+P7huHAt/5cvsm7c2WwzJyn9eCzYL0IexGYXChiD+G4QflroGJcBZgc 0uQtSudvMHBuf0lfLQ0PGqaAbqwKEV7Go/1EXK2KIw+SFchsSFFDSRivCL4vS8KXNMUW3QUgUH3LD bUpVnwxgA==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net) by bombadil.infradead.org with esmtpsa (Exim 4.92 #3 (Red Hat Linux)) id 1hkONt-0002mS-Qu; Mon, 08 Jul 2019 07:48:26 +0000 Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 5BAD420AA5C74; Mon, 8 Jul 2019 09:48:23 +0200 (CEST) Date: Mon, 8 Jul 2019 09:48:23 +0200 From: Peter Zijlstra To: Eiichi Tsukata Cc: Linus Torvalds , Thomas Gleixner , Borislav Petkov , Ingo Molnar , Steven Rostedt , Andrew Lutomirski , Peter Anvin , Dave Hansen , Juergen Gross , Linux List Kernel Mailing , He Zhe , Joel Fernandes Subject: Re: [PATCH v2 5/7] x86/mm, tracing: Fix CR2 corruption Message-ID: <20190708074823.GV3402@hirez.programming.kicks-ass.net> References: <20190704195555.580363209@infradead.org> <20190704200050.534802824@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jul 06, 2019 at 08:07:22PM +0900, Eiichi Tsukata wrote: > > > On 2019/07/05 11:18, Linus Torvalds wrote: > > On Fri, Jul 5, 2019 at 5:03 AM Peter Zijlstra wrote: > >> > >> Despire the current efforts to read CR2 before tracing happens there > >> still exist a number of possible holes: > > > > So this whole series disturbs me for the simple reason that I thought > > tracing was supposed to save/restore cr2 and make it unnecessary to > > worry about this in non-tracing code. > > > > That is very much what the NMI code explicitly does. Why shouldn't all > > the other tracing code do the same thing in case they can take page > > faults? > > > > So I don't think the patches are wrong per se, but this seems to solve > > it at the wrong level. This solves it all at one site. If we're going to sprinkle CR2 save/restore over 'all' sites we're bound to miss some and we'll be playing catch-up forever :/ > Steven previously tried to fix it by saving CR2 in TRACE_IRQS_OFF: > https://lore.kernel.org/lkml/20190320221534.165ab87b@oasis.local.home/ That misses the context tracking tracepoint, which is also before we load CR2. > To prevent userstack trace code from reading user stack before it becomes ready, > checking current->in_execve in stack_trace_save_user() can help Steven's approach, > though trace_sched_process_exec() is called before current->in_execve = 0 so it changes > current behavior. > > The PoC code is as follows: > > diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c > index 2abf27d7df6b..30fa6e1b7a87 100644 > --- a/arch/x86/kernel/stacktrace.c > +++ b/arch/x86/kernel/stacktrace.c > @@ -116,10 +116,12 @@ void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie, > const struct pt_regs *regs) > { > const void __user *fp = (const void __user *)regs->bp; > + unsigned long address; > > if (!consume_entry(cookie, regs->ip, false)) > return; > > + address = read_cr2(); > while (1) { > struct stack_frame_user frame; > > @@ -131,11 +133,14 @@ void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie, > break; > if (frame.ret_addr) { > if (!consume_entry(cookie, frame.ret_addr, false)) > - return; > + break; > } > if (fp == frame.next_fp) > break; > fp = frame.next_fp; > } > + > + if (address != read_cr2()) > + write_cr2(address); > } > And this misses the perf unwinder, which we can reach if we attach perf to the tracepoint. We can most likely also do user accesses from kprobes, which we can similarly attach to tracepoints, and there's eBPF, and who knows what else... Do we really want to go put CR2 save/restore around every single thing that can cause a fault (any fault) and is reachable from a tracepoint? That's going to be a long list of things ... :/ Or are we going to put the CR2 save/restore on every single tracepoint? But then we also need it on the mcount/fentry stubs and we again have multiple places. Whereas if we stick it in the entry path, like I proposed, we fix it in one location and we're done.