Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp6464008ybx; Mon, 11 Nov 2019 09:30:40 -0800 (PST) X-Google-Smtp-Source: APXvYqwaUKA5p75o1PZQyOzSBgeJwLGeGCSo8A/VGebGj71j8jSWaNbxmOu5RmHTNOhLijmHoqCf X-Received: by 2002:a17:906:9458:: with SMTP id z24mr90923ejx.289.1573493440210; Mon, 11 Nov 2019 09:30:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573493440; cv=none; d=google.com; s=arc-20160816; b=wwJ1ihRtWrx82EtyoitpOxBLMkx3GfYmooR9LarQedWPH0Q0f6fhHOk7lt2X0WThk5 oYUGHg5uzYwTEUaIr90bMGaoGwlbxxIFxFRxRS3bTbO+MEgmPa4HHaymu0LYiwu9KFqF E5JlEFH/VSs8Lz8+/fnY/nuQqJDc7bMsx2VpxSDzEXy+2HuBcllVRsEIFeYbYp71WZoF hQWWMXnlimiM4bzHuZqmJZU3JTa/31IuGlq4/IR0YkJ16wZ2trDXbcRZGUUQa0fnEHgq 0Gmgqn8jWcN9FoxznbHAhBd7TalFXpFz31ztSEYYkcWSiw+gScNH2yDz5c07YTq+BLYP 931A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=sC95qrk+Zj9DYHna+L6C+E7YkYbFvj6JBMUxDc9MrrQ=; b=PRGnSI2vdBciJsK3Az4em8M+Ide794avQtMkz7uk6cR/h3s58CA0zoGozoqq0nCHXf J1QpRK3g3vSWA7BFFA4dz61wMIZhrX8n7TJXDlaCWk9ZxfRjx4mFp8G0aSXfGegqULjI RfogfVYf7gAMYmfAsm5xYDamtNhqLbI52q3Td1aCr52clmlDb6UPTAnG+2ZFCyK1/wB8 pHrdflFD/f+7CNe5DQ7Gojl3JxH+uaGrTfXtRL47PIIDDaF2IJvhrQWWBzR5G69kEBE3 MJDm/A2dK11Nm4LBgapTVpqtbu8uce0aCarrPTr8iO3lv5+ZibWKIj8UXD6pQ8BsvKVZ bpnw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=JJQdLKK4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w47si11696652edd.326.2019.11.11.09.30.16; Mon, 11 Nov 2019 09:30:40 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=JJQdLKK4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726946AbfKKR3n (ORCPT + 99 others); Mon, 11 Nov 2019 12:29:43 -0500 Received: from mail.kernel.org ([198.145.29.99]:34416 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726871AbfKKR3n (ORCPT ); Mon, 11 Nov 2019 12:29:43 -0500 Received: from willie-the-truck (236.31.169.217.in-addr.arpa [217.169.31.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0BB03214DB; Mon, 11 Nov 2019 17:29:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1573493381; bh=sTAo81ga35nrHBYg28tMpBJ/akeUP/g9Q4S64bmMeaw=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=JJQdLKK4o0G/WsS9+0TT3E7WSBqAUdOB7w0kXHs77zcZObbGuO6/EKDAJ0buPJ2kI KTZVdE9aAvcktEevu5+hFy3Gb6WqRg2qow+1+MeBukRqBi6oMD7joqWnucwOowGmKg ZTT7m+1eVSyAJS0+w7OjltUNuh7RigPAMgXJogr0= Date: Mon, 11 Nov 2019 17:29:35 +0000 From: Will Deacon To: Peter Zijlstra Cc: Leo Yan , Mark Rutland , Mike Leach , Adrian Hunter , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , x86@kernel.org, Alexander Shishkin , Mathieu Poirier , Arnaldo Carvalho de Melo , Jiri Olsa , linux-kernel@vger.kernel.org Subject: Re: [PATCH RFC 1/6] perf/x86: Add perf text poke event Message-ID: <20191111172935.GA11972@willie-the-truck> References: <20191030124659.GQ4114@hirez.programming.kicks-ass.net> <20191030141950.GB21153@leoy-ThinkPad-X240s> <20191030162325.GT4114@hirez.programming.kicks-ass.net> <20191031073136.GC21153@leoy-ThinkPad-X240s> <20191101100440.GU4131@hirez.programming.kicks-ass.net> <20191104022346.GC26019@leoy-ThinkPad-X240s> <20191108150530.GA7721@leoy-ThinkPad-X240s> <20191111144642.GM4114@hirez.programming.kicks-ass.net> <20191111153925.GC10488@willie-the-truck> <20191111160505.GZ4097@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191111160505.GZ4097@hirez.programming.kicks-ass.net> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 11, 2019 at 05:05:05PM +0100, Peter Zijlstra wrote: > On Mon, Nov 11, 2019 at 03:39:25PM +0000, Will Deacon wrote: > > > Backing up though, I think I'm missing details about what this thread is > > trying to achieve. You're adding perf events so that coresight trace can > > take into account modifications of the kernel text, right? > > Yes, because ARM-CS / Intel-PT need to know the exact text at any one > time in order to correctly decode their traces. > > > If so: > > > > * Does this need to take into account more than just jump_label()? > > jump_label seems to be the initial target Adrian set, but yes, it needs > to cover 'everything'. Including alternatives, which are what get me worried since the potential for recursion is pretty high there (on arm64, at least). > That is, all single instruction patching crud around: > > - optimized kprobes > (regular kprobes are exempt because they rely on exceptions) > - jump_labels > - static_call (still pending but still) > - ftrace fentry > > We also need a solution for whole new chunks of text: > > - modules > - ftrace trampolines > - optprobe trampolines > - JIT stuff > > but that is so far not included; I had some ideas about a /dev/mem based > interface that would report new ranges and wait for acks (from open > file-desc) before freeing them. I think that it would be nice not to end up with two very different interfaces for this. But I see this is still just an RFC, so maybe the full picture will emerge once we solve more of these use-cases. > > * What consistency guarantees are needed by userspace? It's not clear to > > me how it correlates the new event with execution on other CPUs. Is this > > using timestamps or something else? > > Something else :-) > > Both CS/PT basically have a bit-stream encoding of taken / not-taken > decisions. To decode they read the text until a conditional > branch-point, then consume one bit from the stream and continue. > > (note how unconditional branches -- jump_labels -- are expected to be > correct in this scheme) > > This means they need an exact representation of the text to be able to > accurately decode. > > This means timestamps or PRE/POST modify events are not acceptible. > Because until the I-FLUSH has completed we do not know which CPU has > which version of the instruction. > > Instead we rely on exceptions; exceptions are differently encoded in the > CS/PT data streams. > > The scheme used is: > > - overwrite target instruction with an exception (INT3 on x86, BRK on arm) > - sync (IPI broadcast CPUID or I-FLUSH completion) Hmm. Wouldn't this sync also need to drain the trace buffers for all other CPUs so that we make sure that the upcoming TEXT_POKE event occurs after all prior trace data, which could've been from before the breakpoint was installed? > at this point we know the instruction _will_ trap and CS/PT can observe > this alternate flow. That is, the exception handler will emulate the > instruction. > > - emit the TEXT_POKE event with both the old and new instruction > included > > - overwrite the target instruction with the new instruction > - sync > > at this point the new instruction should be valid. > > Using this scheme we can at all times follow the instruction flow. > Either it is an exception and the exception encoding helps us navigate, > or, on either size, we'll know the old/new instruction. > > > * What about module loading? > > I mentioned that above; that is still pending. > > > Finally, the whole point of the '_nosync()' stuff is that we don't have > > to synchronise. Therefore, there is a window where you really can't tell > > whether the instruction has been updated from the perspective of another > > CPU. > > Right, and much of that is preserved with the above scheme, nowhere do > you have to wait/disturb other CPUs, except for the I-FLUSH completion. > > Does this help? Yes, thank you for explaining it! Will