Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp827969ybx; Wed, 30 Oct 2019 05:48:23 -0700 (PDT) X-Google-Smtp-Source: APXvYqzaWPwncNm+JTqv5MGjIV+3AAhIL023CeAcJiSVqZ2W1s9CPmdhAJSwJqWWIqURF6l46zsF X-Received: by 2002:a17:906:1f8b:: with SMTP id t11mr8708067ejr.191.1572439703320; Wed, 30 Oct 2019 05:48:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572439703; cv=none; d=google.com; s=arc-20160816; b=PA1tQp+ScUlU1RFJIAJ+pktX5/qnR/vrZgMUTDV5/K1Z1x1uK99B4NYImTtJa7O1Lt bgGGHIKb5f5Krr56UnA044O/iHkhCNrC3nW28gh9wfUN5bxLv5Qev7WMKasfiEMrAjDG bXlzf7lcfCRVw+SJZmT7eO3/zi3uRWFyacDMFSajKYa2cj7IHyn7oY58UN+7CbnDsp/o gNYKzrcskjhNsCc34LpwnEv1TlsXCyxYHD9mcNieZaAuMQo94a6agr4JLwDYfMFogwwz SKu+wmO4zBtkWx13vzQHA548iZw2A+980cnIC+H3Lc0mvgCWhxGU2xk1cqX6L+m+2f31 xWSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=/m49eZ61fGR01vML73iup+AhH6sl213r3BQETPC+rI8=; b=DuoclgpLpAgJRoIIItZzRDA/oWKuyqfYUum25d4NOi+MapVsjR81ReKab2+XAzFrbh wFUnrFlySdbDRTyS+Z+4qv1Vscef9XB4RSBDVOKAp3K24A3lcCNumHAtj7BESZCLDlXD 3J2Fwv+R8zBaELerdKx/6jk1+OamG91Z/t0kXfuIeYiBv9mVFD3NeVa65svd6MCEoFrD gvXBErkbS3NeQ8111hCTQDup/pjHBA0dG8XxT4XFcY9z6eEwxHx/q4DymqrHuoDU3hy+ rHCDGpTn0q1ZiVXR7SaA60Ew5hASn2xoC/P9xcUiVR3+cQ5KeQ1AfAGiJs0BkrJq1YeF 0yHQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=merlin.20170209 header.b=GsSorIn9; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id fy3si1064406ejb.267.2019.10.30.05.48.00; Wed, 30 Oct 2019 05:48:23 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=merlin.20170209 header.b=GsSorIn9; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726411AbfJ3MrT (ORCPT + 99 others); Wed, 30 Oct 2019 08:47:19 -0400 Received: from merlin.infradead.org ([205.233.59.134]:50706 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726088AbfJ3MrT (ORCPT ); Wed, 30 Oct 2019 08:47:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=/m49eZ61fGR01vML73iup+AhH6sl213r3BQETPC+rI8=; b=GsSorIn9gaIxPHvdTLOwabIb4 193uAmQXvz+BnKDc/c5VRfZsdn94yhejQHPet5n8WE8ETynBayC+6PivwlfpYHAntbwOHq7b9ZDK5 vRbVBWIV/GZiucp2RdIxdPf6lhT9eARpclKiOGkHWTzXLP7RA9zEWixywygwpR6VW75Mbmvxern5u VgvlosCbLwqQ2VwD5/gN89eosBsOmsg7PcgLs1mzkAxmiShOzhWSfWOD2e5qhjXr0XsH9Zbxzw3s7 5B75/O3xvYu8Q82ObKXF1DuVZAf/CB71WkS3Ofklisw1qfl8C1n1UDETCm0kcUovVvqbsyS4MK8ds io71IBiig==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=noisy.programming.kicks-ass.net) by merlin.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1iPnNP-0003Yn-8d; Wed, 30 Oct 2019 12:47:03 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 31C7F3006D0; Wed, 30 Oct 2019 13:45:58 +0100 (CET) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 61B4B2B435A44; Wed, 30 Oct 2019 13:46:59 +0100 (CET) Date: Wed, 30 Oct 2019 13:46:59 +0100 From: Peter Zijlstra To: Leo Yan Cc: Adrian Hunter , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , x86@kernel.org, Mark Rutland , Alexander Shishkin , Mathieu Poirier , Arnaldo Carvalho de Melo , Jiri Olsa , linux-kernel@vger.kernel.org Subject: Re: [PATCH RFC 1/6] perf/x86: Add perf text poke event Message-ID: <20191030124659.GQ4114@hirez.programming.kicks-ass.net> References: <20191025130000.13032-1-adrian.hunter@intel.com> <20191025130000.13032-2-adrian.hunter@intel.com> <20191030104747.GA21153@leoy-ThinkPad-X240s> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191030104747.GA21153@leoy-ThinkPad-X240s> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Oct 30, 2019 at 06:47:47PM +0800, Leo Yan wrote: > Hi Adrian, > > On Fri, Oct 25, 2019 at 03:59:55PM +0300, Adrian Hunter wrote: > > Record changes to kernel text (i.e. self-modifying code) in order to > > support tracers like Intel PT decoding through jump labels. > > > > A copy of the running kernel code is needed as a reference point > > (e.g. from /proc/kcore). The text poke event records the old bytes > > and the new bytes so that the event can be processed forwards or backwards. > > > > The text poke event has 'flags' to describe when the event is sent. At > > present the only flag is PERF_TEXT_POKE_UPDATE which indicates the > > point at which tools should update their reference kernel executable to > > change the old bytes to the new bytes. > > > > Other architectures may wish to emit a pair of text poke events. One before > > and one after a text poke. In that case, PERF_TEXT_POKE_UPDATE flag would > > be set on only one of the pair. > > Thanks a lot for the patch set. > > Below is my understanding for implementation 'emit a pair of text poke > events' as you mentioned: > > Since Arm64 instruction is fixed size, it doesn't need to rely on INT3 > liked mechanism to replace instructions and directly uses two operations > to alter instruction (modify instruction and flush icache line). So > Arm64 has no chance to send perf event in the middle of altering > instruction. Right. > Thus we can send pair of text poke events in the kernel: > > perf_event_text_poke(PERF_TEXT_POKE_UPDATE_PREV) > > Change instruction > Flush icache > > perf_event_text_poke(PERF_TEXT_POKE_UPDATE_POST) > > In the userspace, if perf tool detects the instruction is changed > from nop to branch, There is _far_ more text poking than just jump_label's NOP/JMP transitions. Ftrace also does NOP/CALL, CALL/CALL, and the static_call infrastructure that I posted here: https://lkml.kernel.org/r/20191007082708.013939311@infradead.org Has: JMP/RET, JMP/JMP and if it has inline patching support: NOP/CALL, CALL/CALL, patching. Anyway, the below argument doesn't care much, it works for NOP/JMP just fine. > we need to update dso cache for the > 'PERF_TEXT_POKE_UPDATE_PREV' event; if detect the instruction is > changed from branch to nop, we need to update dso cache for > 'PERF_TEXT_POKE_UPDATE_POST' event. The main idea is to ensure the > branch instructions can be safely contained in the dso file and any > branch samples can read out correct branch instruction. > > Could you confirm this is the same with your understanding? Or I miss > anything? I personally even think the pair events can be used for > different arches (e.g. the solution can be reused on Arm64/x86, etc). So the problem we have with PT is that it is a bit-stream of branch taken/not-taken decisions. In order to decode that we need to have an accurate view of the unconditional code flow. Both NOP/JMP are unconditional and we need to exactly know which of the two was encountered. With your scheme, I don't see how we can ever actually know that. When we get the PRE event, all we really know is that we're going to change a specific instruction into another. And at the POST event we know it has been done. But in between these two events, we have no clue which of the two instructions is live on which CPU (two CPUs might in fact have a different live instruction at the same time). This means we _cannot_ unambiguously decode a taken/not-taken decision stream. Does CS have this same problem, and how would the PRE/POST events help with that? So our (x86) horrible (variable instruction length induced) complexity for text poking is actually in our advantage this one time. The exception and exception-return paths allow us to unambiguously know what happens around the time of poking.