From: Kyle Huey
Date: Tue, 20 Nov 2018 11:50:18 -0800
Subject: Re: [REGRESSION] x86, perf: counter freezing breaks rr
To: Stephane Eranian
Cc: Andi Kleen, "Peter Zijlstra (Intel)", Kan Liang, Ingo Molnar, "Robert O'Callahan", Alexander Shishkin, Arnaldo Carvalho de Melo, Jiri Olsa, Linus Torvalds, Thomas Gleixner, Vince Weaver, acme@kernel.org, open list
References: <20181120170842.GZ2131@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 20, 2018 at 10:16 AM Stephane Eranian wrote:
> I would like to understand better the PMU behavior you are relying upon and
> why the V4 freeze approach is breaking it. Could you elaborate?

I investigated a bit more to write this response and discovered that
my initial characterization of the problem as an overcount during
replay is incorrect; what we are actually seeing is an undercount
during recording.

rr relies on the userspace retired-conditional-branches counter being
exactly the same between recording and replay. The primary reason we
do this is to establish a program "timeline", allowing us to find the
correct place to inject asynchronous signals during replay (the
program counter plus the retired-conditional-branches counter value
uniquely identifies a point in most programs).
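
As a toy illustration of that "timeline" idea (a sketch only, not rr's actual code: the function names `trace` and `find_point` and the pc labels are made up for this example), the following models a small loop. The program counter alone repeats on every iteration, but the pair of program counter and retired-conditional-branch count picks out exactly one step:

```python
from collections import Counter


def trace(iterations):
    """Return the (pc, rcb) pair seen at each step of a toy loop.

    pc is a hypothetical label standing in for a program counter;
    rcb is the number of conditional branches retired so far.
    """
    rcb = 0
    points = []
    for _ in range(iterations):
        points.append(("loop_body", rcb))  # same pc on every iteration
        rcb += 1                           # the loop's conditional branch retires
        points.append(("loop_test", rcb))
    points.append(("exit", rcb))
    return points


def find_point(points, pc, rcb):
    """Return the index of the unique step matching (pc, rcb)."""
    matches = [i for i, p in enumerate(points) if p == (pc, rcb)]
    if len(matches) != 1:
        raise ValueError("(pc, rcb) does not identify a unique point")
    return matches[0]


points = trace(3)
pc_counts = Counter(pc for pc, _ in points)  # pc alone is ambiguous
pair_counts = Counter(points)                # (pc, rcb) pairs are unique here

print(pc_counts["loop_body"], max(pair_counts.values()))
print(find_point(points, "loop_body", 2))
```

In this model an asynchronous signal recorded at ("loop_body", 2) can be delivered at the same step during replay only if both runs count the same number of branches, which is why an exact match between the two counts matters.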

Because we run the rcb (retired-conditional-branches) counter during
recording, we piggyback on it by programming it to interrupt the
program every few hundred thousand branches to give us a chance to
context switch to a different program thread.

We've found that with counter freezing enabled, when the PMI fires,
the reported value of the retired conditional branches counter is low
by something on the order of 10 branches. In a single-threaded
program, although the PMI fires, we don't actually record a context
switch or the counter value at this point. We continue on to the next
tracee event (e.g. a syscall) and record the counter value at that
point. Then, during replay, we replay to the syscall, check the
replay counter value against the recorded value, and find that the
replay value is too high. (NB: during a single-threaded replay the PMI
is not used here because there is no asynchronous event.)

Repeatedly recording the same program produces traces that have
different recorded retired-conditional-branch counter values after the
first PMI fired during recording, but during replay we always count
off the same number of branches, further suggesting that the replay
value is correct. And finally, recordings made on a kernel with
counter freezing active still fail to replay on a kernel without
counter freezing active.

I don't know what the underlying mechanism for the loss of counter
events is (e.g. whether it's incorrect code in the interrupt handler,
a silicon bug, or what) but it's clear that the counter freezing
implementation is causing events to be lost.

- Kyle