From: Kyle Huey
Date: Tue, 20 Nov 2018 13:34:38 -0800
Subject: Re: [REGRESSION] x86, perf: counter freezing breaks rr
To: Stephane Eranian
Cc: Andi Kleen, Kan Liang, "Peter Zijlstra (Intel)", Ingo Molnar, "Robert O'Callahan", Alexander Shishkin, Arnaldo Carvalho de Melo, Jiri Olsa, Linus Torvalds, Thomas Gleixner, Vince Weaver, acme@kernel.org, open list
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 20, 2018 at 1:19 PM Stephane Eranian wrote:
>
> On Tue, Nov 20, 2018 at 12:53 PM Kyle Huey wrote:
> >
> > On Tue, Nov 20, 2018 at 12:11 PM Andi Kleen wrote:
> > >
> > > > > > Given that we're already at rc3, and that this renders rr unusable,
> > > > > > we'd ask that counter freezing be disabled for the 4.20 release.
> > > > >
> > > > > The boot option should be good enough for the release?
> > > >
> > > > I'm not entirely sure what you mean here. We want you to flip the
> > > > default boot option so this feature is off for this release. i.e. rr
> > > > should work by default on 4.20 and people should have to opt into the
> > > > inaccurate behavior if they want faster PMI servicing.
> > >
> > > I don't think it's inaccurate, it's just different
> > > than what you are used to.
> > >
> > > For profiling including the kernel it's actually far more accurate
> > > because the count is stopped much earlier near the sampling
> > > point. Otherwise there is a considerable over count into
> > > the PMI handler.
> > >
> > > In your case you limit the count to ring 3 so it's always cut off
> > > at the transition point into the kernel, while with freezing
> > > it's at the overflow point.
> >
> > I suppose that's fair that it's better for some use cases. The flip
> > side is that it's no longer possible to get exactly accurate counts
> > from user space if you're using the PMI (because any events between
> > the overflow itself and the transition to the PMI handler are
> > permanently lost) which is catastrophically bad for us :)
> >
> Let me make sure I got this right. During recording, you count on
> retired-cond-branch and you record the value of the PMU counter at
> specific locations, e.g., syscalls. During replay, you program the
> branch-conditional-retired to overflow on interrupt at each recorded
> values. So if you sampled the event at 1,000,000 and then at 1,500,000.
> Then you program the event with a period of 1,000,000 first, on
> overflow the counter interrupts and you get a signal. Then, you
> reprogram the event for a new period of 500,000. During recording
> and replay the event is limited to ring 3 (user level). Am I
> understanding this right?

This is largely correct, except that we only program the interrupt for
events that we would not naturally stop at during the course of
execution, such as asynchronous signals or context switch points. At
events where we would naturally stop anyway (e.g. syscalls, which we
can stop at via ptrace), we simply check that the counters match, so
that we find any discrepancies early, before they affect an async
signal delivery.

Let's say I have the following event sequence, where rbc is the
user-space retired conditional branch count:

1. alarm syscall at rbc=1000
2. SIGALRM delivery at rbc=8000
3. exit syscall at rbc=9000

During replay, we start the program and run to the syscall with
PTRACE_SYSCALL. When the replayed process stops, we check that the
value of the rbc counter is 1000 (we also check that all registers
match what we recorded), and then we emulate the effects of the
syscall on the replayed process's registers and memory.

Then we see that the next event is an asynchronous signal, so we
program the rbc counter to interrupt after an additional
(8000 - 1000 - SKID_SIZE) events. SKID_SIZE has been chosen by
experimentation to ensure that the PMU interrupt is not delivered
*after* the point in the program we care about; for Skylake this value
is 100, which here gives a period of 6900. We then resume the program
with PTRACE_CONT and wait for the PMI to stop the replayed tracee. We
advance the program to the exact point we care about through a
combination of breakpoints and single-stepping, and then deliver the
SIGALRM.

Once that is done, we see that the next event is the exit syscall, and
we again use PTRACE_SYSCALL to get to it. Once there, we check that
the rbc counter value and registers match what was recorded, and
perform the syscall.

Our counters are always restricted to ring 3 in both recording and
replay.

- Kyle
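For readers unfamiliar with rr, the replay decision described above can be sketched in C roughly as follows. This is not rr's code: the names (plan_step, plan_replay, init_rbc_attr and the enums) are hypothetical, the sketch does not actually call perf_event_open(2) or ptrace(2), and the raw event encoding for retired conditional branches is left as a parameter because it is model-specific. SKID_SIZE is the Skylake value of 100 quoted in the message; treating a period of 0 as "skip the PMI" is this sketch's own convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Skid margin in retired conditional branches. 100 is the Skylake value
 * found by experiment; other microarchitectures may need another value. */
#define SKID_SIZE 100

enum event_kind { EV_SYSCALL, EV_ASYNC_SIGNAL };

/* One recorded event and the ring-3 branch count at which it happened. */
struct recorded_event {
    enum event_kind kind;
    uint64_t rbc;
};

enum replay_action {
    RUN_TO_SYSCALL, /* resume with PTRACE_SYSCALL, then compare counters */
    PROGRAM_PMI     /* arm the counter, PTRACE_CONT, then breakpoints/single-step */
};

struct replay_step {
    enum replay_action action;
    uint64_t pmi_period; /* branches to count before the PMI (PROGRAM_PMI only) */
};

/* Decide how to reach `next` when the tracee is currently at `current_rbc`. */
static struct replay_step plan_step(uint64_t current_rbc,
                                    const struct recorded_event *next)
{
    struct replay_step step = { RUN_TO_SYSCALL, 0 };

    assert(next->rbc >= current_rbc); /* counts only move forward */
    if (next->kind == EV_ASYNC_SIGNAL) {
        uint64_t distance = next->rbc - current_rbc;
        step.action = PROGRAM_PMI;
        /* Interrupt SKID_SIZE branches early so the PMI never lands after
         * the target. A period of 0 (target inside the skid window) means,
         * in this sketch only, "skip the PMI and go straight to
         * breakpoints/single-stepping". */
        step.pmi_period = distance > SKID_SIZE ? distance - SKID_SIZE : 0;
    }
    return step;
}

/* Plan every step of a replay, starting from a branch count of 0. */
static void plan_replay(const struct recorded_event *events, size_t n,
                        struct replay_step *steps)
{
    uint64_t current_rbc = 0;
    for (size_t i = 0; i < n; i++) {
        steps[i] = plan_step(current_rbc, &events[i]);
        /* After each event the tracee sits exactly at the recorded count. */
        current_rbc = events[i].rbc;
    }
}

/* Fill in a perf_event_attr for a user-space-only counter that overflows
 * after `period` events. `raw_config` is the model-specific raw encoding
 * of a retired-conditional-branch event (assumed; not given in the email). */
static void init_rbc_attr(struct perf_event_attr *attr, uint64_t raw_config,
                          uint64_t period)
{
    memset(attr, 0, sizeof *attr);
    attr->type = PERF_TYPE_RAW;
    attr->size = sizeof *attr;
    attr->config = raw_config;
    attr->sample_period = period;
    attr->exclude_kernel = 1; /* count ring 3 only */
    attr->exclude_hv = 1;
    attr->disabled = 1;       /* enable explicitly once the tracee is ready */
}
```

For the three-event sequence above, plan_replay yields PTRACE_SYSCALL steps for the two syscalls and a PMI period of 8000 - 1000 - 100 = 6900 for the signal, after which breakpoints and single-stepping cover the remaining distance. Setting exclude_kernel is what keeps the count restricted to ring 3.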