Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751689AbdF1Ffy (ORCPT ); Wed, 28 Jun 2017 01:35:54 -0400 Received: from mga03.intel.com ([134.134.136.65]:21131 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751300AbdF1Ffx (ORCPT ); Wed, 28 Jun 2017 01:35:53 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,274,1496127600"; d="scan'208";a="1165455677" Subject: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region To: Kyle Huey Cc: Ingo Molnar , "Peter Zijlstra (Intel)" , stable@vger.kernel.org, Alexander Shishkin , Arnaldo Carvalho de Melo , Jiri Olsa , Linus Torvalds , Namhyung Kim , Stephane Eranian , Thomas Gleixner , Vince Weaver , acme@kernel.org, jolsa@kernel.org, kan.liang@intel.com, Mark Rutland , Will Deacon , gregkh@linuxfoundation.org, "Robert O'Callahan" , open list , yao.jin@intel.com References: <2256f9b5-1277-c4b1-1472-61a10cd1db9a@linux.intel.com> From: "Jin, Yao" Message-ID: Date: Wed, 28 Jun 2017 13:35:47 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3762 Lines: 95 Hi Kyle, I understand your requirement. Sorry I don't know the detail of rr debugger, but I guess if it just uses counter overflow to deliver a signal, could it set the counter without "exclude_kernel"? Another consideration is, I'm not sure if the kernel can accurately realize that if the interrupt is to trigger sampling or just deliver a signal. Userspace may know that but kernel may not. Anyway let's listen to more comments from community. Thanks Jin Yao On 6/28/2017 12:51 PM, Kyle Huey wrote: > On Tue, Jun 27, 2017 at 7:09 PM, Jin, Yao wrote: >> Hi, >> >> In theory, the PMI interrupts in skid region should be dropped, right? > No, why would they be dropped? > > My understanding of the situation is as follows: > > There is some time, call it t_0, where the hardware counter overflows. > The PMU triggers an interrupt, but this is not instantaneous. Call > the time when the interrupt is actually delivered t_1. Then t_1 - t_0 > is the "skid". > > Note that if the counter is `exclude_kernel`, then at t_0 the CPU > *must* be running a userspace program. But by t_1, the CPU may be > doing something else. Your patch changed things so that if at t_1 the > CPU is in the kernel, then the interrupt is discarded. But rr has > programmed the counter to deliver a signal on overflow (via F_SETSIG > on the fd returned by perf_event_open). This change results in the > signal never being delivered, because the interrupt was ignored. > (More accurately, the signal is delivered the *next* time the counter > overflows, which is far past where we wanted to inject our > asynchronous event into our tracee. > > It seems to me that it might be reasonable to ignore the interrupt if > the purpose of the interrupt is to trigger sampling of the CPUs > register state. But if the interrupt will trigger some other > operation, such as a signal on an fd, then there's no reason to drop > it. > >> For a userspace debugger, is it the only choice that relies on the *skid* >> PMI interrupt? > I don't understand this question, but hopefully the above clarified things. > > - Kyle > >> Thanks >> Jin Yao >> >> >> On 6/28/2017 9:01 AM, Kyle Huey wrote: >>> Sent again with LKML CCd, sorry for the noise. >>> >>> - Kyle >>> >>> On Tue, Jun 27, 2017 at 5:38 PM, Kyle Huey wrote: >>>> cc1582c231ea introduced a regression in v4.12.0-rc5, and appears to be >>>> a candidate for backporting to stable branches. >>>> >>>> rr, a userspace record and replay debugger[0], uses the PMU interrupt >>>> to stop a program during replay to inject asynchronous events such as >>>> signals. We are counting retired conditional branches in userspace >>>> only. This changeset causes the kernel to drop interrupts on the >>>> floor if, during the PMU interrupt's "skid" region, the CPU enters >>>> kernel mode for whatever reason. When replaying traces of complex >>>> programs such as Firefox, we intermittently fail to deliver >>>> asynchronous events on time, leading the replay to diverge from the >>>> recorded state. >>>> >>>> It seems like this change should, at a bare minimum, be limited to >>>> counters that actually perform sampling of register state when the >>>> interrupt fires. In our case, with the retired conditional branches >>>> counter restricted to counting userspace events only, it makes no >>>> difference that the PMU interrupt happened to be delivered in the >>>> kernel. >>>> >>>> As this makes rr unusable on complex applications and cannot be >>>> efficiently worked around, we would appreciate this being addressed >>>> before 4.12 is finalized, and the regression not being introduced to >>>> stable branches. >>>> >>>> Thanks, >>>> >>>> - Kyle >>>> >>>> [0] http://rr-project.org/ >>