Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753826AbdF1EvK (ORCPT ); Wed, 28 Jun 2017 00:51:10 -0400 Received: from mail-yb0-f194.google.com ([209.85.213.194]:33593 "EHLO mail-yb0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751633AbdF1EvC (ORCPT ); Wed, 28 Jun 2017 00:51:02 -0400 MIME-Version: 1.0 In-Reply-To: <2256f9b5-1277-c4b1-1472-61a10cd1db9a@linux.intel.com> References: <2256f9b5-1277-c4b1-1472-61a10cd1db9a@linux.intel.com> From: Kyle Huey Date: Tue, 27 Jun 2017 21:51:00 -0700 Message-ID: Subject: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region To: "Jin, Yao" Cc: Ingo Molnar , "Peter Zijlstra (Intel)" , stable@vger.kernel.org, Alexander Shishkin , Arnaldo Carvalho de Melo , Jiri Olsa , Linus Torvalds , Namhyung Kim , Stephane Eranian , Thomas Gleixner , Vince Weaver , acme@kernel.org, jolsa@kernel.org, kan.liang@intel.com, Mark Rutland , Will Deacon , gregkh@linuxfoundation.org, "Robert O'Callahan" , open list Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3159 Lines: 82 On Tue, Jun 27, 2017 at 7:09 PM, Jin, Yao wrote: > Hi, > > In theory, the PMI interrupts in skid region should be dropped, right? No, why would they be dropped? My understanding of the situation is as follows: There is some time, call it t_0, where the hardware counter overflows. The PMU triggers an interrupt, but this is not instantaneous. Call the time when the interrupt is actually delivered t_1. Then t_1 - t_0 is the "skid". Note that if the counter is `exclude_kernel`, then at t_0 the CPU *must* be running a userspace program. But by t_1, the CPU may be doing something else. Your patch changed things so that if at t_1 the CPU is in the kernel, then the interrupt is discarded. But rr has programmed the counter to deliver a signal on overflow (via F_SETSIG on the fd returned by perf_event_open). This change results in the signal never being delivered, because the interrupt was ignored. (More accurately, the signal is delivered the *next* time the counter overflows, which is far past where we wanted to inject our asynchronous event into our tracee. It seems to me that it might be reasonable to ignore the interrupt if the purpose of the interrupt is to trigger sampling of the CPUs register state. But if the interrupt will trigger some other operation, such as a signal on an fd, then there's no reason to drop it. > For a userspace debugger, is it the only choice that relies on the *skid* > PMI interrupt? I don't understand this question, but hopefully the above clarified things. - Kyle > Thanks > Jin Yao > > > On 6/28/2017 9:01 AM, Kyle Huey wrote: >> >> Sent again with LKML CCd, sorry for the noise. >> >> - Kyle >> >> On Tue, Jun 27, 2017 at 5:38 PM, Kyle Huey wrote: >>> >>> cc1582c231ea introduced a regression in v4.12.0-rc5, and appears to be >>> a candidate for backporting to stable branches. >>> >>> rr, a userspace record and replay debugger[0], uses the PMU interrupt >>> to stop a program during replay to inject asynchronous events such as >>> signals. We are counting retired conditional branches in userspace >>> only. This changeset causes the kernel to drop interrupts on the >>> floor if, during the PMU interrupt's "skid" region, the CPU enters >>> kernel mode for whatever reason. When replaying traces of complex >>> programs such as Firefox, we intermittently fail to deliver >>> asynchronous events on time, leading the replay to diverge from the >>> recorded state. >>> >>> It seems like this change should, at a bare minimum, be limited to >>> counters that actually perform sampling of register state when the >>> interrupt fires. In our case, with the retired conditional branches >>> counter restricted to counting userspace events only, it makes no >>> difference that the PMU interrupt happened to be delivered in the >>> kernel. >>> >>> As this makes rr unusable on complex applications and cannot be >>> efficiently worked around, we would appreciate this being addressed >>> before 4.12 is finalized, and the regression not being introduced to >>> stable branches. >>> >>> Thanks, >>> >>> - Kyle >>> >>> [0] http://rr-project.org/ > >