Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753756AbdF1BBJ (ORCPT );
	Tue, 27 Jun 2017 21:01:09 -0400
Received: from mail-yb0-f195.google.com ([209.85.213.195]:35779 "EHLO
	mail-yb0-f195.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1753644AbdF1BBB (ORCPT );
	Tue, 27 Jun 2017 21:01:01 -0400
MIME-Version: 1.0
In-Reply-To:
References:
From: Kyle Huey
Date: Tue, 27 Jun 2017 18:01:00 -0700
Message-ID:
Subject: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region
To: Jin Yao , Ingo Molnar
Cc: "Peter Zijlstra (Intel)" , stable@vger.kernel.org,
	Alexander Shishkin , Arnaldo Carvalho de Melo , Jiri Olsa ,
	Linus Torvalds , Namhyung Kim , Stephane Eranian ,
	Thomas Gleixner , Vince Weaver , acme@kernel.org,
	jolsa@kernel.org, kan.liang@intel.com, Mark Rutland ,
	Will Deacon , gregkh@linuxfoundation.org,
	"Robert O'Callahan" , open list
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1458
Lines: 35

Sent again with LKML CCd, sorry for the noise.

- Kyle

On Tue, Jun 27, 2017 at 5:38 PM, Kyle Huey wrote:
> cc1582c231ea introduced a regression in v4.12.0-rc5, and appears to be
> a candidate for backporting to stable branches.
>
> rr, a userspace record and replay debugger[0], uses the PMU interrupt
> to stop a program during replay to inject asynchronous events such as
> signals. We are counting retired conditional branches in userspace
> only. This changeset causes the kernel to drop interrupts on the
> floor if, during the PMU interrupt's "skid" region, the CPU enters
> kernel mode for whatever reason. When replaying traces of complex
> programs such as Firefox, we intermittently fail to deliver
> asynchronous events on time, leading the replay to diverge from the
> recorded state.
>
> It seems like this change should, at a bare minimum, be limited to
> counters that actually perform sampling of register state when the
> interrupt fires. In our case, with the retired conditional branches
> counter restricted to counting userspace events only, it makes no
> difference that the PMU interrupt happened to be delivered in the
> kernel.
>
> As this makes rr unusable on complex applications and cannot be
> efficiently worked around, we would appreciate this being addressed
> before 4.12 is finalized, and the regression not being introduced to
> stable branches.
>
> Thanks,
>
> - Kyle
>
> [0] http://rr-project.org/
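
For readers unfamiliar with this kind of counter, here is a rough sketch of
the setup described above: a counter that counts in userspace only, raises an
overflow interrupt every N events, and samples no register state. This is not
rr's actual code. The helper names are hypothetical, and the generic
PERF_COUNT_HW_BRANCH_INSTRUCTIONS event stands in for the retired conditional
branches event, whose raw encoding is CPU-specific and is not reproduced here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: describe a userspace-only branch counter that
 * overflows (and so raises a PMU interrupt) every `period` events.
 * Assumption: the generic branch event is used as a stand-in for a
 * CPU-specific "retired conditional branches" raw event.
 */
struct perf_event_attr make_user_branch_attr(uint64_t period)
{
    struct perf_event_attr attr;

    assert(period > 0);
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
    attr.sample_period = period; /* overflow after this many events */
    attr.sample_type = 0;        /* no register state is sampled */
    attr.exclude_kernel = 1;     /* count userspace events only */
    attr.exclude_hv = 1;         /* do not count hypervisor events */
    attr.disabled = 1;           /* caller enables it explicitly later */
    return attr;
}

/*
 * Hypothetical helper: open the counter for thread `tid` on any CPU.
 * Returns a file descriptor, or -1 with errno set on failure.
 */
int open_user_branch_counter(pid_t tid, uint64_t period)
{
    struct perf_event_attr attr = make_user_branch_attr(period);

    /* glibc has no wrapper for perf_event_open, so use syscall(2). */
    return (int)syscall(SYS_perf_event_open, &attr, tid, -1, -1, 0UL);
}
```

A caller would typically route the overflow notification to a signal with
fcntl(2) (F_SETOWN_EX, F_SETSIG, O_ASYNC) and then enable the counter with
the PERF_EVENT_IOC_ENABLE ioctl. Because exclude_kernel is set and
sample_type is 0, nothing recorded by such a counter depends on where the
interrupt was ultimately delivered.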