Subject: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region
To: Kyle Huey, Ingo Molnar
Cc: "Peter Zijlstra (Intel)", stable@vger.kernel.org, Alexander Shishkin,
    Arnaldo Carvalho de Melo, Jiri Olsa, Linus Torvalds, Namhyung Kim,
    Stephane Eranian, Thomas Gleixner, Vince Weaver, acme@kernel.org,
    jolsa@kernel.org, kan.liang@intel.com, Mark Rutland, Will Deacon,
    gregkh@linuxfoundation.org, "Robert O'Callahan", open list
From: "Jin, Yao"
Message-ID: <2256f9b5-1277-c4b1-1472-61a10cd1db9a@linux.intel.com>
Date: Wed, 28 Jun 2017 10:09:01 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

In theory, the PMI interrupts in skid region should be dropped, right?

For a userspace debugger, is it the only choice that relies on the *skid* PMI interrupt?

Thanks
Jin Yao

On 6/28/2017 9:01 AM, Kyle Huey wrote:
> Sent again with LKML CCd, sorry for the noise.
>
> - Kyle
>
> On Tue, Jun 27, 2017 at 5:38 PM, Kyle Huey wrote:
>> cc1582c231ea introduced a regression in v4.12.0-rc5, and appears to be
>> a candidate for backporting to stable branches.
>>
>> rr, a userspace record and replay debugger[0], uses the PMU interrupt
>> to stop a program during replay to inject asynchronous events such as
>> signals. We are counting retired conditional branches in userspace
>> only. This changeset causes the kernel to drop interrupts on the
>> floor if, during the PMU interrupt's "skid" region, the CPU enters
>> kernel mode for whatever reason. When replaying traces of complex
>> programs such as Firefox, we intermittently fail to deliver
>> asynchronous events on time, leading the replay to diverge from the
>> recorded state.
>>
>> It seems like this change should, at a bare minimum, be limited to
>> counters that actually perform sampling of register state when the
>> interrupt fires. In our case, with the retired conditional branches
>> counter restricted to counting userspace events only, it makes no
>> difference that the PMU interrupt happened to be delivered in the
>> kernel.
>>
>> As this makes rr unusable on complex applications and cannot be
>> efficiently worked around, we would appreciate this being addressed
>> before 4.12 is finalized, and the regression not being introduced to
>> stable branches.
>>
>> Thanks,
>>
>> - Kyle
>>
>> [0] http://rr-project.org/
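
The distinction drawn in the quoted report can be sketched as a small toy model. This is not kernel code and is not taken from cc1582c231ea. The struct, its fields, and both function names are hypothetical, chosen only to contrast the behaviour the report describes (drop any overflow interrupt that lands in kernel mode for a userspace-only counter) with the narrower rule it suggests (drop only when the counter also samples register state).

```c
/* Toy model of the overflow-delivery decision discussed in this thread.
 * NOT kernel code: every name below is hypothetical. */
#include <stdbool.h>

struct counter_cfg {
    bool exclude_kernel; /* counter counts userspace events only */
    bool samples_regs;   /* overflow handling records register state */
};

/* Behaviour described in the report: a userspace-only counter whose
 * interrupt arrives while the CPU is in kernel mode is dropped. */
static bool deliver_as_reported(const struct counter_cfg *c, bool in_user_mode)
{
    if (c->exclude_kernel && !in_user_mode)
        return false; /* interrupt dropped */
    return true;
}

/* Narrower rule suggested in the report: drop only when the counter
 * would actually sample register state at interrupt time. */
static bool deliver_as_suggested(const struct counter_cfg *c, bool in_user_mode)
{
    if (c->exclude_kernel && !in_user_mode && c->samples_regs)
        return false; /* kernel register state would be sampled */
    return true;      /* pure counting: safe to deliver */
}
```

Under these assumptions, an rr-like counter (`exclude_kernel` set, `samples_regs` clear) whose interrupt lands in kernel mode is dropped by the first function and delivered by the second, while a counter that samples registers is dropped by both.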