From: Vince Weaver
Date: Fri, 14 Jul 2017 16:14:26 -0400 (EDT)
To: Alexander Shishkin
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Stephane Eranian
Subject: Re: perf: bisected sampling bug in Linux 4.11-rc1
In-Reply-To: <87tw2ewzmz.fsf@ashishki-desk.ger.corp.intel.com>

On Fri, 14 Jul 2017, Alexander Shishkin wrote:

> Vince Weaver writes:
>
> > I was tracking down some regressions in my perf_event_test testsuite.
> > Some of the tests broke in the 4.11-rc1 timeframe.
> >
> > I've bisected one of them, this report is about
> >    tests/overflow/simul_oneshot_group_overflow
> > This test creates an event group containing two sampling events, set
> > to overflow to a signal handler (which disables and then refreshes the
> > event).
> >
> > On a good kernel you get the following:
> >   Event perf::instructions with period 1000000
> >   Event perf::instructions with period 2000000
> >     fd 3 overflows: 946 (perf::instructions/1000000)
> >     fd 4 overflows: 473 (perf::instructions/2000000)
> >   Ending counts:
> >     Count 0: 946379875
> >     Count 1: 946365218
> >
> > With the broken kernels you get:
> >   Event perf::instructions with period 1000000
> >   Event perf::instructions with period 2000000
> >     fd 3 overflows: 938 (perf::instructions/1000000)
> >     fd 4 overflows: 318 (perf::instructions/2000000)
> >   Ending counts:
> >     Count 0: 946373080
> >     Count 1: 653373058
>
> I'm not sure I'm seeing it (granted, it's a friday evening): is it the
> difference in overflow counts?

It's two things. The test creates a grouped event in which both events
are perf::instructions.

1. The total count at the end should be the same for both events
   (on the failing kernels it is not).
2. The overflow count for each event should be roughly
   total_events/sample_period
   (on the failing kernels it is not).

> Also, are they cpu or task bound?

The open looks like this:
    perf_event_open(&pe,0,-1,-1,0);
so they are task bound: pid 0 is the calling thread, and cpu -1 means
any CPU.

In the failing case, the group leader is pinned.

The source code for the test is here:
    https://github.com/deater/perf_event_tests/blob/master/tests/overflow/simul_oneshot_group_overflow.c

Vince
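
For reference, the two checks described above can be written down as a small C sketch. This is not code from the test suite. The function names, the 1% count tolerance, and the overflow slack of 2 are assumptions chosen for illustration only. The numbers it is meant to be fed are the ones from the output quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Expected number of overflows for an event that counted `total` events
 * while sampling once every `period` events. Integer division, so the
 * result is only "roughly" what the kernel should deliver. */
uint64_t expected_overflows(uint64_t total, uint64_t period)
{
    assert(period > 0);
    return total / period;
}

/* Check 1: both group members count the same event, so their ending
 * counts should agree to within `tolerance_pct` percent.
 * (The tolerance is a hypothetical threshold, not taken from the test.) */
int counts_match(uint64_t a, uint64_t b, uint64_t tolerance_pct)
{
    uint64_t hi = a > b ? a : b;
    uint64_t lo = a > b ? b : a;
    return (hi - lo) * 100 <= hi * tolerance_pct;
}

/* Check 2: the observed overflow count should be within `slack` of
 * total/period. (The slack is likewise a hypothetical threshold.) */
int overflows_plausible(uint64_t observed, uint64_t total,
                        uint64_t period, uint64_t slack)
{
    uint64_t expected = expected_overflows(total, period);
    uint64_t diff = observed > expected ? observed - expected
                                        : expected - observed;
    return diff <= slack;
}
```

With the good-kernel numbers, counts_match(946379875, 946365218, 1) holds and both overflow counts (946 and 473) equal total/period exactly. With the broken-kernel numbers, the ending counts differ by roughly 30%, and the observed overflows (938 and 318) fall short of the expected 946 and 326.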