Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755311AbaBTSws (ORCPT ); Thu, 20 Feb 2014 13:52:48 -0500
Received: from mail-qc0-f171.google.com ([209.85.216.171]:62685 "EHLO mail-qc0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754166AbaBTSwq (ORCPT ); Thu, 20 Feb 2014 13:52:46 -0500
Date: Thu, 20 Feb 2014 13:54:45 -0500 (EST)
From: Vince Weaver
To: Peter Zijlstra
cc: Vince Weaver, Dave Jones, Linux Kernel, Ingo Molnar, Paul Mackerras, Steven Rostedt
Subject: Re: x86_pmu_start WARN_ON.
In-Reply-To: <20140220182300.GN9987@twins.programming.kicks-ass.net>
Message-ID:
References: <20140217152859.GF15586@twins.programming.kicks-ass.net> <20140219101949.GG15586@twins.programming.kicks-ass.net> <20140220100830.GN6835@laptop.programming.kicks-ass.net> <20140220182300.GN9987@twins.programming.kicks-ass.net>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 20 Feb 2014, Peter Zijlstra wrote:

> On Thu, Feb 20, 2014 at 01:03:16PM -0500, Vince Weaver wrote:
> > attached, it's not very big.
>
> This is I think the relevant bit:
>
> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable
> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
> pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800)
> pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
> pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)
> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable
> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0
> pec_1076_warn-2804 [000] d... 147.926191: x86_pmu_enable: starting: 0
>
> so it does indeed look like n_added got scrambled; we started out with 1
> event on disable; we've got 2 events on enable, but n_added is also 2,
> which would suggest we had 0 on disable.
>
> That makes us want to (re)start the NMI counter alright.

Might be relevant: check the last_cpu values.

Right before the above it looks like the thread gets moved from CPU 1 to
CPU 0 (possibly as a result of the long chain started with the close() of
the tracepoint event), so the problem NMI watchdog event being enabled is
a different one than the one that was disabled just before.

Vince
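
For anyone wanting to repeat the n_added check from the quoted trace on a longer log, here is a rough sketch rather than anything from the debug patch itself. It pulls the n_events/n_added/n_txn triple out of an x86_pmu_state line and compares the event count implied at enable time (n_events minus n_added) with the count recorded at disable time. The helper names are made up for illustration, and the comparison assumes no events were removed between the disable and the enable.

```python
import re

# Matches the counter summary printed by the x86_pmu_state debug lines,
# e.g. "n_events: 1, n_added: 0, n_txn: 1".
STATE_RE = re.compile(r"n_events:\s*(\d+),\s*n_added:\s*(\d+),\s*n_txn:\s*(\d+)")


def parse_counts(line):
    """Return the three counters from one trace line, or None if absent."""
    m = STATE_RE.search(line)
    if m is None:
        return None
    n_events, n_added, n_txn = map(int, m.groups())
    return {"n_events": n_events, "n_added": n_added, "n_txn": n_txn}


def implied_events_at_disable(enable_counts):
    """Events that must already have been present if n_added is correct.

    Assumption: nothing was removed between disable and enable.
    """
    return enable_counts["n_events"] - enable_counts["n_added"]


def is_consistent(disable_counts, enable_counts):
    """True when the enable-time counters agree with the disable-time count."""
    return implied_events_at_disable(enable_counts) == disable_counts["n_events"]


# The two summary lines copied from the trace quoted in this message.
DISABLE_LINE = ("pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: "
                "n_events: 1, n_added: 0, n_txn: 1")
ENABLE_LINE = ("pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: "
               "n_events: 2, n_added: 2, n_txn: 2")

if __name__ == "__main__":
    before = parse_counts(DISABLE_LINE)
    after = parse_counts(ENABLE_LINE)
    print("recorded events at disable:", before["n_events"])
    print("implied events at disable: ", implied_events_at_disable(after))
    print("consistent:", is_consistent(before, after))
```

On the two lines above this reports a mismatch (1 recorded versus 0 implied), which is the same discrepancy Peter points out. It says nothing about why the counts diverge; the CPU migration would still have to be checked by hand against the last_cpu values.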