Date: Tue, 6 Aug 2013 12:59:21 +0100
From: Will Deacon
To: Mark Rutland
Cc: Vince Weaver, "linux-kernel@vger.kernel.org", Peter Zijlstra, Ingo Molnar, Paul Mackerras, Arnaldo Carvalho de Melo, "trinity@vger.kernel.org"
Subject: Re: perf,arm -- oops in validate_event
Message-ID: <20130806115921.GA14798@mudshark.cambridge.arm.com>
In-Reply-To: <20130806111932.GA25383@e106331-lin.cambridge.arm.com>

On Tue, Aug 06, 2013 at 12:19:32PM +0100, Mark Rutland wrote:
> On Mon, Aug 05, 2013 at 10:17:37PM +0100, Vince Weaver wrote:
> > It looks like in validate_event() we do
> >
> >     struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> >     ...
> >     return armpmu->get_event_idx(hw_events, event) >= 0;
> >
> > armpmu is read into r3, and somehow the value at the offset of
> > armpmu->get_event_idx is either -1 or 0, so when it does a "blx"
> > branch to the address at this offset we get the oops.
> >
> >     c001bf8c:   e3120010    tst   r2, #16
> >     c001bf90:   0a000004    beq   c001bfa8
> >     c001bf94:   e5933070    ldr   r3, [r3, #112]   ; 0x70
> >   * c001bf98:   e12fff33    blx   r3
> >     c001bf9c:   e1e00000    mvn   r0, r0
> >
> > I'm having trouble tracing the code back past that, and I don't have time
> > to start adding printk's and recompiling right now.
> >
> > Vince
>
> I think I can save you the effort :)
>
> From the looks of the test case and the kernel code in question, it
> looks like the following happens:
>
> * We create a software event, which becomes its own group leader.
> * We create a hardware event, with the software event as its group
>   leader.
> * When we try to schedule the hardware event, we try to validate all
>   events in its event group (the leader + siblings), but in doing so we
>   treat the software event as a hardware event, and erroneously try to
>   get its (non-existent) arm_pmu container, and call some garbage value
>   as get_event_idx(...).
>
> This could also happen if we tried to add events from different hardware
> PMUs to the same group. I'm not sure if that's valid, but I couldn't
> see any code preventing that, and it seems the x86 validation logic is
> wired to allow this. If it's not valid, we could skip validation of
> software events by checking with is_software_event.

But we already check `event->pmu != leader_pmu' in validate_event, so we
shouldn't get anywhere near calling get_event_idx in the case you
describe.

It sounds more like we have an inconsistency with one of the events. Can you
dump the events as they're processed in validate_group please?

Will
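
The two skip conditions discussed in this thread can be compared in a small
standalone model. This is only a sketch under stated assumptions: `struct pmu`,
`struct event`, the `is_software` flag, and both helper functions are
hypothetical simplifications invented for illustration. They are not the
kernel's `struct perf_event` or the real validate_event().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct pmu {
    bool is_software;            /* assumption: one flag marks a software PMU */
};

struct event {
    struct pmu *pmu;             /* PMU this event belongs to */
    struct event *group_leader;  /* a leader points at itself */
};

/*
 * Models the `event->pmu != leader_pmu' test mentioned above:
 * an event whose PMU differs from its group leader's PMU is skipped.
 */
static bool skipped_by_pmu_check(const struct event *ev)
{
    assert(ev != NULL && ev->group_leader != NULL);
    return ev->pmu != ev->group_leader->pmu;
}

/*
 * Models the alternative suggested in the quoted text: skip any
 * software event, regardless of who leads the group.
 */
static bool skipped_by_software_check(const struct event *ev)
{
    assert(ev != NULL && ev->pmu != NULL);
    return ev->pmu->is_software;
}
```

For a group whose leader is a software event and whose sibling is a hardware
event, the model's PMU comparison skips the sibling but not the leader (the
leader's PMU trivially equals leader_pmu), while the software check skips the
leader. Whether the real code path matches this model is what dumping the
events in validate_group would show.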