Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1946646AbbEVJ05 (ORCPT );
        Fri, 22 May 2015 05:26:57 -0400
Received: from mail-ob0-f180.google.com ([209.85.214.180]:34917 "EHLO
        mail-ob0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1946631AbbEVJ0t (ORCPT );
        Fri, 22 May 2015 05:26:49 -0400
MIME-Version: 1.0
In-Reply-To: <20150522064955.GA26489@gmail.com>
References: <20150521125615.GO3644@twins.programming.kicks-ass.net>
        <20150521130952.GQ3644@twins.programming.kicks-ass.net>
        <20150521132015.GS3644@twins.programming.kicks-ass.net>
        <1432214957.30671.0.camel@twins>
        <1432217038.30671.7.camel@twins>
        <20150522064955.GA26489@gmail.com>
Date: Fri, 22 May 2015 02:26:48 -0700
Message-ID:
Subject: Re: [PATCH 01/10] perf,x86: Fix event/group validation
From: Stephane Eranian
To: Ingo Molnar
Cc: Peter Zijlstra, Vince Weaver, Jiri Olsa, "Liang, Kan", LKML,
        Andrew Hunter, Maria Dimakopoulou
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3206
Lines: 80

On Thu, May 21, 2015 at 11:49 PM, Ingo Molnar wrote:
>
>
> * Stephane Eranian wrote:
>
> > On Thu, May 21, 2015 at 7:03 AM, Peter Zijlstra wrote:
> > > On Thu, 2015-05-21 at 06:36 -0700, Stephane Eranian wrote:
> > >> On Thu, May 21, 2015 at 6:29 AM, Peter Zijlstra wrote:
> > >> > On Thu, 2015-05-21 at 06:27 -0700, Stephane Eranian wrote:
> > >> >> Or are you talking about a preemption while executing x86_schedule_events()?
> > >> >
> > >> > That.
> > >> >
> > >> > And we can of course cure that by an earlier patch I send; but I find it
> > >> > a much simpler rule to just never allow modifying global state for
> > >> > validation.
> > >>
> > >> I can see validation being preempted, but not the context switch code path.
> > >> Is that what you are talking about?
> > >>
> > >> You are saying validate_group() is in the middle of x86_schedule_events()
> > >> using fake_cpuc, when it gets preempted. The context switch code when it loads
> > >> the new thread's PMU state calls x86_schedule_events() which modifies the
> > >> cpuc->event_list[]->hwc. But this is cpuc vs. fake_cpuc again. So yes, the calls
> > >> nest but they do not touch the same state.
> > >
> > > They both touch event->hw->constraint.
> > >
> > >> And when you eventually come back
> > >> to validate_group() you are back to using the fake_cpuc. So I am still not clear
> > >> on how the corruption can happen.
> > >
> > > validate_group()
> > >   x86_schedule_events()
> > >     event->hw.constraint = c; # store
> > >
> > >
> > >         perf_task_event_sched_in()
> > >           ...
> > >             x86_schedule_events();
> > >               event->hw.constraint = c2; # store
> > >
> > >               ...
> > >
> > >               put_event_constraints(event); # assume failure to schedule
> > >                 intel_put_event_constraints()
> > >                   event->hw.constraint = NULL;
> > >
> > >
> > >
> > >     c = event->hw.constraint; # read -> NULL
> > >
> > >     if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref
> > >
> > >
> > > This in particular is possible when the event in question is a cpu-wide
> > > event and group-leader, where the validate_group() tries to add an event
> > > to the group.
> >
> > Ok, I think I get it now. It is not related to fake_cpuc vs. cpuc,
> > it is related to the fact that the constraint is cached in the event
> > struct itself and that one is shared between validate_group() and
> > x86_schedule_events() because cpu_hw_event->event_list[] is an array
> > of pointers to events and not an array of events.
>
> Btw., comments and the code structure should be greatly enhanced to
> make all that very clear and hard to mess up.
>
Peter and I will clean this up.

>
> A month ago perf became fuzzing-proof, and now that's down the drain
> again...
>
It will be fixed this week.
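
For anyone following the thread, the sharing problem can be sketched in
plain userspace C. This is only an illustration under stated assumptions,
not kernel code: struct cpu_ctx, private_c[], nested_schedule() and
validate_then_preempt() are hypothetical names, the structures are heavily
simplified stand-ins, and the preemption is simulated by an ordinary
function call. The per-context private_c[] array is just one possible way
to keep validation from touching shared state; it is not claimed to be the
fix that will be merged.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 4

/* Simplified stand-ins, not the real kernel definitions. */
struct constraint { unsigned long idxmsk; };

struct event {
    struct constraint *constraint;  /* models event->hw.constraint */
};

struct cpu_ctx {                    /* models both cpuc and fake_cpuc */
    struct event *event_list[MAX_EVENTS];      /* pointers: events are shared */
    struct constraint *private_c[MAX_EVENTS];  /* hypothetical per-context cache */
};

static struct constraint c1 = { 0x3 };
static struct constraint c2 = { 0xc };

/*
 * Models the nested scheduling pass run from the context switch path.
 * It stores its own constraint and then drops it again, as in the
 * "assume failure to schedule" case from the trace.
 */
static void nested_schedule(struct cpu_ctx *cpuc, int use_private)
{
    if (use_private) {
        cpuc->private_c[0] = &c2;                /* store */
        cpuc->private_c[0] = NULL;               /* put on failure */
    } else {
        cpuc->event_list[0]->constraint = &c2;   /* store */
        cpuc->event_list[0]->constraint = NULL;  /* put on failure */
    }
}

/*
 * Models validation being interrupted between its store and its read.
 * Returns the constraint the validation path reads back afterwards.
 */
static struct constraint *validate_then_preempt(int use_private)
{
    struct event ev = { NULL };
    struct cpu_ctx cpuc = { { &ev }, { NULL } };
    struct cpu_ctx fake_cpuc = { { &ev }, { NULL } };

    /* Two different contexts, but one and the same event object. */
    assert(cpuc.event_list[0] == fake_cpuc.event_list[0]);

    if (use_private)
        fake_cpuc.private_c[0] = &c1;               /* store */
    else
        fake_cpuc.event_list[0]->constraint = &c1;  /* store */

    nested_schedule(&cpuc, use_private);            /* simulated preemption */

    return use_private ? fake_cpuc.private_c[0]
                       : fake_cpuc.event_list[0]->constraint;
}
```

With the constraint cached in the shared event, validate_then_preempt(0)
returns NULL, which is the pointer the validation path would go on to
dereference. With the per-context cache, validate_then_preempt(1) returns
the constraint that was stored before the simulated preemption.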