Date: Thu, 7 May 2015 14:43:00 +0200
From: Peter Zijlstra
To: Stephane Eranian
Cc: Vince Weaver, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Paul Mackerras
Subject: Re: perf: fuzzer triggers NULL pointer derefreence in x86_schedule_events
Message-ID: <20150507124300.GK23123@twins.programming.kicks-ass.net>
References: <20150501125955.GF5029@twins.programming.kicks-ass.net>

On Mon, May 04, 2015 at 12:32:56PM -0700, Stephane Eranian wrote:
> On Fri, May 1, 2015 at 5:59 AM, Peter Zijlstra wrote:
> >
> > On Thu, Apr 30, 2015 at 03:08:56PM -0400, Vince Weaver wrote:
> > >
> > > So the perf_fuzzer caught this after about a week of fuzzing on a Haswell
> > > machine running a recent git kernel (pre 4.1-rc1 though).
> > >
> > > We've seen this BUG before and various fixes were applied but apparently
> > > it wasn't enough.
> > >
> > > Sadly it doesn't seem to be reproducible.
> > >
> > > validate_group() -> x86_pmu.schedule_events() -> ???? -> variable_test_bit()
> > > (hard to tell which test bit with all the inlining going on).
> >
> > Assuming you build with debug info addr2line -i can help, but I think I
> > found it by comparing the Code section below with my objdump -D output.
> >
> > Its:
> >                 /* constraint still honored */
> >                 if (!test_bit(hwc->idx, c->idxmsk))
> >                         break;
> >
> > Which would seem to suggest c is NULL.
> >
> But then, you'd crash in the previous loop, because after
> get_event_contraint(), you touch
> c->weight.

Indeed so; and we can make an analogous argument for hwc. However:

> I think it is more likely related to the bitmask (idxmsk). But then
> it is always allocated with the constraint even with the HT bug
> workaround. So most, likely the index is bogus and you touch outside
> the idxmsk[] array.

[428232.701319] BUG: unable to handle kernel NULL pointer dereference at (null)

But the thing really tried to touch NULL, not some random address that
faulted.

As always, Vince has found us a good puzzle ;-)
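
For illustration, here is a rough user-space sketch (not kernel code) of the
address arithmetic behind the two theories discussed above: a NULL c versus a
bogus hwc->idx. It assumes that idxmsk sits at offset 0 of the constraint
structure and that the bit index is split into a word offset the way a
long-based test_bit() does it. struct fake_constraint and
test_bit_word_addr() are hypothetical names made up for this sketch, and
nothing is ever dereferenced.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Bits per unsigned long on the build machine (64 on x86-64). */
#define SKETCH_BITS_PER_LONG ((long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical stand-in for the kernel's constraint structure.
 * ASSUMPTION: the index mask is the first member, i.e. at offset 0.
 */
struct fake_constraint {
	unsigned long idxmsk[1];
	int weight;
};

/*
 * Address of the word that test_bit(idx, c->idxmsk) would read,
 * computed with integer arithmetic only; no memory is accessed.
 * ASSUMPTION: negative indices round toward minus infinity, as a
 * signed bit offset would.
 */
uintptr_t test_bit_word_addr(uintptr_t c, long idx)
{
	uintptr_t base = c + offsetof(struct fake_constraint, idxmsk);
	long word;

	if (idx >= 0)
		word = idx / SKETCH_BITS_PER_LONG;
	else
		word = -((-idx + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG);

	/* Unsigned arithmetic wraps, so a negative word offset is fine. */
	return base + (uintptr_t)word * sizeof(unsigned long);
}
```

Under those assumptions, test_bit_word_addr(0, 3) comes out as address 0,
which is what a "(null)" fault address would look like, while a valid c with
an out-of-range index, e.g. test_bit_word_addr(0x1000, 640), lands at a
non-zero address. A NULL c with idx == -1 also ends up away from 0.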