Date: Thu, 7 May 2015 14:43:00 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>,
        LKML <linux-kernel@vger.kernel.org>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Jiri Olsa <jolsa@redhat.com>, Ingo Molnar <mingo@redhat.com>,
        Paul Mackerras <paulus@samba.org>
Subject: Re: perf: fuzzer triggers NULL pointer dereference in
 x86_schedule_events
Message-ID: <20150507124300.GK23123@twins.programming.kicks-ass.net>
References: <alpine.DEB.2.11.1504301448440.30050@vincent-weaver-1.umelst.maine.edu>
 <20150501125955.GF5029@twins.programming.kicks-ass.net>
 <CABPqkBRiAe3HmYN7vACH-8OPMOLXKrETY_8n0Hm3pyVcuLHfug@mail.gmail.com>
In-Reply-To: <CABPqkBRiAe3HmYN7vACH-8OPMOLXKrETY_8n0Hm3pyVcuLHfug@mail.gmail.com>

On Mon, May 04, 2015 at 12:32:56PM -0700, Stephane Eranian wrote:
> On Fri, May 1, 2015 at 5:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Apr 30, 2015 at 03:08:56PM -0400, Vince Weaver wrote:
> > >
> > > So the perf_fuzzer caught this after about a week of fuzzing on a Haswell
> > > machine running a recent git kernel (pre 4.1-rc1 though).
> > >
> > > We've seen this BUG before and various fixes were applied but apparently
> > > it wasn't enough.
> > >
> > > Sadly it doesn't seem to be reproducible.
> > >
> > > validate_group() -> x86_pmu.schedule_events() -> ???? -> variable_test_bit()
> > >  (hard to tell which test bit with all the inlining going on).
> >
> > Assuming you build with debug info, addr2line -i can help, but I think I
> > found it by comparing the Code section below with my objdump -D output.
> >
> > It's:
> >                 /* constraint still honored */
> >                 if (!test_bit(hwc->idx, c->idxmsk))
> >                         break;
> >
> > Which would seem to suggest c is NULL.
> >
> But then, you'd crash in the previous loop, because after
> get_event_constraints(), you touch c->weight.
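Stephane's ordering argument can be made concrete with a small userspace model. This is a hedged sketch, not kernel code: `struct sketch_constraint`, `struct sketch_hwc`, `sketch_test_bit()` and `sketch_first_null_deref()` are hypothetical names, and the two loops are reduced to what the thread describes (the first loop reads `c->weight` for every event, then the fast-path loop runs `test_bit(hwc->idx, c->idxmsk)`). Instead of crashing, the model reports which loop would be the first to dereference a NULL constraint.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * constraint and hardware-event structures. */
struct sketch_constraint {
	unsigned long idxmsk[1];	/* counters this event may use */
	int weight;			/* number of bits set in idxmsk */
};

struct sketch_hwc {
	int idx;			/* counter from a previous pass, or -1 */
};

/* Userspace stand-in for the kernel's test_bit(). */
static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / SKETCH_BITS_PER_LONG] >>
		(nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/*
 * Returns the loop (1 or 2) that would first dereference a NULL
 * constraint, or 0 if none would.  The real code has no NULL checks;
 * it would simply oops at that dereference.  Both loops index the
 * same cs[] array here, which is the key assumption of the argument.
 */
static int sketch_first_null_deref(struct sketch_constraint *const *cs,
				   const struct sketch_hwc *hwcs, int n)
{
	int wmin = INT_MAX, wmax = 0;

	/* Loop 1: every constraint's weight is read. */
	for (int i = 0; i < n; i++) {
		if (cs[i] == NULL)
			return 1;	/* c->weight would fault here */
		if (cs[i]->weight < wmin)
			wmin = cs[i]->weight;
		if (cs[i]->weight > wmax)
			wmax = cs[i]->weight;
	}
	(void)wmin;	/* the real code uses these for the slow path */
	(void)wmax;

	/* Loop 2 (fast path): try to reuse the previous counter. */
	for (int i = 0; i < n; i++) {
		if (hwcs[i].idx == -1)
			break;		/* never assigned */
		if (cs[i] == NULL)
			return 2;	/* unreachable: loop 1 checked cs[i] */
		/* constraint still honored? */
		if (!sketch_test_bit((unsigned int)hwcs[i].idx, cs[i]->idxmsk))
			break;
	}
	return 0;
}
```

The argument rests on both loops seeing the same constraint pointer for each event; the sketch encodes that assumption explicitly, which is why its second-loop NULL branch can never be reached.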

Indeed so; and we can make an analogous argument for hwc. However:

> I think it is more likely related to the bitmask (idxmsk). But then
> it is always allocated with the constraint, even with the HT bug
> workaround. So most likely the index is bogus and you touch outside
> the idxmsk[] array.

[428232.701319] BUG: unable to handle kernel NULL pointer dereference at           (null)

But the thing really tried to touch NULL, not some random address that
faulted.
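The point about the fault address can be checked with a little pointer arithmetic. This is a hedged illustration: `struct sketch_constraint` and `sketch_test_bit_addr()` are hypothetical, and the sketch assumes, for illustration only, that `idxmsk` is the first member of the constraint structure. It computes, without dereferencing anything, the address of the word that `test_bit(idx, c->idxmsk)` would load.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in; only the layout matters here.  Assumption:
 * idxmsk is the first member, so its offset is 0. */
struct sketch_constraint {
	unsigned long idxmsk[1];
	int weight;
};

/*
 * Address of the word that test_bit(nr, c->idxmsk) would load, where
 * c is given as an integer.  Pure integer arithmetic: nothing is
 * dereferenced, so a "NULL" c is safe to pass in.
 */
static uintptr_t sketch_test_bit_addr(uintptr_t c, unsigned int nr)
{
	uintptr_t mask = c + offsetof(struct sketch_constraint, idxmsk);

	return mask + (nr / SKETCH_BITS_PER_LONG) * sizeof(unsigned long);
}
```

With `c == NULL` and a small index the load targets address 0, which is what a "(null)" fault address implies. A valid `c` with an out-of-range index yields an address past `c` instead, and a NULL `c` with a large index yields a small but non-zero address.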

As always, Vince has found us a good puzzle ;-)