From: Vince Weaver <vincent.weaver@maine.edu>
Date: Mon, 11 May 2015 17:19:28 -0400 (EDT)
To: Stephane Eranian <eranian@google.com>
cc: Vince Weaver <vincent.weaver@maine.edu>,
        LKML <linux-kernel@vger.kernel.org>,
        Peter Zijlstra <peterz@infradead.org>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Jiri Olsa <jolsa@redhat.com>, Ingo Molnar <mingo@redhat.com>,
        Paul Mackerras <paulus@samba.org>
Subject: Re: perf: another perf_fuzzer generated lockup
In-Reply-To: <CABPqkBStVPW_W29-6Q0fcNeHPUmRWow3ckT--0rCr-5EM07wow@mail.gmail.com>
Message-ID: <alpine.DEB.2.11.1505111713380.7482@vincent-weaver-1.umelst.maine.edu>
References: <alpine.DEB.2.11.1505080025560.26907@vincent-weaver-1.umelst.maine.edu> <CABPqkBStVPW_W29-6Q0fcNeHPUmRWow3ckT--0rCr-5EM07wow@mail.gmail.com>
User-Agent: Alpine 2.11 (DEB 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2096
Lines: 50

On Fri, 8 May 2015, Stephane Eranian wrote:

> Vince,
> 
> On Thu, May 7, 2015 at 9:40 PM, Vince Weaver <vincent.weaver@maine.edu> wrote:
> >
> >
> > This is a new one I think, I hit it on the haswell machine running
> > 4.1-rc2.
> >
> > The backtrace is complex enough I'm not really sure what's going on here.
> >
> > The fuzzer has been having weird issues where it's been getting
> > overflow signals from invalid fds.  This seems to happen
> > when an overflow signal interrupts the fuzzer mid-fork?
> > And the fuzzer code doesn't handle this well and attempts to call exit()
> > and/or kill the child from the signal handler that interrupted the
> > fork() and that doesn't always go well.  I'm not sure if this is related,
> > just that some of those actions seem to appear in the backtrace.
> >
> >
> Is there a way to figure out how the fuzzer had programmed the PMU
> to get there? (besides adding PMU state dump in the kernel crashdump)?

Not easily.  In theory the fuzzer can regenerate its state from the random
seed, but some of these bugs seem to be timing-related or caused by race
conditions, so they don't always replicate.
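
As a rough illustration of the seed-based replay idea, here is a minimal
sketch in Python.  It is not the actual perf_fuzzer code: the function
name generate_events, the EVENT_TYPES list and the three fields are
hypothetical stand-ins for the much larger set of perf_event_attr values
a real fuzzer would randomize.

```python
import random

# Hypothetical stand-ins for event types; a real fuzzer randomizes
# far more than this.
EVENT_TYPES = ["hardware", "software", "tracepoint", "raw"]


def generate_events(seed, count):
    """Rebuild the same list of event configurations from a seed."""
    # Use a private generator so nothing else in the process can
    # perturb the sequence of random values.
    rng = random.Random(seed)
    events = []
    for _ in range(count):
        events.append({
            "type": rng.choice(EVENT_TYPES),
            "config": rng.getrandbits(32),
            "sample_period": rng.randint(1, 100000),
        })
    return events


# Same seed, same event list: the configuration is reproducible.
first = generate_events(seed=12345, count=100)
replay = generate_events(seed=12345, count=100)
print(first == replay)  # prints True
```

What the seed cannot capture is when things happen: the moment an
overflow signal arrives, or how the events get scheduled, is outside the
generator's control, so an identical event list can still behave
differently from run to run.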

Also, I can make the fuzzer dump its state, but it often has 100+ events
active and I have no way of knowing which ones are currently scheduled
onto the CPU.

Dumping the PMU state might help, but at the same time there are all the
other events going on, such as software and tracepoint events, and they
might all be contributing.

This particular bug almost replicates; the system definitely pauses for a
bit, even if not for long enough to trigger the watchdog.

I've been meaning to work on it more, but we just finished with finals, so
I'm stuck doing more or less nothing but grading this coming week.  After
that I should have some time to work on this issue, plus a couple of other
warnings the fuzzer has been showing.

Vince