From: Vince Weaver
Date: Mon, 11 May 2015 17:19:28 -0400 (EDT)
To: Stephane Eranian
cc: Vince Weaver, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Paul Mackerras
Subject: Re: perf: another perf_fuzzer generated lockup
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 8 May 2015, Stephane Eranian wrote:

> Vince,
>
> On Thu, May 7, 2015 at 9:40 PM, Vince Weaver wrote:
> >
> > This is a new one I think, I hit it on the haswell machine running
> > 4.1-rc2.
> >
> > The backtrace is complex enough I'm not really sure what's going on here.
> >
> > The fuzzer has been having weird issues where it's been getting
> > overflow signals from invalid fds. This seems to happen
> > when an overflow signal interrupts the fuzzer mid-fork?
> > And the fuzzer code doesn't handle this well and attempts to call exit()
> > and/or kill the child from the signal handler that interrupted the
> > fork() and that doesn't always go well. I'm not sure if this is related,
> > just that some of those actions seem to appear in the backtrace.
> >
> Is there a way to figure out how the fuzzer had programmed the PMU
> to get there? (besides adding PMU state dump in the kernel crashdump)?

Not easily.
In theory the fuzzer can regenerate its state from the random seed, but some of these bugs seem to be timing-related or race conditions, so they don't always replicate. I can also make the fuzzer dump its state, but it often has 100+ events active and there is no way of knowing which ones are currently scheduled onto the CPU.

Dumping the PMU state might help. At the same time, though, there are all the other events going on, such as software and tracepoint events, and they might all be contributing.

This particular bug almost replicates: the system definitely pauses for a bit, even if not long enough to trigger the watchdog.

I've been meaning to work on it more, but we just finished with finals, so I'm stuck doing more or less nothing but grading next week. After that I should have some time to work on this issue, plus a couple of other warnings the fuzzer has been showing.

Vince