Date: Wed, 25 Feb 2015 16:16:39 +0100
From: Peter Zijlstra
To: Vince Weaver
Cc: linux-kernel@vger.kernel.org, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jiri Olsa
Subject: Re: perf: fuzzer causes lockup in x86_pmu_event_init()
Message-ID: <20150225151639.GL5029@twins.programming.kicks-ass.net>

On Mon, Feb 23, 2015 at 10:56:10PM -0500, Vince Weaver wrote:
> On Tue, 17 Feb 2015, Vince Weaver wrote:
>
> > This is on a Haswell machine, current git as of this past Friday.
> >
> > I let the perf_fuzzer run and it took 4 days to find this.
> > Sadly it doesn't seem to be reproducible so I am not sure
> > how it exactly got into this state.
>
> I have hit this on another machine, my core2 machine (after 10 days of
> fuzzing). So this seems to be a real issue although hard to hit.
>
> The problem seems to map to
>	arch/x86/kernel/cpu/perf_event.c:824
>
> It is stuck forever in this loop in collect_events()
>
>	list_for_each_entry(event, &leader->sibling_list, group_entry) {
>		if (!is_x86_event(event) ||
>		    event->state <= PERF_EVENT_STATE_OFF)
>			continue;
>
>		if (n >= max_count)
>			return -EINVAL;
>
>		cpuc->event_list[n] = event;
>		n++;
>	}
>
> [884044.228001] RIP: 0010:[] [] x86_pmu_event_init+0x138/0x31d
> [884044.228001] Call Trace:
> [884044.228001] [] perf_try_init_event+0x25/0x47
> [884044.228001] [] perf_init_event+0x93/0xca
> [884044.228001] [] perf_event_alloc+0x29b/0x32d
> [884044.228001] [] SYSC_perf_event_open+0x417/0x89c
> [884044.228001] [] SyS_perf_event_open+0x9/0xb

That smells like a corrupted sibling_list, I see no other way for that
loop to not end.

It occurs to me that that list iteration is entirely unserialized, we
should be holding a ctx lock or mutex, but we do not.

Now IIRC the perf fuzzer is single threaded, so it would not actually
trigger the most horrible cases here; but this does smell bad.

Does something like the below make sense and/or help? Jolsa?

---
 kernel/events/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index af924bc38121..763e7c02e796 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7049,12 +7049,23 @@ EXPORT_SYMBOL_GPL(perf_pmu_unregister);
 
 static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 {
+	struct perf_event_context *ctx = NULL;
 	int ret;
 
 	if (!try_module_get(pmu->module))
 		return -ENODEV;
+
+	if (event->group_leader != event) {
+		ctx = perf_event_ctx_lock(event->group_leader);
+		BUG_ON(!ctx);
+	}
+
 	event->pmu = pmu;
 	ret = pmu->event_init(event);
+
+	if (ctx)
+		perf_event_ctx_unlock(event->group_leader, ctx);
+
 	if (ret)
 		module_put(pmu->module);
 
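
For anyone wondering why a corrupted sibling_list would make that loop spin forever: the walk only stops when ->next leads back to the list head. Below is a rough userspace sketch of that property, not kernel code. The struct mirrors the shape of the kernel's list_head, but list_init(), list_add_tail() as written here, bounded_walk() and its 'limit' parameter are all made up for illustration; the real list_for_each_entry() has no such bound.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a circular doubly linked list (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link 'node' in just before 'head', i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/*
 * Follow ->next until we are back at 'head', which is the only exit
 * condition a plain list walk has. Returns the number of nodes visited,
 * or -1 if the walk has not come back to 'head' after 'limit' steps.
 * The 'limit' guard is hypothetical; it exists so this demo terminates.
 */
static long bounded_walk(const struct list_head *head, long limit)
{
	const struct list_head *pos;
	long n = 0;

	assert(limit > 0);
	for (pos = head->next; pos != head; pos = pos->next) {
		if (++n > limit)
			return -1;	/* never got back to head */
	}
	return n;
}
```

With three nodes a, b and c added to a head, bounded_walk() returns 3. If some concurrent or buggy update then leaves c.next pointing at b instead of the head, the walk cycles between b and c and only the artificial limit gets it out, which is the kind of state the backtrace above would be consistent with.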