From: Andy Lutomirski
To: Vince Weaver, Linus Torvalds
CC: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar
Subject: Re: Linux 3.18 released
Date: Wed, 10 Dec 2014 16:38:58 -0800
Message-ID: <5488E7A2.1050400@amacapital.net>

On 12/08/2014 10:39 AM, Vince Weaver wrote:
> On Sun, 7 Dec 2014, Linus Torvalds wrote:
>
>> I'd love to say that we've figured out the problem that plagues 3.17
>> for a couple of people, but we haven't. At the same time, there's
>> absolutely no point in having everybody else twiddling their thumbs
>> when a couple of people are actively trying to bisect an older issue,
>> so holding up the release just didn't make sense. Especially since
>> that would just have then held things up entirely over the holiday
>> break.
>>
>> So the merge window for 3.19 is open, and DaveJ will hopefully get his
>> bisection done (or at least narrow things down sufficiently that we
>> have that "Ahaa" moment) over the next week. But in solidarity with
>> Dave (and to make my life easier too ;) let's try to avoid introducing
>> any _new_ nasty issues, ok?
>
> It's probably unrelated to DaveJ's issue, but my perf_event fuzzer still
> quickly locks the kernel pretty solid on 3.18.
>
> Just 5 minutes of testing managed to trip over the following issue that
> dates back to at least 3.15-rc7

Out of curiosity, can you see if this:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/paranoid-and-more&id=38e49874d0ab18276f753f5784420b091f4be6eb

makes the problem much worse? (Don't take the whole series there --
just cherry-pick the one patch.)

--Andy

>
> My notes say that last time I tracked the issue down as follows:
>
> What happens is in kernel/events/core.c find_get_context()
> somehow perf_lock_task_context() returns NULL
> due to !atomic_inc_not_zero(&ctx->refcount)
> but task->perf_event_ctxp[ctxn] still has a valid value.
>
> There are multiple perf related issues like this that are hard to track
> down. They are borderline heisenbugs that are possibly race conditions,
> so bisecting doesn't work and even things like enabling ftrace will make
> the issue go away (or crash ftrace itself).
>
> This particular manifestation of the bug (or bugs) wedges things but I can
> use alt-sysrq from the serial console to see where it is stuck (see
> below; the CPU is stuck in a loop).
>
>
> [ 2225.916004] [] ? get_page_from_freelist+0x55/0x781
> [ 2225.916004] [] __alloc_pages_nodemask+0x167/0x6dc
> [ 2225.916004] [] ? intel_pmu_enable_all+0x28/0xa4
> [ 2225.916004] [] kmem_getpages+0x58/0xec
> [ 2225.916004] [] cache_grow+0xad/0x1d8
> [ 2225.916004] [] ____cache_alloc+0x237/0x2ce
> [ 2225.916004] [] __kmalloc+0x8f/0xf2
> [ 2225.916004] [] ? T.1336+0xe/0x10
> [ 2225.916004] [] T.1336+0xe/0x10
> [ 2225.916004] [] alloc_perf_context+0x20/0x51
> [ 2225.916004] [] find_get_context+0x138/0x1c7
> [ 2225.916004] [] SYSC_perf_event_open+0x48b/0x870
> [ 2225.916004] [] SyS_perf_event_open+0xe/0x10
> [ 2225.916004] [] system_call_fastpath+0x16/0x1b
>
> [ 2256.708004] [] ? put_ctx+0x40/0x61
> [ 2256.708004] [] find_get_context+0x1a9/0x1c7
> [ 2256.708004] [] SYSC_perf_event_open+0x48b/0x870
> [ 2256.708004] [] SyS_perf_event_open+0xe/0x10
> [ 2256.708004] [] system_call_fastpath+0x16/0x1b
>
> [ 2303.796003] [] ? kmalloc_slab+0x7f/0x8d
> [ 2303.796003] [] __kmalloc+0x29/0xf2
> [ 2303.796003] [] ? T.1336+0xe/0x10
> [ 2303.796003] [] T.1336+0xe/0x10
> [ 2303.796003] [] alloc_perf_context+0x20/0x51
> [ 2303.796003] [] find_get_context+0x138/0x1c7
> [ 2303.796003] [] SYSC_perf_event_open+0x48b/0x870
> [ 2303.796003] [] SyS_perf_event_open+0xe/0x10
> [ 2303.796003] [] system_call_fastpath+0x16/0x1b
>
> Vince
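
For anyone reading along, the pattern in Vince's notes (a lookup that fails
because the refcount is already zero while the task's slot still points at
the dying context) can be modelled in user space. This is only an
illustrative sketch, not the kernel's code: struct ctx, ref_get_not_zero(),
lock_ctx() and find_ctx_bounded() are hypothetical names, and the retry bound
is an assumption added so the model terminates instead of spinning the way
the traces suggest the CPU does.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted context (not the kernel struct). */
struct ctx {
    atomic_int refcount;
};

/* Take a reference only if the count is nonzero, in the spirit of
 * atomic_inc_not_zero(). Returns false for an object that is already dying. */
static bool ref_get_not_zero(struct ctx *c)
{
    int old = atomic_load(&c->refcount);
    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Hypothetical lookup: NULL means either "slot empty" or "slot holds a
 * context whose refcount already hit zero". The return value alone does
 * not say which. */
static struct ctx *lock_ctx(struct ctx **slot)
{
    struct ctx *c = *slot;
    if (c != NULL && !ref_get_not_zero(c))
        return NULL;
    return c;
}

/* Retry loop modelled on the notes: if the lookup fails but the slot is
 * still populated, try again. Returns the number of attempts used, or -1
 * if all max_tries attempts saw a dead context that was still published. */
static int find_ctx_bounded(struct ctx **slot, struct ctx *fresh, int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (lock_ctx(slot) != NULL)
            return attempt;              /* got a live context */
        if (*slot == NULL) {
            atomic_store(&fresh->refcount, 1);
            *slot = fresh;               /* install a new context */
            return attempt;
        }
        /* Slot is non-NULL but its refcount is zero: nothing has changed,
         * so the next iteration sees exactly the same state. */
    }
    return -1;
}
```

With a slot that points at a context whose refcount is zero, every iteration
fails the same way, so without the bound the loop never exits unless something
else clears the slot. An empty slot or a live context both return on the first
attempt.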