From: Andy Lutomirski
Date: Fri, 21 Nov 2014 15:03:54 -0800
Subject: Re: frequent lockups in 3.18rc4
To: Linus Torvalds
Cc: Thomas Gleixner, Steven Rostedt, Tejun Heo,
    "linux-kernel@vger.kernel.org", Arnaldo Carvalho de Melo,
    Peter Zijlstra, Frederic Weisbecker, Don Zickus, Dave Jones,
    "the arch/x86 maintainers"
References: <20141120233920.GC25393@htj.dyndns.org>
    <20141121162742.GB15461@htj.dyndns.org>
    <20141121170805.GD30603@home.goodmis.org>

On Fri, Nov 21, 2014 at 2:55 PM, Linus Torvalds wrote:
> On Fri, Nov 21, 2014 at 1:11 PM, Thomas Gleixner wrote:
>>
>> I'm fine with that. I just think it's not horrid enough, but that can
>> be fixed easily :)
>
> Oh, I think it's plenty horrid.
>
> Anyway, here's an actual patch. As usual, it has seen absolutely no
> actual testing, but I did try to make sure it compiles and seems to do
> the right thing on:
>  - x86-32 no-PAE
>  - x86-32 no-PAE with PARAVIRT
>  - x86-32 PAE
>  - x86-64
>
> also, I just removed the noise that is "vmalloc_sync_all()", since
> it's just all garbage and nothing actually uses it. Yeah, it's used by
> "register_die_notifier()", which makes no sense what-so-ever.
> Whatever. It's gone.
>
> Can somebody actually *test* this? In particular, in any kind of real
> paravirt environment? Or, any comments even without testing?
>
> I *really* am not proud of the mess wrt the whole
>
>   #ifdef CONFIG_PARAVIRT
>   #ifdef CONFIG_X86_32
>     ...
>
> but I think that from a long-term perspective, we're actually better
> off with this kind of really ugly - but very explicit - hack that very
> clearly shows what is going on.
>
> The old code that actually "walked" the page tables was more
> "portable", but was somewhat misleading about what was actually going
> on.

At the risk of going deeper down the rabbit hole, I grepped for
pgd_list. I found:

__set_pmd_pte in pageattr.c. It appears to be completely incorrect.
Unless I've misunderstood, other than the very first line, it will
either do nothing at all or crash when it falls off the end of the
page tables that it's pointlessly trying to update.

sync_global_pgds: OK, I guess -- this is for hot-add of memory, right?
But if we teach the context switch code to check that the kernel stack
is okay, that can be removed, I think. (We absolutely MUST keep the
static per-cpu stuff populated everywhere before running user code,
but that's never in hot-added memory.)

xen_mm_pin_all and xen_mm_unpin_all: I have no clue. I wonder how
that works with SHARED_KERNEL_PMD.

Anyone want to attack these? It would be kind of nice to remove
pgd_list entirely. (I realize that doing so precludes the use of
bloody enormous 512GB kernel pages, but any attempt to use *those* is
so completely screwed without a major reworking of all of this (or
perhaps stop_machine) that keeping pgd_list around just for that is
probably a mistake.)

--Andy