Date: Sun, 14 Jun 2015 09:59:43 +0200
From: Ingo Molnar <mingo@kernel.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
        Andy Lutomirski <luto@amacapital.net>,
        Andrew Morton <akpm@linux-foundation.org>,
        Denys Vlasenko <dvlasenk@redhat.com>, Brian Gerst <brgerst@gmail.com>,
        Peter Zijlstra <peterz@infradead.org>, Borislav Petkov <bp@alien8.de>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Thomas Gleixner <tglx@linutronix.de>, Waiman Long <Waiman.Long@hp.com>
Subject: Re: why do we need vmalloc_sync_all?
Message-ID: <20150614075943.GA810@gmail.com>
References: <1434188955-31397-1-git-send-email-mingo@kernel.org>
 <20150613185828.GA32376@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150613185828.GA32376@redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1913
Lines: 48


* Oleg Nesterov <oleg@redhat.com> wrote:

> I didn't read v2 yet, but I'd like to ask a question.
> 
> Why do we need vmalloc_sync_all()?
> 
> It has a single caller, register_die_notifier() which calls it without
> any explanation. IMO, this needs a comment at least.

Yes, it's used to work around crashes in modular callbacks: if a callback 
happens to be called from within the page fault path, before the vmalloc page 
fault handler runs, then we have a catch-22 problem: touching the module's 
not-yet-synced vmalloc mapping would fault again, recursively.

It's rare but not entirely impossible.

> I am not sure I understand the changelog in 101f12af correctly, but at first 
> glance vmalloc_sync_all() is no longer needed at least on x86, do_page_fault() 
> no longer does notify_die(DIE_PAGE_FAULT). And btw DIE_PAGE_FAULT has no users. 
> DIE_MNI too...
> 
> Perhaps we can simply kill it on x86?

So in theory we could still have such a callback run from DIE_OOPS, and a 
vmalloc fault there could turn a survivable kernel crash into a non-survivable 
one.

Note that all of this will go away if we also do the vmalloc fault handling 
simplification that I discussed with Andy:

 - this series already makes the set of kernel PGDs strictly monotonically 
   increasing during the lifetime of the x86 kernel

 - if in a subsequent patch we synchronize new PGDs right after the vmalloc
   code creates them, before the area is used, then we can remove
   vmalloc_fault() altogether [or rather, turn it into a debug warning
   initially].
   vmalloc_fault() is a clever but somewhat fragile complication.

 - after that we can simply remove vmalloc_sync_all() from x86, because all active 
   vmalloc areas will be fully instantiated, all the time, on x86.
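
To make the lazy vs. eager distinction in the list above concrete, here is a
minimal user-space toy model. It is not kernel code and not the actual
implementation: every name in it (toy_vmalloc_lazy(), toy_vmalloc_fault(),
toy_sync_all(), toy_vmalloc_eager(), the slot and task counts) is hypothetical.
One boolean per slot stands in for a populated PGD entry, and entries are only
ever set, never cleared, which mirrors the monotonic property mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PGD_SLOTS 8	/* toy size, chosen for illustration only */
#define NR_TASKS     3	/* hypothetical number of per-task PGDs */

/* One reference PGD (think init_mm) plus one private PGD per task. */
static bool ref_pgd[NR_PGD_SLOTS];
static bool task_pgd[NR_TASKS][NR_PGD_SLOTS];

/* Lazy scheme: a new vmalloc area only populates the reference PGD. */
static void toy_vmalloc_lazy(size_t slot)
{
	ref_pgd[slot] = true;
}

/*
 * Lazy scheme: the fault handler copies the missing entry from the
 * reference PGD. Returns false if the reference has no entry either,
 * i.e. the fault is a genuine bad access.
 */
static bool toy_vmalloc_fault(int task, size_t slot)
{
	if (!ref_pgd[slot])
		return false;
	task_pgd[task][slot] = true;
	return true;
}

/* Bulk sync: propagate every reference entry to every task PGD. */
static void toy_sync_all(void)
{
	for (int t = 0; t < NR_TASKS; t++)
		for (size_t s = 0; s < NR_PGD_SLOTS; s++)
			if (ref_pgd[s])
				task_pgd[t][s] = true;
}

/* Eager scheme: populate and synchronize before the area is ever used. */
static void toy_vmalloc_eager(size_t slot)
{
	ref_pgd[slot] = true;
	for (int t = 0; t < NR_TASKS; t++)
		task_pgd[t][slot] = true;
}

/* True if the task can touch the slot without taking a fault. */
static bool toy_access_ok(int task, size_t slot)
{
	return task_pgd[task][slot];
}

/* Usage walk-through; returns 0 if the model behaves as described. */
static int toy_demo(void)
{
	toy_vmalloc_lazy(1);
	assert(!toy_access_ok(0, 1));		/* task 0 would fault here */
	assert(toy_vmalloc_fault(0, 1));	/* lazy fixup succeeds */
	assert(toy_access_ok(0, 1));
	assert(!toy_access_ok(2, 1));		/* task 2 is still unsynced */

	toy_sync_all();
	assert(toy_access_ok(2, 1));		/* bulk sync closes the gap */

	toy_vmalloc_eager(5);
	for (int t = 0; t < NR_TASKS; t++)
		assert(toy_access_ok(t, 5));	/* never needs a fault */
	return 0;
}
```

In this model the window between toy_vmalloc_lazy() and toy_vmalloc_fault() is
where a callback running in the fault path would get into trouble. With
toy_vmalloc_eager() that window does not exist, so neither the fault fixup nor
the bulk sync is needed.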

Thanks,

	Ingo