Date: Fri, 11 Nov 2016 17:37:03 -0800 (PST)
From: Hugh Dickins
To: "Kirill A. Shutemov"
cc: "Aneesh Kumar K.V", Hugh Dickins, akpm@linux-foundation.org,
    benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 2/2] mm: THP page cache support for ppc64
In-Reply-To: <20161111162909.GG19382@node.shutemov.name>
References: <20161107083441.21901-1-aneesh.kumar@linux.vnet.ibm.com>
    <20161107083441.21901-2-aneesh.kumar@linux.vnet.ibm.com>
    <20161111101439.GB19382@node.shutemov.name>
    <8737iy1ahw.fsf@linux.vnet.ibm.com>
    <20161111162909.GG19382@node.shutemov.name>

On Fri, 11 Nov 2016, Kirill A. Shutemov wrote:
> On Fri, Nov 11, 2016 at 05:42:11PM +0530, Aneesh Kumar K.V wrote:
> >
> > doing this in do_set_pmd keeps this closer to where we set the pmd. Any
> > reason you thing we should move it higher up the stack. We already do
> > pte_alloc() at the same level for a non transhuge case in
> > alloc_set_pte().
>
> I vaguely remember Hugh mentioned deadlock of allocation under page-lock vs.
> OOM-killer (or something else?).

You remember well.
It was indeed the OOM killer, but in particular due to the way it used
to wait for a current victim to exit, and that exit could be delayed
forever by the way munlock_vma_pages_all() goes to lock each page in a
VM_LOCKED area - a pity if one of them is the page we hold locked while
servicing a fault and need to allocate a pagetable.

>
> If the deadlock is still there it would be matter of making preallocation
> unconditional to fix the issue.

I think enough has changed at the OOM killer end that the deadlock is
no longer there.  I haven't kept up with all the changes made recently,
but I think we no longer wait for a unique victim to exit before trying
another (reaped mms set MMF_OOM_SKIP); and the OOM reaper skips over
VM_LOCKED areas to avoid just such a deadlock.

It's still silly that munlock_vma_pages_all() should require page lock
on each of those pages; but neither Michal nor I have had time to
revisit our attempts to relieve that requirement - mlock.c is not easy.

>
> But what you propose about doesn't make situation any worse. I'm fine with
> that.

Yes, I think that's right: if there is a problem, then it would already
have been a problem ever since alloc_set_pte() was created; but we've
seen no reports.

Hugh
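
For anyone following the thread, the old deadlock can be pictured as a
cycle in a wait-for graph: the faulting task (holding the page lock)
waits on the OOM killer for memory, the OOM killer waits on its victim
to exit, and the victim's exit waits on that same page lock. The sketch
below is only a toy userspace model of that reasoning, not kernel code:
the actor names, struct wait_graph and fault_can_deadlock() are all
hypothetical, and it assumes that an OOM killer which does not wait for
its victim can always make progress some other way.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative actors only; none of these are real kernel objects. */
enum actor {
    FAULTING_TASK,  /* holds the page lock, needs a page table allocated */
    OOM_KILLER,     /* has to free memory before that allocation succeeds */
    VICTIM_EXIT,    /* victim's exit path, munlocking its VM_LOCKED pages */
    NR_ACTORS,
    NOBODY = -1
};

/* waits_on[a] is the one actor that a is blocked on, or NOBODY. */
struct wait_graph {
    int waits_on[NR_ACTORS];
};

static void build_graph(struct wait_graph *g, bool oom_waits_for_victim)
{
    /* Allocation under page lock stalls until the OOM killer frees memory. */
    g->waits_on[FAULTING_TASK] = OOM_KILLER;
    /* Old behaviour: wait for the victim. Assumed newer behaviour: move on. */
    g->waits_on[OOM_KILLER] = oom_waits_for_victim ? VICTIM_EXIT : NOBODY;
    /* The victim's munlock wants the page lock held by the faulting task. */
    g->waits_on[VICTIM_EXIT] = FAULTING_TASK;
}

/* Follow the wait-for chain from start; returning to start means a cycle. */
static bool has_deadlock(const struct wait_graph *g, int start)
{
    int cur = start;

    assert(start >= 0 && start < NR_ACTORS);
    for (int steps = 0; steps < NR_ACTORS; steps++) {
        cur = g->waits_on[cur];
        if (cur == NOBODY)
            return false;
        if (cur == start)
            return true;
    }
    return false;
}

/* True if the faulting task ends up (indirectly) waiting on itself. */
bool fault_can_deadlock(bool oom_waits_for_victim)
{
    struct wait_graph g;

    build_graph(&g, oom_waits_for_victim);
    return has_deadlock(&g, FAULTING_TASK);
}
```

In this model fault_can_deadlock(true) reports the cycle and
fault_can_deadlock(false) does not, which is the shape of the argument
above: once the OOM killer stops waiting on a single victim, the chain
from the faulting task no longer loops back to the page lock it holds.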