Date: Wed, 6 May 2015 03:11:05 +0300
From: "Kirill A. Shutemov"
To: "Aneesh Kumar K.V"
Cc: akpm@linux-foundation.org, mpe@ellerman.id.au, paulus@samba.org,
    benh@kernel.crashing.org, kirill.shutemov@linux.intel.com,
    aarcange@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org
Subject: Re: [RFC PATCH] mm/thp: Use new function to clear pmd before THP splitting
Message-ID: <20150506001105.GA14559@node.dhcp.inet.fi>
In-Reply-To: <1430760556-28137-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Mon, May 04, 2015 at 10:59:16PM +0530, Aneesh Kumar K.V wrote:
> Archs like ppc64 require pte_t * to remain stable in some code path.
> They use local_irq_disable to prevent a parallel split. Generic code
> clear pmd instead of marking it _PAGE_SPLITTING in code path
> where we can afford to mark pmd none before splitting. Use a
> variant of pmdp_splitting_clear_notify that arch can override.
>
> Signed-off-by: Aneesh Kumar K.V

Sorry, I'm still trying to wrap my head around this problem.

So, Power has __find_linux_pte_or_hugepte(), which does a lock-less lookup
in the page tables with local interrupts disabled. For huge pages it casts
pmd_t to pte_t.
Since the format of pte_t is different from pmd_t, we want to prevent a
transition from a pmd pointing to a page table to a pmd pointing to a huge
page (and back) while interrupts are disabled.

The complication for Power is that it doesn't do an implicit IPI on TLB
flush. Is that correct?

For THP, the split_huge_page() and collapse sides are covered. This patch
should address two cases of splitting the PMD, but not the compound page,
in current upstream.

But I think there's still a *big* problem for Power -- zap_huge_pmd().

For instance: another CPU can shoot out a THP PMD with MADV_DONTNEED and
fault in small pages instead. IIUC, for __find_linux_pte_or_hugepte(), it's
equivalent to splitting.

I don't see how this can be fixed without kick_all_cpus_sync() in all
pmdp_clear_flush() on Power.

-- 
 Kirill A. Shutemov