Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753730AbbEFIyb (ORCPT ); Wed, 6 May 2015 04:54:31 -0400 Received: from e28smtp09.in.ibm.com ([122.248.162.9]:53185 "EHLO e28smtp09.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751755AbbEFIyZ (ORCPT ); Wed, 6 May 2015 04:54:25 -0400 From: "Aneesh Kumar K.V" To: "Kirill A. Shutemov" Cc: akpm@linux-foundation.org, mpe@ellerman.id.au, paulus@samba.org, benh@kernel.crashing.org, kirill.shutemov@linux.intel.com, aarcange@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org Subject: Re: [RFC PATCH] mm/thp: Use new function to clear pmd before THP splitting In-Reply-To: <20150506001105.GA14559@node.dhcp.inet.fi> References: <1430760556-28137-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20150506001105.GA14559@node.dhcp.inet.fi> User-Agent: Notmuch/0.19+103~g294bb6d (http://notmuchmail.org) Emacs/24.4.1 (x86_64-pc-linux-gnu) Date: Wed, 06 May 2015 14:18:17 +0530 Message-ID: <87lhh2ccry.fsf@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15050608-0033-0000-0000-00000590389C Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2786 Lines: 75 "Kirill A. Shutemov" writes: > On Mon, May 04, 2015 at 10:59:16PM +0530, Aneesh Kumar K.V wrote: >> Archs like ppc64 require pte_t * to remain stable in some code path. >> They use local_irq_disable to prevent a parallel split. Generic code >> clear pmd instead of marking it _PAGE_SPLITTING in code path >> where we can afford to mark pmd none before splitting. Use a >> variant of pmdp_splitting_clear_notify that arch can override. >> >> Signed-off-by: Aneesh Kumar K.V > > Sorry, I still try wrap my head around this problem. > > So, Power has __find_linux_pte_or_hugepte() which does lock-less lookup in > page tables with local interrupts disabled. For huge pages it casts pmd_t > to pte_t. Since format of pte_t is different from pmd_t we want to prevent > transit from pmd pointing to page table to pmd pinging to huge page (and > back) while interrupts are disabled. > > The complication for Power is that it doesn't do implicit IPI on tlb > flush. > s/doesn't do/doesn't need to do/ > Is it correct? that is correct. I will add more info to the commit message of the patch I will end up doing. > > For THP, split_huge_page() and collapse sides are covered. This patch > should address two cases of splitting PMD, but not compound page in > current upstream. > > But I think there's still *big* problem for Power -- zap_huge_pmd(). > > For instance: other CPU can shoot out a THP PMD with MADV_DONTNEED and > fault in small pages instead. IIUC, for __find_linux_pte_or_hugepte(), > it's equivalent of splitting. > > I don't see how this can be fixed without kick_all_cpus_sync() in all > pmdp_clear_flush() on Power. > Yes we could run into issue with that. Thanks for catching this. Now i am not sure whether we want to do the kick_all_cpus_sync in pmdp_get_and_clear. We do use that function while updating huge pte. The one i am looking at is change_huge_pmd. We don't need a IPI there and we would really like to avoid the IPI. Any idea why we follow the sequence of pmd_clear and set_pmd, instead of pmd_update there ? I looked at code paths we are clearing pmd where we would not require an IPI. Listing them move_huge_pmd do_huge_pmd_wp_page migrate_misplace_transhuge_page change_huge_pmd. Of this IIUC change_huge_pmd may be called more frequently and hence we may want to avoid doing kick_all_cpus_sync there ? One way to fix that would be switch change_huge_pmd to pmd_update and then we could do a kick_all_cpus_sync in pmdp_get_and_clear. -aneesh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/