Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752383AbcLGHAl (ORCPT ); Wed, 7 Dec 2016 02:00:41 -0500 Received: from pegase1.c-s.fr ([93.17.236.30]:9020 "EHLO pegase1.c-s.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751779AbcLGHAj (ORCPT ); Wed, 7 Dec 2016 02:00:39 -0500 Subject: Re: [PATCH v3 2/3] powerpc: get hugetlbpage handling more generic To: Scott Wood , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman References: <69b1226ce134761fd7ab81a498bdb85cd737280f.1474441302.git.christophe.leroy@c-s.fr> <1480987116.32531.17.camel@buserror.net> <1481072799.32531.23.camel@buserror.net> Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org From: Christophe LEROY Message-ID: Date: Wed, 7 Dec 2016 07:59:07 +0100 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <1481072799.32531.23.camel@buserror.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2665 Lines: 65 Le 07/12/2016 à 02:06, Scott Wood a écrit : > On Tue, 2016-12-06 at 07:34 +0100, Christophe LEROY wrote: >> >> Le 06/12/2016 à 02:18, Scott Wood a écrit : >>> >>> On Wed, 2016-09-21 at 10:11 +0200, Christophe Leroy wrote: >>>> >>>> Today there are two implementations of hugetlbpages which are managed >>>> by exclusive #ifdefs: >>>> * FSL_BOOKE: several directory entries points to the same single >>>> hugepage >>>> * BOOK3S: one upper level directory entry points to a table of hugepages >>>> >>>> In preparation of implementation of hugepage support on the 8xx, we >>>> need a mix of the two above solutions, because the 8xx needs both cases >>>> depending on the size of pages: >>>> * In 4k page size mode, each PGD entry covers a 4M bytes area. It means >>>> that 2 PGD entries will be necessary to cover an 8M hugepage while a >>>> single PGD entry will cover 8x 512k hugepages. >>>> * In 16 page size mode, each PGD entry covers a 64M bytes area. It means >>>> that 8x 8M hugepages will be covered by one PGD entry and 64x 512k >>>> hugepages will be covers by one PGD entry. >>>> >>>> This patch: >>>> * removes #ifdefs in favor of if/else based on the range sizes >>>> * merges the two huge_pte_alloc() functions as they are pretty similar >>>> * merges the two hugetlbpage_init() functions as they are pretty similar >>> [snip] >>>> >>>> @@ -860,16 +803,34 @@ static int __init hugetlbpage_init(void) >>>> * if we have pdshift and shift value same, we don't >>>> * use pgt cache for hugepd. >>>> */ >>>> - if (pdshift != shift) { >>>> + if (pdshift > shift) { >>>> pgtable_cache_add(pdshift - shift, NULL); >>>> if (!PGT_CACHE(pdshift - shift)) >>>> panic("hugetlbpage_init(): could not >>>> create >>>> " >>>> "pgtable cache for %d bit >>>> pagesize\n", shift); >>>> } >>>> +#ifdef CONFIG_PPC_FSL_BOOK3E >>>> + else if (!hugepte_cache) { >>> This else never triggers on book3e, because the way this function >>> calculates >>> pdshift is wrong for book3e (it uses PyD_SHIFT instead of >>> HUGEPD_PxD_SHIFT). >>> We later get OOMs because huge_pte_alloc() calculates pdshift correctly, >>> tries to use hugepte_cache, and fails. >> Ok, I'll check it again, I was expecting it to still work properly on >> book3e, because after applying patch 3 it works properly on the 8xx. > > On 8xx you probably happen to have a page size that yields "pdshift <= shift" > even with the incorrect pdshift calculation, causing hugepte_cache to be > allocated. The smallest hugepage size on 8xx is 512k compared to 4M on fsl- > book3e. > Indeed it works because on 8xx, PUD_SHIFT == PMD_SHIFT == PGDIR_SHIFT Christophe