Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753177AbaBNBHv (ORCPT ); Thu, 13 Feb 2014 20:07:51 -0500
Received: from mx1.redhat.com ([209.132.183.28]:27862 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752869AbaBNBHs (ORCPT ); Thu, 13 Feb 2014 20:07:48 -0500
From: Luiz Capitulino
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	mtosatti@redhat.com, mgorman@suse.de, aarcange@redhat.com,
	andi@firstfloor.org, riel@redhat.com, davidlohr@hp.com,
	isimatu.yasuaki@jp.fujitsu.com, yinghai@kernel.org, rientjes@google.com
Subject: [PATCH v2 0/4] hugetlb: add hugepages_node= command-line option
Date: Thu, 13 Feb 2014 20:02:04 -0500
Message-Id: <1392339728-13487-1-git-send-email-lcapitulino@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On a NUMA system, HugeTLB supports allocating per-node huge pages through
sysfs. For example, to allocate 300 2M huge pages on node1, one can do:

 echo 300 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

This works as long as enough contiguous pages are available, which is
usually the case for 2M pages. For 1G huge pages, however, the allocation
is likely to fail due to fragmentation, or may not work at all: allocating
more than MAX_ORDER pages at runtime doesn't work out of the box on some
architectures. For 1G huge pages it's better, or even required, to reserve
them during kernel boot, when the allocation is most likely to succeed.
To this end we have the hugepages= command-line option, which works but
lacks per-node allocation support: it distributes huge pages evenly among
the nodes of a NUMA system. This behavior is limiting and inflexible.
There are use cases where users want to specify which nodes 1G huge pages
should be allocated from.
This series addresses this problem by adding a new kernel command-line
option called hugepages_node=, which users can use to configure the initial
huge page allocation on NUMA systems. The new option's syntax is:

 hugepages_node=nid:nr_pages:size,...

For example, this command-line:

 hugepages_node=0:300:2M,1:4:1G

allocates 300 2M huge pages from node0 and 4 1G huge pages from node1.

hugepages_node= is non-intrusive (it doesn't touch any core HugeTLB code).
Indeed, all the functions and the array added by this series run only once
and are discarded after boot. All the hugepages_node= option does is set
the initial huge page allocation among NUMA nodes.

Changelog:

o v2
 - Change syntax to hugepages_node=nid:nr_pages:size,... [Andi Kleen]
 - Several small improvements [Andrew Morton]
 - Validate node index [Yasuaki Ishimatsu]
 - Use GFP_THISNODE [Mel Gorman]
 - Fold 2MB support patch into 1GB support patch
 - Improve logs and intro email

Luiz Capitulino (4):
  memblock: memblock_virt_alloc_internal(): add __GFP_THISNODE flag support
  memblock: add memblock_virt_alloc_nid_nopanic()
  hugetlb: add parse_pagesize_str()
  hugetlb: add hugepages_node= command-line option

 Documentation/kernel-parameters.txt |   8 +++
 arch/x86/mm/hugetlbpage.c           |  77 ++++++++++++++++++++++++--
 include/linux/bootmem.h             |   4 ++
 include/linux/hugetlb.h             |   2 +
 mm/hugetlb.c                        | 106 ++++++++++++++++++++++++++++++++++++
 mm/memblock.c                       |  44 ++++++++++++++-
 6 files changed, 232 insertions(+), 9 deletions(-)

-- 
1.8.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/