Date: Wed, 26 Jul 2017 17:06:59 -0400
From: Jerome Glisse <jglisse@redhat.com>
To: Michal Hocko
Cc: linux-mm@kvack.org, Andrew Morton, Mel Gorman, Vlastimil Babka, Andrea Arcangeli, Reza Arbab, Yasuaki Ishimatsu, qiuxishi@huawei.com, Kani Toshimitsu, slaoub@gmail.com, Joonsoo Kim, Andi Kleen, Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML, Benjamin Herrenschmidt, Catalin Marinas, Dan Williams, Fenghua Yu, Heiko Carstens, "H. Peter Anvin", Ingo Molnar, Martin Schwidefsky, Michael Ellerman, Michal Hocko, Paul Mackerras, Thomas Gleixner, Tony Luck, Will Deacon
Subject: Re: [RFC PATCH 0/5] mm, memory_hotplug: allocate memmap from hotadded memory
Message-ID: <20170726210657.GE21717@redhat.com>
In-Reply-To: <20170726083333.17754-1-mhocko@kernel.org>

On Wed, Jul 26, 2017 at 10:33:28AM +0200, Michal Hocko wrote:
> Hi,
> this is another step to make memory hotplug more usable. The primary
> goal of this patchset is to reduce the memory overhead of hot-added
> memory (at least for the SPARSE_VMEMMAP memory model). Currently we use
> kmalloc to populate the memmap (struct page array), which has two main
> drawbacks: a) it consumes additional memory until the hot-added memory
> itself is onlined, and b) the memmap might end up on a different NUMA
> node, which is especially true for the movable_node configuration.
>
> a) is a problem especially for memory-hotplug-based memory "ballooning"
> solutions, where the delay between physical memory hotplug and
> onlining can lead to OOM; that led to the introduction of hacks like
> auto onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
> policy for the newly added memory")).
> b) can have performance drawbacks.
>
> One way to mitigate both issues is to simply allocate the memmap array
> (which is the largest memory footprint of physical memory hotplug)
> from the hotadded memory itself.
> The VMEMMAP memory model allows us to map
> any pfn range, so the memory doesn't need to be online to be usable
> for the array. See patch 3 for more details. In short, I am reusing the
> existing vmem_altmap, which wants to achieve the same thing for nvdimm
> device memory.
>
> I am sending this as an RFC because it has seen only very limited
> testing and I am mostly interested in opinions on the chosen
> approach. I had to touch some arch code and I have no idea whether my
> changes make sense there (especially ppc). Therefore I would highly
> appreciate arch maintainers checking patch 2.
>
> Patches 4 and 5 should be straightforward cleanups.
>
> There is also one potential drawback, though. If somebody uses memory
> hotplug for 1G (gigantic) hugetlb pages, then this scheme will obviously
> not work for them, because each memory section will contain a 2MB
> reserved area. I am not really sure anybody does that, or how reliably
> it can work. Nevertheless, I _believe_ that onlining more memory
> into virtual machines is a much more common usecase. Anyway, if there
> ever is a strong demand for such a usecase, we have basically 3 options:
> a) enlarge memory sections, b) enhance the altmap allocation strategy
> and reuse low memory sections to host memmaps of other sections on the
> same NUMA node, or c) make the memmap allocation strategy configurable
> to fall back to the current allocation.
>
> Are there any other concerns, ideas, comments?

This does not seem to be an opt-in change, i.e. if I am reading patch 3
correctly, if an altmap is not provided to __add_pages() you fall back
to allocating from the beginning of the zone. This will not work with
HMM, i.e. device private memory. So at the very least I would like to
see some way to opt out of this. Maybe a new argument like
bool forbid_altmap?

Cheers,
Jérôme
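[Editor's note: the "2MB reserved area" per section follows directly from the section geometry. A back-of-the-envelope check, assuming the common x86_64 defaults of 128 MiB sections, 4 KiB pages, and a 64-byte struct page (these constants are assumptions, not stated in the mail):]

```python
# Memmap overhead for one hot-added memory section, per the cover letter.
# Assumed constants (x86_64 defaults, not taken from the mail itself):
SECTION_SIZE = 128 * 1024 * 1024  # bytes per memory section (128 MiB)
PAGE_SIZE = 4096                  # bytes per page (4 KiB)
STRUCT_PAGE_SIZE = 64             # assumed sizeof(struct page) in bytes

pages_per_section = SECTION_SIZE // PAGE_SIZE       # pages described by the memmap
memmap_bytes = pages_per_section * STRUCT_PAGE_SIZE # memmap carved out of the section

print(pages_per_section)                 # 32768 pages
print(memmap_bytes // (1024 * 1024))     # 2 MiB reserved per section
```

So each 128 MiB section gives up roughly 2 MiB (about 1.5%) to host its own memmap, which is exactly why a section can no longer be used as a single 1G gigantic hugetlb page component.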