Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752547AbdCMPLO (ORCPT ); Mon, 13 Mar 2017 11:11:14 -0400
Received: from mx2.suse.de ([195.135.220.15]:44722 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751089AbdCMPLI (ORCPT ); Mon, 13 Mar 2017 11:11:08 -0400
Date: Mon, 13 Mar 2017 16:11:01 +0100
From: Michal Hocko
To: Igor Mammedov
Cc: Heiko Carstens, Vitaly Kuznetsov, linux-mm@kvack.org, Andrew Morton,
        Greg KH, "K. Y. Srinivasan", David Rientjes, Daniel Kiper,
        linux-api@vger.kernel.org, LKML, linux-s390@vger.kernel.org,
        xen-devel@lists.xenproject.org, linux-acpi@vger.kernel.org,
        qiuxishi@huawei.com, toshi.kani@hpe.com, xieyisheng1@huawei.com,
        slaoub@gmail.com, iamjoonsoo.kim@lge.com, vbabka@suse.cz,
        Zhang Zhen, Reza Arbab, Yasuaki Ishimatsu, Tang Chen, Andi Kleen
Subject: Re: WTH is going on with memory hotplug sysf interface (was: Re:
        [RFC PATCH] mm, hotplug: get rid of auto_online_blocks)
Message-ID: <20170313151100.GS31518@dhcp22.suse.cz>
References: <1488462828-174523-1-git-send-email-imammedo@redhat.com>
        <20170302142816.GK1404@dhcp22.suse.cz>
        <20170302180315.78975d4b@nial.brq.redhat.com>
        <20170303082723.GB31499@dhcp22.suse.cz>
        <20170303183422.6358ee8f@nial.brq.redhat.com>
        <20170306145417.GG27953@dhcp22.suse.cz>
        <20170307134004.58343e14@nial.brq.redhat.com>
        <20170309125400.GI11592@dhcp22.suse.cz>
        <20170310135807.GI3753@dhcp22.suse.cz>
        <20170310155333.GN3753@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170310155333.GN3753@dhcp22.suse.cz>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2830
Lines: 65

Let's add Andi

On Fri 10-03-17 16:53:33, Michal Hocko wrote:
> On Fri 10-03-17 14:58:07, Michal Hocko wrote:
> [...]
> > This would explain why onlining from the last block actually works but
> > to me this sounds like a completely crappy behavior. All we need to
> > guarantee AFAICS is that Normal and Movable zones do not overlap. I
> > believe there is even no real requirement about ordering of the physical
> > memory in Normal vs. Movable zones as long as they do not overlap. But
> > let's keep it simple for the start and always enforce the current status
> > quo that Normal zone is physically preceding Movable zone.
> > Can somebody explain why we cannot have a simple rule for Normal vs.
> > Movable which would be:
> >	- block [pfn, pfn+block_size] can be Normal if
> >	  !zone_populated(MOVABLE) || pfn+block_size < ZONE_MOVABLE->zone_start_pfn
> >	- block [pfn, pfn+block_size] can be Movable if
> >	  !zone_populated(NORMAL) || ZONE_NORMAL->zone_end_pfn < pfn
>
> OK, so while I was playing with this setup some more I probably got why
> this is done this way. All new memblocks are added to the zone Normal
> where they are accounted as spanned but not present. When we do
> online_movable we just cut from the end of the Normal zone and move it
> to Movable zone. This sounds really awkward. What was the reason to go
> this way? Why cannot we simply add those pages to the zone at the online
> time?

Answering to myself. So the reason seems to be 9d99aaa31f59 ("[PATCH]
x86_64: Support memory hotadd without sparsemem") which is no longer
true because
config MEMORY_HOTPLUG
	bool "Allow for memory hot-add"
	depends on SPARSEMEM || X86_64_ACPI_NUMA
	depends on ARCH_ENABLE_MEMORY_HOTPLUG
	depends on COMPILE_TEST || !KASAN

so it is either SPARSEMEM or X86_64_ACPI_NUMA that would have to be
enabled.
config X86_64_ACPI_NUMA
	def_bool y
	prompt "ACPI NUMA detection"
	depends on X86_64 && NUMA && ACPI && PCI
	select ACPI_NUMA

But I do not see any way how to enable anything but SPARSEMEM for x86_64
choice
	prompt "Memory model"
	depends on SELECT_MEMORY_MODEL
	default DISCONTIGMEM_MANUAL if ARCH_DISCONTIGMEM_DEFAULT
	default SPARSEMEM_MANUAL if ARCH_SPARSEMEM_DEFAULT
	default FLATMEM_MANUAL

ARCH_DISCONTIGMEM_DEFAULT is 32b only
config ARCH_DISCONTIGMEM_DEFAULT
	def_bool y
	depends on NUMA && X86_32

and ARCH_SPARSEMEM_DEFAULT is enabled on 64b. So I guess whatever was
the reason to add this code back in 2006 is not true anymore. So I am
really wondering. Do we absolutely need to assign pages which are not
onlined yet to the ZONE_NORMAL unconditionally? Why cannot we put them
out of any zone and wait for memory online operation to put them where
requested?
-- 
Michal Hocko
SUSE Labs
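
For reference, the simple Normal vs. Movable rule quoted above could be
sketched in C roughly as follows. This is only an illustration and not
kernel code: struct zone_span and the two can_online_*() helpers are
hypothetical names, and zone_populated() here is a local stand-in rather
than the kernel helper. The sketch also assumes half-open pfn ranges
[start_pfn, end_pfn), so the comparisons are non-strict, whereas the
quoted rule is written with strict '<'.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a zone: half-open pfn range [start_pfn, end_pfn). */
struct zone_span {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive; equal to start_pfn when empty */
};

/* Local stand-in: a zone counts as populated when its range is non-empty. */
static bool zone_populated(const struct zone_span *z)
{
	return z->end_pfn > z->start_pfn;
}

/*
 * Block [pfn, pfn + nr_pages) may become Normal if Movable is empty or
 * the block ends at or before the first Movable pfn.
 */
static bool can_online_normal(const struct zone_span *movable,
			      unsigned long pfn, unsigned long nr_pages)
{
	return !zone_populated(movable) || pfn + nr_pages <= movable->start_pfn;
}

/*
 * Block [pfn, pfn + nr_pages) may become Movable if Normal is empty or
 * the block starts at or after the end of Normal. nr_pages is unused
 * because only the start of the block matters for this direction.
 */
static bool can_online_movable(const struct zone_span *normal,
			       unsigned long pfn, unsigned long nr_pages)
{
	(void)nr_pages;
	return !zone_populated(normal) || normal->end_pfn <= pfn;
}
```

Both checks keep the current status quo that Normal physically precedes
Movable; a block that sits between the two zones passes both checks and
could be onlined either way.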