From: Arun KS
Date: Fri, 24 Nov 2017 11:25:12 +0530
Subject: Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
To: Maciej Bielski
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, ar@linux.vnet.ibm.com, arunks@qti.qualcomm.com,
    mark.rutland@arm.com, scott.branden@broadcom.com, will.deacon@arm.com,
    qiuxishi@huawei.com, Catalin Marinas, mhocko@suse.com, realean2@ie.ibm.com

On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski wrote:
> Introduces memory hotplug functionality (hot-add) for arm64.
>
> Changes v1->v2:
> - swapper pgtable updated in place on hot add, avoiding an unnecessary
>   copy: all changes are additive and non-destructive.
>
> - stop_machine used to update swapper on hot add, avoiding races
>
> - checking if pagealloc is under debug to stay coherent with mem_map
>
> Signed-off-by: Maciej Bielski
> Signed-off-by: Andrea Reale
> ---
>  arch/arm64/Kconfig           | 12 ++++++
>  arch/arm64/configs/defconfig |  1 +
>  arch/arm64/include/asm/mmu.h |  3 ++
>  arch/arm64/mm/init.c         | 87 ++++++++++++++++++++++++++++++++++++++++++++
>  arch/arm64/mm/mmu.c          | 39 ++++++++++++++++++++
>  5 files changed, 142 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0df64a6..c736bba 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -641,6 +641,14 @@ config HOTPLUG_CPU
>           Say Y here to experiment with turning CPUs off and on.  CPUs
>           can be controlled through /sys/devices/system/cpu.
>
> +config ARCH_HAS_ADD_PAGES
> +       def_bool y
> +       depends on ARCH_ENABLE_MEMORY_HOTPLUG
> +
> +config ARCH_ENABLE_MEMORY_HOTPLUG
> +       def_bool y
> +       depends on !NUMA
> +
>  # Common NUMA Features
>  config NUMA
>         bool "Numa Memory Allocation and Scheduler Support"
> @@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
>
>  source "mm/Kconfig"
>
> +config ARCH_MEMORY_PROBE
> +       def_bool y
> +       depends on MEMORY_HOTPLUG
> +
>  config SECCOMP
>         bool "Enable seccomp to safely compute untrusted bytecode"
>         ---help---
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 34480e9..5fc5656 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
>  CONFIG_SCHED_MC=y
>  CONFIG_NUMA=y
>  CONFIG_PREEMPT=y
> +CONFIG_MEMORY_HOTPLUG=y
>  CONFIG_KSM=y
>  CONFIG_TRANSPARENT_HUGEPAGE=y
>  CONFIG_CMA=y
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 0d34bf0..2b3fa4d 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>                                pgprot_t prot, bool page_mappings_only);
>  extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
>  extern void mark_linear_text_alias_ro(void);
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +extern void hotplug_paging(phys_addr_t start, phys_addr_t size);
> +#endif
>
>  #endif
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 5960bef..e96e7d3 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -722,3 +722,90 @@ static int __init register_mem_limit_dumper(void)
>         return 0;
>  }
>  __initcall(register_mem_limit_dumper);
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int add_pages(int nid, unsigned long start_pfn,
> +               unsigned long nr_pages, bool want_memblock)
> +{
> +       int ret;
> +       u64 start_addr = start_pfn << PAGE_SHIFT;
> +       /*
> +        * Mark the first page in the range as unusable. This is needed
> +        * because __add_section (within __add_pages) wants pfn_valid
> +        * of it to be false, and on arm64 pfn_valid is implemented by
> +        * just checking the nomap flag for existing blocks.
> +        *
> +        * A small trick here is that __add_section() requires only
> +        * phys_start_pfn (that is, the first pfn of a section) to be
> +        * invalid. Regardless of whether it was assumed (by the function
> +        * author) that all pfns within a section are either all valid
> +        * or all invalid, it allows us to avoid looping twice (once here,
> +        * a second time when memblock_clear_nomap() is called) through
> +        * all pfns of the section, and to modify only one pfn.
> +        * Thanks to that, further, in __add_zone() only this very
> +        * first pfn is skipped and the corresponding page is not
> +        * flagged reserved. Therefore it is enough to correct this
> +        * setup only for it.
> +        *
> +        * When arch_add_memory() returns, the walk_memory_range()
> +        * function is called and passed the online_memory_block()
> +        * callback, whose execution finally reaches the
> +        * memory_block_action() function, where also only the first
> +        * pfn of a memory block is checked to be reserved. Above, it
> +        * was the first pfn of a section; here it is a block, but
> +        * (drivers/base/memory.c):
> +        *     sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> +        * (include/linux/memory.h):
> +        *     #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
> +        * so we can treat block and section as equivalent.
> +        */
> +       memblock_mark_nomap(start_addr, 1 << PAGE_SHIFT);
> +       ret = __add_pages(nid, start_pfn, nr_pages, want_memblock);
> +
> +       /*
> +        * Make the pages usable after they have been added.
> +        * This will make pfn_valid return true.
> +        */
> +       memblock_clear_nomap(start_addr, 1 << PAGE_SHIFT);
> +
> +       /*
> +        * This is a hack to avoid having to mix arch-specific code
> +        * into arch-independent code. SetPageReserved is supposed
> +        * to be called by __add_zone (within __add_section, within
> +        * __add_pages). However, when it is called there, it assumes
> +        * that pfn_valid returns true. Given the way pfn_valid is
> +        * implemented on arm64 (a check on the nomap flag), the only
> +        * way to make it evaluate true inside __add_zone would be to
> +        * clear the nomap flags of blocks in architecture-independent
> +        * code.
> +        *
> +        * To avoid this, we set the Reserved flag here, after we
> +        * cleared the nomap flag in the line above.
> +        */
> +       SetPageReserved(pfn_to_page(start_pfn));
> +
> +       return ret;
> +}
> +
> +int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
> +{
> +       int ret;
> +       unsigned long start_pfn = start >> PAGE_SHIFT;
> +       unsigned long nr_pages = size >> PAGE_SHIFT;
> +       unsigned long end_pfn = start_pfn + nr_pages;
> +       unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT);
> +
> +       if (end_pfn > max_sparsemem_pfn) {
> +               pr_err("end_pfn too big");
> +               return -1;
> +       }
> +       hotplug_paging(start, size);
> +
> +       ret = add_pages(nid, start_pfn, nr_pages, want_memblock);
> +
> +       if (ret)
> +               pr_warn("%s: Problem encountered in __add_pages() ret=%d\n",
> +                       __func__, ret);
> +
> +       return ret;
> +}
> +
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index f1eb15e..d93043d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -28,6 +28,7 @@
>  #include
>  #include
>  #include
> +#include <linux/stop_machine.h>
>  #include
>  #include
>  #include
> @@ -615,6 +616,44 @@ void __init paging_init(void)
>                            SWAPPER_DIR_SIZE - PAGE_SIZE);
>  }
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +
> +/*
> + * hotplug_paging() is used by memory hotplug to build new page tables
> + * for hot-added memory.
> + */
> +
> +struct mem_range {
> +       phys_addr_t base;
> +       phys_addr_t size;
> +};
> +
> +static int __hotplug_paging(void *data)
> +{
> +       int flags = 0;
> +       struct mem_range *section = data;
> +
> +       if (debug_pagealloc_enabled())
> +               flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> +
> +       __create_pgd_mapping(swapper_pg_dir, section->base,
> +                       __phys_to_virt(section->base), section->size,
> +                       PAGE_KERNEL, pgd_pgtable_alloc, flags);

Hello Andrea,

__hotplug_paging() runs in stop_machine() context, and cpu stop
callbacks must not sleep:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/stop_machine.c?h=v4.14#n479

__create_pgd_mapping() uses pgd_pgtable_alloc(), which does
__get_free_page(PGALLOC_GFP):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/mm/mmu.c?h=v4.14#n342

PGALLOC_GFP includes GFP_KERNEL, which in turn includes __GFP_RECLAIM:

    #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)
    #define GFP_KERNEL  (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

Now, prepare_alloc_pages(), called by __alloc_pages_nodemask(), checks

    might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?h=v4.14#n4150

and then we BUG().
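Just to illustrate the constraint (a rough, untested sketch of my own;
pgd_pgtable_alloc_atomic() is a hypothetical name, not from this
patch): any allocator called from the cpu stop callback would need a
GFP mask without __GFP_DIRECT_RECLAIM, e.g. GFP_ATOMIC instead of
PGALLOC_GFP:

static phys_addr_t pgd_pgtable_alloc_atomic(void)
{
	/*
	 * GFP_ATOMIC does not include __GFP_DIRECT_RECLAIM, so this
	 * allocation cannot sleep. The trade-offs: it may fail under
	 * memory pressure, and the ctor/barrier work done by the real
	 * pgd_pgtable_alloc() is omitted here.
	 */
	void *ptr = (void *)__get_free_page(GFP_ATOMIC | __GFP_ZERO);

	BUG_ON(!ptr);
	return __pa(ptr);
}

Alternatively, the page tables could be pre-allocated before entering
stop_machine().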
I was testing on a 4.4 kernel, but cross-checked with 4.14 as well.

Regards,
Arun

> +
> +       return 0;
> +}
> +
> +inline void hotplug_paging(phys_addr_t start, phys_addr_t size)
> +{
> +       struct mem_range section = {
> +               .base = start,
> +               .size = size,
> +       };
> +
> +       stop_machine(__hotplug_paging, &section, NULL);
> +}
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> +
>  /*
>   * Check whether a kernel address is valid (derived from arch/x86/).
>   */
> --
> 2.7.4
>