From: David Hildenbrand
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-acpi@vger.kernel.org, devel@linuxdriverproject.org, xen-devel@lists.xenproject.org, x86@kernel.org, David Hildenbrand, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, "Rafael J. Wysocki", Len Brown, Greg Kroah-Hartman, "K. Y.
Srinivasan", Haiyang Zhang, Stephen Hemminger, Martin Schwidefsky, Heiko Carstens, Boris Ostrovsky, Juergen Gross, Stefano Stabellini, Rashmica Gupta, Andrew Morton, Pavel Tatashin, Balbir Singh, Michael Neuling, Nathan Fontenot, YueHaibing, Vasily Gorbik, Ingo Molnar, Stephen Rothwell, "mike.travis@hpe.com", Oscar Salvador, Joonsoo Kim, Mathieu Malaterre, Michal Hocko, Arun KS, Andrew Banman, Dave Hansen, Michal Suchánek, Vitaly Kuznetsov, Dan Williams
Subject: [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
Date: Fri, 30 Nov 2018 18:59:21 +0100
Message-Id: <20181130175922.10425-4-david@redhat.com>
In-Reply-To: <20181130175922.10425-1-david@redhat.com>
References: <20181130175922.10425-1-david@redhat.com>

Let's introduce new types for different kinds of memory blocks and use them
in existing code. As I don't see an easy way to split this up, do it in one
hunk for now.

acpi:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 Properly change the type when trying to add memory that was already
 detected and used during boot (so this memory will correctly end up as
 "dimm" or "dimm-unremovable" in user space).

pseries:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 As far as I see, handling like in the acpi case for existing blocks is
 not required.

probed memory from user space:
 Use DIMM_UNREMOVABLE as there is no interface to get rid of this memory
 again.

hv_balloon,xen/balloon:
 Use BALLOON.
 As simple as that :)

s390x/sclp:
 Use a dedicated type S390X_STANDBY as this type of memory and its
 semantics are very s390x specific.

powernv/memtrace:
 Only allow BOOT memory to be used for memtrace. I consider this code
 dangerous in general, but we have to keep it working ... most probably
 just a debug feature.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: "Rafael J. Wysocki"
Cc: Len Brown
Cc: Greg Kroah-Hartman
Cc: "K. Y. Srinivasan"
Cc: Haiyang Zhang
Cc: Stephen Hemminger
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Stefano Stabellini
Cc: Rashmica Gupta
Cc: Andrew Morton
Cc: Pavel Tatashin
Cc: Balbir Singh
Cc: Michael Neuling
Cc: Nathan Fontenot
Cc: YueHaibing
Cc: Vasily Gorbik
Cc: Ingo Molnar
Cc: Stephen Rothwell
Cc: "mike.travis@hpe.com"
Cc: Oscar Salvador
Cc: Joonsoo Kim
Cc: Mathieu Malaterre
Cc: Michal Hocko
Cc: Arun KS
Cc: Andrew Banman
Cc: Dave Hansen
Cc: Michal Suchánek
Cc: Vitaly Kuznetsov
Cc: Dan Williams
Signed-off-by: David Hildenbrand
---

At first I tried to abstract the types quite a lot, but I think there are
subtle differences that are worth differentiating. More details about the
types can be found in the excessive documentation.

It is worth noting that BALLOON_MOVABLE has no user yet, but I have
something in mind that might want to make use of that (virtio-mem).
Just included it to discuss the general approach. I can drop it from
this patch.
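
To make the type selection described above easier to discuss, here is a
minimal user-space sketch in C. It only mirrors what the patch shows: the
enum values, the strings printed by type_show(), the hotremove-dependent
choice between DIMM and DIMM_UNREMOVABLE, and the MEMORY_BLOCK_NONE check
in add_memory_resource(). The enum tag and all helper names
(memory_block_type_name(), dimm_block_type(), validate_block_type(),
memory_block_type_from_name()) are assumptions made for illustration and
are not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirror of the enum in include/linux/memory.h; the tag name is hypothetical. */
enum memory_block_type {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
	MEMORY_BLOCK_DIMM,
	MEMORY_BLOCK_DIMM_UNREMOVABLE,
	MEMORY_BLOCK_BALLOON,
	MEMORY_BLOCK_BALLOON_MOVABLE,
	MEMORY_BLOCK_S390X_STANDBY,
};

/*
 * Strings as printed by the type_show() hunk (without the newline).
 * Returns NULL for types whose string is not visible in this patch.
 */
static const char *memory_block_type_name(int type)
{
	switch (type) {
	case MEMORY_BLOCK_BOOT:			return "boot";
	case MEMORY_BLOCK_DIMM:			return "dimm";
	case MEMORY_BLOCK_DIMM_UNREMOVABLE:	return "dimm-unremovable";
	case MEMORY_BLOCK_BALLOON:		return "balloon";
	case MEMORY_BLOCK_BALLOON_MOVABLE:	return "balloon-movable";
	case MEMORY_BLOCK_S390X_STANDBY:	return "s390x-standby";
	default:				return NULL;
	}
}

/* Same decision as the IS_ENABLED(CONFIG_MEMORY_HOTREMOVE) checks (acpi, pseries). */
static int dimm_block_type(bool hotremove_enabled)
{
	return hotremove_enabled ? MEMORY_BLOCK_DIMM
				 : MEMORY_BLOCK_DIMM_UNREMOVABLE;
}

/* Same guard as the one added at the top of add_memory_resource(). */
static int validate_block_type(int type)
{
	return type == MEMORY_BLOCK_NONE ? -EINVAL : 0;
}

/* Reverse lookup, e.g. for a tool parsing the type string; NONE if unknown. */
static int memory_block_type_from_name(const char *name)
{
	assert(name != NULL);
	for (int t = MEMORY_BLOCK_BOOT; t <= MEMORY_BLOCK_S390X_STANDBY; t++)
		if (strcmp(memory_block_type_name(t), name) == 0)
			return t;
	return MEMORY_BLOCK_NONE;
}
```

With hotremove disabled, dimm_block_type(false) yields
MEMORY_BLOCK_DIMM_UNREMOVABLE, which maps to "dimm-unremovable". Memory
probed from user space always gets this type in the patch, regardless of
the config option.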
---
 arch/powerpc/platforms/powernv/memtrace.c     |  9 ++--
 .../platforms/pseries/hotplug-memory.c        |  7 ++-
 drivers/acpi/acpi_memhotplug.c                | 16 ++++++-
 drivers/base/memory.c                         | 18 ++++++-
 drivers/hv/hv_balloon.c                       |  3 +-
 drivers/s390/char/sclp_cmd.c                  |  3 +-
 drivers/xen/balloon.c                         |  2 +-
 include/linux/memory.h                        | 47 ++++++++++++++++++-
 include/linux/memory_hotplug.h                |  6 +--
 mm/memory_hotplug.c                           | 15 +++---
 10 files changed, 104 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 248a38ad25c7..5d08db87091e 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -54,9 +54,9 @@ static const struct file_operations memtrace_fops = {
 	.open	= simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
+static int check_memblock_boot_and_online(struct memory_block *mem, void *arg)
 {
-	if (mem->state != MEM_ONLINE)
+	if (mem->type != MEMORY_BLOCK_BOOT || mem->state != MEM_ONLINE)
 		return -1;
 
 	return 0;
@@ -77,7 +77,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_boot_and_online))
 		return false;
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
@@ -233,7 +233,8 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (add_memory(ent->nid, ent->start, ent->size,
+			       MEMORY_BLOCK_BOOT)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2a983b5a52e1..5f91359c7993 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -651,7 +651,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int nid, rc;
+	int nid, rc, type = MEMORY_BLOCK_DIMM;
 
 	if (lmb->flags & DRCONF_MEM_ASSIGNED)
 		return -EINVAL;
@@ -667,8 +667,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	/* Find the node id for this address */
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, type);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8fe0960ea572..f841113b450d 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -177,6 +177,13 @@ static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
 
 static int acpi_bind_memblk(struct memory_block *mem, void *arg)
 {
+	/* switch the type of memory block if this memory was already present */
+	if (mem->type == MEMORY_BLOCK_BOOT) {
+		if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+			mem->type = MEMORY_BLOCK_DIMM;
+		else
+			mem->type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+	}
 	return acpi_bind_one(&mem->dev, arg);
 }
 
@@ -191,6 +198,7 @@ static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
 static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
 {
 	acpi_unbind_one(&mem->dev);
+	mem->type = MEMORY_BLOCK_BOOT;
 	return 0;
 }
 
@@ -203,10 +211,13 @@ static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
 static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 {
 	acpi_handle handle = mem_device->device->handle;
-	int result, num_enabled = 0;
+	int result, num_enabled = 0, type = MEMORY_BLOCK_DIMM;
 	struct acpi_memory_info *info;
 	int node;
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	node = acpi_get_node(handle);
 	/*
 	 * Tell the VM there is more memory here...
@@ -228,7 +239,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length,
+				      type);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c42300082c88..c5fdca7a3009 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,6 +394,21 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr,
 	case MEMORY_BLOCK_BOOT:
 		len = sprintf(buf, "boot\n");
 		break;
+	case MEMORY_BLOCK_DIMM:
+		len = sprintf(buf, "dimm\n");
+		break;
+	case MEMORY_BLOCK_DIMM_UNREMOVABLE:
+		len = sprintf(buf, "dimm-unremovable\n");
+		break;
+	case MEMORY_BLOCK_BALLOON:
+		len = sprintf(buf, "balloon\n");
+		break;
+	case MEMORY_BLOCK_BALLOON_MOVABLE:
+		len = sprintf(buf, "balloon-movable\n");
+		break;
+	case MEMORY_BLOCK_S390X_STANDBY:
+		len = sprintf(buf, "s390x-standby\n");
+		break;
 	default:
 		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
 				mem->state);
@@ -538,7 +553,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+			   MEMORY_BLOCK_DIMM_UNREMOVABLE);
 	if (ret)
 		goto out;
 
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 47719862e57f..f502ea6cd255 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -741,7 +741,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				(HA_CHUNK << PAGE_SHIFT),
+				MEMORY_BLOCK_BALLOON);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..0ca6f77e7e1d 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,8 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size,
+			   MEMORY_BLOCK_S390X_STANDBY);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 5d2d7a917b4e..953ff86d609b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -352,7 +352,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, MEMORY_BLOCK_BALLOON);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 9f39ef41e6d2..a3a1e9764805 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -59,12 +59,57 @@ int set_memory_block_size_order(unsigned int order);
  *  specific device driver takes care of this memory block. This memory
  *  block type is onlined automatically by the kernel during boot and might
  *  later be managed by a different device driver, in which case the type
- *  might change.
+ *  might change (e.g. to MEMORY_BLOCK_DIMM).
+ *
+ * MEMORY_BLOCK_DIMM:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). Once all memory blocks belonging to the DIMM have been
+ *  offlined, the DIMM along with the memory blocks can be removed to
+ *  effectively unplug it. This memory block type is usually onlined to the
+ *  MOVABLE zone, to make offlining and unplug possible. Examples include
+ *  ACPI DIMMs and PPC LMBs if the kernel supports removal of memory.
+ *
+ * MEMORY_BLOCK_DIMM_UNREMOVABLE:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). There is either no HW interface to remove the DIMM or
+ *  the kernel does not support offlining/removal of memory, so this memory
+ *  block can never be removed. Examples include ACPI DIMMs and PPC LMBs
+ *  when removal of memory is not supported by the kernel, as well as
+ *  memory probed manually from user space.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that does not require a specific zone for optimal operation
+ *  (e.g. unplug memory using balloon inflation on this memory block on
+ *  page granularity). Examples include memory added by the XEN and Hyper-V
+ *  balloon drivers.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON_MOVABLE:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that suggests to online this memory block to the MOVABLE zone for
+ *  optimal operation (e.g. unplug using balloon inflation on this memory
+ *  block in bigger chunks than pages). There are no examples yet.
+ *  This memory block type is usually onlined to the MOVABLE zone.
+ *
+ * MEMORY_BLOCK_S390X_STANDBY:
+ *  The memory block is special standby memory on s390x. As long as
+ *  offline, no memory will be allocated to the system for this memory
+ *  block. Onlining memory will result in memory getting allocated to the
+ *  system and memory can usually not be offlined again. The memory block
+ *  will never be removed. This memory type is usually not onlined
+ *  automatically but explicitly by the administrator.
  */
 enum {
 	MEMORY_BLOCK_NONE = 0,
 	MEMORY_BLOCK_UNSPECIFIED,
 	MEMORY_BLOCK_BOOT,
+	MEMORY_BLOCK_DIMM,
+	MEMORY_BLOCK_DIMM_UNREMOVABLE,
+	MEMORY_BLOCK_BALLOON,
+	MEMORY_BLOCK_BALLOON_MOVABLE,
+	MEMORY_BLOCK_S390X_STANDBY,
 };
 
 /* These states are exposed to userspace as text strings in sysfs */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 667a37aa9a3c..7c8895299e8c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,9 +326,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory_resource(int nid, struct resource *resource, int type);
 extern int arch_add_memory(int nid, u64 start, u64 size,
 		struct vmem_altmap *altmap, int type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7246faa44488..f109002d6e6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1071,7 +1071,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, int type)
 {
 	u64 start, size;
 	bool new_node = false;
@@ -1080,6 +1080,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	start = res->start;
 	size = resource_size(res);
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	ret = check_hotplug_memory_range(start, size);
 	if (ret)
 		return ret;
@@ -1100,7 +1103,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, MEMORY_TYPE_UNSPECIFIED);
+	ret = arch_add_memory(nid, start, size, NULL, type);
 	if (ret < 0)
 		goto error;
 
@@ -1141,7 +1144,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, int type)
 {
 	struct resource *res;
 	int ret;
@@ -1150,18 +1153,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, type);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, int type)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, type);
 	unlock_device_hotplug();
 
 	return rc;
-- 
2.17.2
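
The interaction between the acpi bind/unbind hunks and the memtrace check
may be easier to follow as a small user-space model. The sketch below is
an illustration only: struct memory_block_model and the model_*() helpers
are hypothetical names, the struct carries just the two fields the patch
touches, and the state constants are stand-ins for the kernel's flags. It
shows that a BOOT block becomes DIMM (or DIMM_UNREMOVABLE) once bound,
that unbinding returns it to BOOT, and that memtrace then only accepts
blocks that are both BOOT and online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same ordering as the enum in include/linux/memory.h after this patch. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
	MEMORY_BLOCK_DIMM,
	MEMORY_BLOCK_DIMM_UNREMOVABLE,
	MEMORY_BLOCK_BALLOON,
	MEMORY_BLOCK_BALLOON_MOVABLE,
	MEMORY_BLOCK_S390X_STANDBY,
};

/* Stand-ins for the kernel's memory block state flags. */
enum {
	MEM_ONLINE  = 1 << 0,
	MEM_OFFLINE = 1 << 2,
};

/* Hypothetical, reduced model of struct memory_block. */
struct memory_block_model {
	int type;
	unsigned long state;
};

/* Models acpi_bind_memblk(): only BOOT blocks change type. */
static void model_bind(struct memory_block_model *mem, bool hotremove_enabled)
{
	assert(mem != NULL);
	if (mem->type == MEMORY_BLOCK_BOOT)
		mem->type = hotremove_enabled ? MEMORY_BLOCK_DIMM
					      : MEMORY_BLOCK_DIMM_UNREMOVABLE;
}

/* Models acpi_unbind_memblk(): the block falls back to BOOT. */
static void model_unbind(struct memory_block_model *mem)
{
	assert(mem != NULL);
	mem->type = MEMORY_BLOCK_BOOT;
}

/* Models check_memblock_boot_and_online(): 0 if usable for memtrace, else -1. */
static int model_check_boot_and_online(const struct memory_block_model *mem)
{
	assert(mem != NULL);
	if (mem->type != MEMORY_BLOCK_BOOT || mem->state != MEM_ONLINE)
		return -1;
	return 0;
}
```

One consequence visible in this model: memory that was present at boot
but later bound by the acpi driver is no longer eligible for memtrace
until it is unbound again.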