From: David Hildenbrand
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, David Hildenbrand, Greg Kroah-Hartman, Boris Ostrovsky, Juergen Gross, Ingo Molnar, Andrew Morton, Pavel Tatashin, Vlastimil Babka, Michal Hocko, Dan Williams, Joonsoo Kim, Reza Arbab, Thomas Gleixner
Subject: [PATCH v1 08/10] mm/memory_hotplug: allow to control onlining/offlining of memory by a driver
Date: Wed, 23 May 2018 17:11:49 +0200
Message-Id: <20180523151151.6730-9-david@redhat.com>
In-Reply-To: <20180523151151.6730-1-david@redhat.com>
References: <20180523151151.6730-1-david@redhat.com>

Some devices (esp. paravirtualized) might want to control
- when to online/offline a memory block
- how to online memory (MOVABLE/NORMAL)
- at which granularity to online/offline memory

So let's add a new flag "driver_managed" and disallow changing the
state from user space. Device onlining/offlining will still work; however,
the memory will not actually be onlined/offlined. That has to be handled
by the device driver that owns the memory.

Please note that we still have to create user-visible memory blocks,
since this is required to trigger the right udev events in order to
reload kexec/kdump. Also, it allows us to see what is going on in the
system (e.g. which memory blocks are still around).

Cc: Greg Kroah-Hartman
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Ingo Molnar
Cc: Andrew Morton
Cc: Pavel Tatashin
Cc: Vlastimil Babka
Cc: Michal Hocko
Cc: Dan Williams
Cc: Joonsoo Kim
Cc: Reza Arbab
Cc: Thomas Gleixner
Signed-off-by: David Hildenbrand
---
 drivers/base/memory.c          | 22 ++++++++++++++--------
 drivers/xen/balloon.c          |  2 +-
 include/linux/memory.h         |  1 +
 include/linux/memory_hotplug.h |  4 +++-
 mm/memory_hotplug.c            | 34 ++++++++++++++++++++++++++++++++--
 5 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index bffe8616bd55..3b8616551561 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -231,27 +231,28 @@ static bool pages_correctly_probed(unsigned long start_pfn)
  * Must already be protected by mem_hotplug_begin().
  */
 static int
-memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
+memory_block_action(struct memory_block *mem, unsigned long action)
 {
-	unsigned long start_pfn;
+	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	int ret;
+	int ret = 0;
 
-	start_pfn = section_nr_to_pfn(phys_index);
+	if (mem->driver_managed)
+		return 0;
 
 	switch (action) {
 	case MEM_ONLINE:
 		if (!pages_correctly_probed(start_pfn))
 			return -EBUSY;
 
-		ret = online_pages(start_pfn, nr_pages, online_type);
+		ret = online_pages(start_pfn, nr_pages, mem->online_type);
 		break;
 	case MEM_OFFLINE:
 		ret = offline_pages(start_pfn, nr_pages);
 		break;
 	default:
 		WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
-		     "%ld\n", __func__, phys_index, action, action);
+		     "%ld\n", __func__, mem->start_section_nr, action, action);
 		ret = -EINVAL;
 	}
 
@@ -269,8 +270,7 @@ static int memory_block_change_state(struct memory_block *mem,
 	if (to_state == MEM_OFFLINE)
 		mem->state = MEM_GOING_OFFLINE;
 
-	ret = memory_block_action(mem->start_section_nr, to_state,
-				mem->online_type);
+	ret = memory_block_action(mem, to_state);
 
 	mem->state = ret ? from_state_req : to_state;
 
@@ -350,6 +350,11 @@ store_mem_state(struct device *dev,
 	 */
 	mem_hotplug_begin();
 
+	if (mem->driver_managed) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	switch (online_type) {
 	case MMOP_ONLINE_KERNEL:
 	case MMOP_ONLINE_MOVABLE:
@@ -364,6 +369,7 @@ store_mem_state(struct device *dev,
 		ret = -EINVAL; /* should never happen */
 	}
 
+out:
 	mem_hotplug_done();
 err:
 	unlock_device_hotplug();
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 065f0b607373..89981d573c06 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -401,7 +401,7 @@ static enum bp_state reserve_additional_memory(void)
 	 * callers drop the mutex before trying again.
 	 */
 	mutex_unlock(&balloon_mutex);
-	rc = add_memory_resource(nid, resource, memhp_auto_online);
+	rc = add_memory_resource(nid, resource, memhp_auto_online, false);
 	mutex_lock(&balloon_mutex);
 
 	if (rc) {
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 9f8cd856ca1e..018c5e5ecde1 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -29,6 +29,7 @@ struct memory_block {
 	unsigned long state;		/* serialized by the dev->lock */
 	int section_count;		/* serialized by mem_sysfs_mutex */
 	int online_type;		/* for passing data to online routine */
+	bool driver_managed;		/* driver handles online/offline */
 	int phys_device;		/* to which fru does this belong? */
 	void *hw;			/* optional pointer to fw/hw data */
 	int (*phys_callback)(struct memory_block *);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d71829d54360..497e28f5b000 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,7 +326,9 @@ static inline void remove_memory(int nid, u64 start, u64 size) {}
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
 extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource, bool online);
+extern int add_memory_driver_managed(int nid, u64 start, u64 size);
+extern int add_memory_resource(int nid, struct resource *resource, bool online,
+			       bool driver_managed);
 extern int arch_add_memory(int nid, u64 start, u64 size,
 		struct vmem_altmap *altmap, bool want_memblock);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 27f7c27f57ac..1610e214bfc8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1124,8 +1124,15 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
+static int mark_memory_block_driver_managed(struct memory_block *mem, void *arg)
+{
+	mem->driver_managed = true;
+	return 0;
+}
+
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
-int __ref add_memory_resource(int nid, struct resource *res, bool online)
+int __ref add_memory_resource(int nid, struct resource *res, bool online,
+			      bool driver_managed)
 {
 	u64 start, size;
 	pg_data_t *pgdat = NULL;
@@ -1133,6 +1140,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	bool new_node;
 	int ret;
 
+	if (online && driver_managed)
+		return -EINVAL;
+
 	start = res->start;
 	size = resource_size(res);
 
@@ -1204,6 +1214,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	if (online)
 		walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
 				  NULL, online_memory_block);
+	else if (driver_managed)
+		walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
+				  NULL, mark_memory_block_driver_managed);
 
 	goto out;
 
@@ -1228,13 +1241,30 @@ int __ref add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res, memhp_auto_online);
+	ret = add_memory_resource(nid, res, memhp_auto_online, false);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(add_memory);
 
+int __ref add_memory_driver_managed(int nid, u64 start, u64 size)
+{
+	struct resource *res;
+	int ret;
+
+	res = register_memory_resource(start, size);
+	if (IS_ERR(res))
+		return PTR_ERR(res);
+
+	ret = add_memory_resource(nid, res, false, true);
+	if (ret < 0)
+		release_memory_resource(res);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(add_memory_driver_managed);
+
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy
-- 
2.17.0
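
For readers who want to check the intended semantics without building a
kernel, here is a minimal user-space model of the behaviour this patch
describes. It is only a sketch, not kernel code: the struct is cut down to
the fields needed here, and enum mem_state, the pages_online field and the
device_set_state() helper are hypothetical names introduced for
illustration. The model assumes the three rules from the patch:
memory_block_action() does nothing for a driver-managed block, device
onlining/offlining still updates the block state, and a write through the
sysfs "state" path (store_mem_state()) fails with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's memory block states. */
enum mem_state { MEM_OFFLINE, MEM_ONLINE };

/* Cut-down model of struct memory_block; only the fields used here. */
struct memory_block {
	enum mem_state state;	/* state reported to user space */
	bool driver_managed;	/* driver handles online/offline */
	bool pages_online;	/* hypothetical: were the pages really onlined? */
};

/* Models memory_block_action(): a no-op for driver-managed blocks. */
static int memory_block_action(struct memory_block *mem, enum mem_state action)
{
	assert(mem != NULL);
	if (mem->driver_managed)
		return 0;	/* the owning driver onlines/offlines the pages */
	mem->pages_online = (action == MEM_ONLINE);
	return 0;
}

/* Hypothetical helper modelling device onlining/offlining: state still changes. */
static int device_set_state(struct memory_block *mem, enum mem_state to_state)
{
	int ret = memory_block_action(mem, to_state);

	if (!ret)
		mem->state = to_state;
	return ret;
}

/* Models store_mem_state(): user-space requests are rejected if driver managed. */
static int store_mem_state(struct memory_block *mem, enum mem_state to_state)
{
	assert(mem != NULL);
	if (mem->driver_managed)
		return -EINVAL;
	return device_set_state(mem, to_state);
}
```

In this model, a regular block goes online through store_mem_state() and
its pages follow. A driver-managed block refuses the same request with
-EINVAL, while device_set_state() flips its state and leaves pages_online
untouched, which is the part the owning driver is expected to handle.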