From: Tianyu Lan <[email protected]>
Hyper-V provides a dynamic memory hot add/remove function.
Memory hot-add is already enabled in the Hyper-V balloon driver.
This patchset adds the memory hot-remove function.
This patchset is based on David Hildenbrand's "virtio-mem:
paravirtualized memory" RFC V4 and uses the new interface
offline_and_remove_memory().
https://lkml.org/lkml/2019/12/12/681
Changes since V1:
- Split the patches into smaller ones for easier review.
- Convert ha_lock from a spin lock to a mutex.
- Add a common work to handle all memory hot-plug and
  balloon messages.
- Use offline_and_remove_memory() to offline and remove
  memory.
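For reference, the new interface is used in patch 7 of this series
roughly as follows (sketch only; nid, pfn and block_pages are the
values computed there):

	ret = offline_and_remove_memory(nid, pfn << PAGE_SHIFT,
					block_pages << PAGE_SHIFT);
	if (ret)
		continue;	/* try the next hot-plug block */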
Tianyu Lan (10):
mm/resource: Move child to new resource when release mem region.
mm: expose is_mem_section_removable() symbol
x86/Hyper-V/Balloon: Replace hot-add and balloon up works with a
common work
x86/Hyper-V/Balloon: Convert spin lock ha_lock to mutex
x86/Hyper-V/Balloon: Avoid releasing ha_lock when traverse
ha_region_list
x86/Hyper-V/Balloon: Enable mem hot-remove capability
x86/Hyper-V/Balloon: Handle mem hot-remove request
x86/Hyper-V/Balloon: Handle request with non-aligned page number
x86/Hyper-V/Balloon: Hot add mem in the gaps of hot add region
x86/Hyper-V: Workaround Hyper-V unballoon msg bug
drivers/hv/hv_balloon.c | 754 ++++++++++++++++++++++++++++++++++++++++--------
kernel/resource.c | 38 ++-
mm/memory_hotplug.c | 1 +
3 files changed, 673 insertions(+), 120 deletions(-)
--
2.14.5
From: Tianyu Lan <[email protected]>
When releasing a mem region, the old mem region may be split into
two regions. The current code allocates a new struct resource for the
high-end mem region but does not move child resources whose ranges
fall inside the high-end range to the new resource. When the old mem
region's range is then adjusted, __adjust_resource() detects that a
child region's range is outside the new range and returns an error.
Move such child resources to the high-end resource before adjusting
the old mem region's range.
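For example (illustrative addresses only), consider releasing a range
in the middle of a busy region that has a child above the released
range:

	parent: [mem 0x100000000-0x13fffffff] (busy)
	 child: [mem 0x130000000-0x137ffffff]

	release_mem_region_adjustable(parent, 0x110000000, 0x10000000);

The parent is split, a new resource is allocated for the high end
[0x120000000-0x13fffffff], and then

	__adjust_resource(res, res->start, start - res->start);

fails with -EBUSY because the child no longer fits inside the shrunken
low-end range. Moving such children to the new resource first avoids
this.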
Signed-off-by: Tianyu Lan <[email protected]>
---
kernel/resource.c | 38 ++++++++++++++++++++++++++++++++++----
1 file changed, 34 insertions(+), 4 deletions(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index 76036a41143b..1c7362825134 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -181,6 +181,38 @@ static struct resource *alloc_resource(gfp_t flags)
return res;
}
+static void move_child_to_newresource(struct resource *old,
+ struct resource *new)
+{
+ struct resource *tmp, **p, **np;
+
+ if (!old->child)
+ return;
+
+ p = &old->child;
+ np = &new->child;
+
+ for (;;) {
+ tmp = *p;
+ if (!tmp)
+ break;
+
+ if (tmp->start >= new->start && tmp->end <= new->end) {
+ tmp->parent = new;
+ *np = tmp;
+ np = &tmp->sibling;
+ *p = tmp->sibling;
+
+ if (!tmp->sibling)
+ *np = NULL;
+ continue;
+ }
+
+ p = &tmp->sibling;
+ }
+}
+
+
/* Return the conflict entry if you can't request it */
static struct resource * __request_resource(struct resource *root, struct resource *new)
{
@@ -1246,9 +1278,6 @@ EXPORT_SYMBOL(__release_region);
* Note:
* - Additional release conditions, such as overlapping region, can be
* supported after they are confirmed as valid cases.
- * - When a busy memory resource gets split into two entries, the code
- * assumes that all children remain in the lower address entry for
- * simplicity. Enhance this logic when necessary.
*/
int release_mem_region_adjustable(struct resource *parent,
resource_size_t start, resource_size_t size)
@@ -1331,11 +1360,12 @@ int release_mem_region_adjustable(struct resource *parent,
new_res->sibling = res->sibling;
new_res->child = NULL;
+ move_child_to_newresource(res, new_res);
+ res->sibling = new_res;
ret = __adjust_resource(res, res->start,
start - res->start);
if (ret)
break;
- res->sibling = new_res;
new_res = NULL;
}
--
2.14.5
From: Tianyu Lan <[email protected]>
The Hyper-V balloon driver will use is_mem_section_removable() to
check whether a memory block is removable when it receives a memory
hot-remove msg. Expose the symbol.
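The intended usage in the balloon driver (added later in this series)
is roughly:

	if (online_section_nr(pfn_to_section_nr(pfn)) &&
	    is_mem_section_removable(pfn, block_pages))
		ret = offline_and_remove_memory(nid, pfn << PAGE_SHIFT,
						block_pages << PAGE_SHIFT);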
Signed-off-by: Tianyu Lan <[email protected]>
---
mm/memory_hotplug.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d04369e6d3cc..a4ebfc5c48b3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1179,6 +1179,7 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
/* All pageblocks in the memory block are likely to be hot-removable */
return true;
}
+EXPORT_SYMBOL_GPL(is_mem_section_removable);
/*
* Confirm all pages in a range [start, end) belong to the same zone.
--
2.14.5
From: Tianyu Lan <[email protected]>
ha_lock protects ha_region_list and is held in process context. When
a mem hot-add msg is processed, add_memory() is called while
traversing ha_region_list in order to find the associated ha region.
add_memory() takes the device_hotplug_lock mutex, and ha_lock is also
taken in hv_online_page(), which is called inside add_memory(). So the
current code needs to release ha_lock before calling add_memory() in
order to avoid acquiring a mutex while holding a spin lock and
acquiring ha_lock twice, either of which may cause a deadlock. The mem
hot-remove implementation has the same issue. To avoid releasing
ha_lock while traversing ha_region_list and to simplify the code,
convert ha_lock from a spin lock to a mutex first.
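If ha_lock were kept held across add_memory(), the (simplified) call
chain would be:

	handle_pg_range()
	  spin_lock_irqsave(&dm_device.ha_lock, flags);
	  hv_mem_hot_add()
	    add_memory()	/* may sleep, takes device_hotplug_lock */
	      ...
	        hv_online_page()
	          spin_lock_irqsave(&dm_device.ha_lock, flags); /* deadlock */

which is why the current code has to drop ha_lock around the
add_memory() call.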
Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/hv_balloon.c | 49 ++++++++++++++++++++++---------------------------
1 file changed, 22 insertions(+), 27 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index bdb6791e6de1..185146795122 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -546,7 +546,7 @@ struct hv_dynmem_device {
* Protects ha_region_list, num_pages_onlined counter and individual
* regions from ha_region_list.
*/
- spinlock_t ha_lock;
+ struct mutex ha_lock;
/*
* A list of hot-add regions.
@@ -629,7 +629,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
void *v)
{
struct memory_notify *mem = (struct memory_notify *)v;
- unsigned long flags, pfn_count;
+ unsigned long pfn_count;
switch (val) {
case MEM_ONLINE:
@@ -641,7 +641,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
break;
case MEM_OFFLINE:
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ mutex_lock(&dm_device.ha_lock);
pfn_count = hv_page_offline_check(mem->start_pfn,
mem->nr_pages);
if (pfn_count <= dm_device.num_pages_onlined) {
@@ -655,7 +655,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
WARN_ON_ONCE(1);
dm_device.num_pages_onlined = 0;
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ mutex_unlock(&dm_device.ha_lock);
break;
case MEM_GOING_ONLINE:
case MEM_GOING_OFFLINE:
@@ -707,12 +707,11 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
unsigned long start_pfn;
unsigned long processed_pfn;
unsigned long total_pfn = pfn_count;
- unsigned long flags;
for (i = 0; i < (size/HA_CHUNK); i++) {
start_pfn = start + (i * HA_CHUNK);
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ mutex_lock(&dm_device.ha_lock);
has->ha_end_pfn += HA_CHUNK;
if (total_pfn > HA_CHUNK) {
@@ -724,7 +723,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
}
has->covered_end_pfn += processed_pfn;
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ mutex_unlock(&dm_device.ha_lock);
init_completion(&dm_device.ol_waitevent);
dm_device.ha_waiting = !memhp_auto_online;
@@ -745,10 +744,10 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
*/
do_hot_add = false;
}
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ mutex_lock(&dm_device.ha_lock);
has->ha_end_pfn -= HA_CHUNK;
has->covered_end_pfn -= processed_pfn;
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ mutex_unlock(&dm_device.ha_lock);
break;
}
@@ -769,10 +768,9 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
static void hv_online_page(struct page *pg, unsigned int order)
{
struct hv_hotadd_state *has;
- unsigned long flags;
unsigned long pfn = page_to_pfn(pg);
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ mutex_lock(&dm_device.ha_lock);
list_for_each_entry(has, &dm_device.ha_region_list, list) {
/* The page belongs to a different HAS. */
if ((pfn < has->start_pfn) ||
@@ -782,7 +780,7 @@ static void hv_online_page(struct page *pg, unsigned int order)
hv_bring_pgs_online(has, pfn, 1UL << order);
break;
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ mutex_unlock(&dm_device.ha_lock);
}
static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
@@ -791,9 +789,8 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
struct hv_hotadd_gap *gap;
unsigned long residual, new_inc;
int ret = 0;
- unsigned long flags;
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ mutex_lock(&dm_device.ha_lock);
list_for_each_entry(has, &dm_device.ha_region_list, list) {
/*
* If the pfn range we are dealing with is not in the current
@@ -840,7 +837,7 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
ret = 1;
break;
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ mutex_unlock(&dm_device.ha_lock);
return ret;
}
@@ -854,12 +851,12 @@ static unsigned long handle_pg_range(unsigned long pg_start,
struct hv_hotadd_state *has;
unsigned long pgs_ol = 0;
unsigned long old_covered_state;
- unsigned long res = 0, flags;
+ unsigned long res = 0;
pr_debug("Hot adding %lu pages starting at pfn 0x%lx.\n", pg_count,
pg_start);
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ mutex_lock(&dm_device.ha_lock);
list_for_each_entry(has, &dm_device.ha_region_list, list) {
/*
* If the pfn range we are dealing with is not in the current
@@ -912,9 +909,9 @@ static unsigned long handle_pg_range(unsigned long pg_start,
} else {
pfn_cnt = size;
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ mutex_unlock(&dm_device.ha_lock);
hv_mem_hot_add(has->ha_end_pfn, size, pfn_cnt, has);
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ mutex_lock(&dm_device.ha_lock);
}
/*
* If we managed to online any pages that were given to us,
@@ -923,7 +920,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
res = has->covered_end_pfn - old_covered_state;
break;
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ mutex_unlock(&dm_device.ha_lock);
return res;
}
@@ -935,7 +932,6 @@ static unsigned long process_hot_add(unsigned long pg_start,
{
struct hv_hotadd_state *ha_region = NULL;
int covered;
- unsigned long flags;
if (pfn_cnt == 0)
return 0;
@@ -967,9 +963,9 @@ static unsigned long process_hot_add(unsigned long pg_start,
ha_region->covered_end_pfn = pg_start;
ha_region->end_pfn = rg_start + rg_size;
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ mutex_lock(&dm_device.ha_lock);
list_add_tail(&ha_region->list, &dm_device.ha_region_list);
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ mutex_unlock(&dm_device.ha_lock);
}
do_pg_range:
@@ -1727,7 +1723,7 @@ static int balloon_probe(struct hv_device *dev,
init_completion(&dm_device.host_event);
init_completion(&dm_device.config_event);
INIT_LIST_HEAD(&dm_device.ha_region_list);
- spin_lock_init(&dm_device.ha_lock);
+ mutex_init(&dm_device.ha_lock);
INIT_WORK(&dm_device.dm_wrk.wrk, dm_msg_work);
dm_device.host_specified_ha_region = false;
@@ -1769,7 +1765,6 @@ static int balloon_remove(struct hv_device *dev)
struct hv_dynmem_device *dm = hv_get_drvdata(dev);
struct hv_hotadd_state *has, *tmp;
struct hv_hotadd_gap *gap, *tmp_gap;
- unsigned long flags;
if (dm->num_pages_ballooned != 0)
pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
@@ -1782,7 +1777,7 @@ static int balloon_remove(struct hv_device *dev)
unregister_memory_notifier(&hv_memory_nb);
restore_online_page_callback(&hv_online_page);
#endif
- spin_lock_irqsave(&dm_device.ha_lock, flags);
+ mutex_lock(&dm_device.ha_lock);
list_for_each_entry_safe(has, tmp, &dm->ha_region_list, list) {
list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
list_del(&gap->list);
@@ -1791,7 +1786,7 @@ static int balloon_remove(struct hv_device *dev)
list_del(&has->list);
kfree(has);
}
- spin_unlock_irqrestore(&dm_device.ha_lock, flags);
+ mutex_unlock(&dm_device.ha_lock);
return 0;
}
--
2.14.5
From: Tianyu Lan <[email protected]>
Enable the mem hot-remove capability and handle the request in the
common work. The actual mem hot-remove operation is added in a later
patch.
Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/hv_balloon.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 101 insertions(+), 3 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 729dc5551302..43e490f492d5 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -49,6 +49,7 @@
* Changes to 0.2 on 2009/05/14
* Changes to 0.3 on 2009/12/03
* Changed to 1.0 on 2011/04/05
+ * Changed to 2.0 on 2019/12/10
*/
#define DYNMEM_MAKE_VERSION(Major, Minor) ((__u32)(((Major) << 16) | (Minor)))
@@ -94,7 +95,13 @@ enum dm_message_type {
* Version 1.0.
*/
DM_INFO_MESSAGE = 12,
- DM_VERSION_1_MAX = 12
+ DM_VERSION_1_MAX = 12,
+
+ /*
+ * Version 2.0
+ */
+ DM_MEM_HOT_REMOVE_REQUEST = 13,
+ DM_MEM_HOT_REMOVE_RESPONSE = 14
};
@@ -123,7 +130,8 @@ union dm_caps {
* represents an alignment of 2^n in mega bytes.
*/
__u64 hot_add_alignment:4;
- __u64 reservedz:58;
+ __u64 hot_remove:1;
+ __u64 reservedz:57;
} cap_bits;
__u64 caps;
} __packed;
@@ -234,7 +242,9 @@ struct dm_capabilities {
struct dm_capabilities_resp_msg {
struct dm_header hdr;
__u64 is_accepted:1;
- __u64 reservedz:63;
+ __u64 hot_remove:1;
+ __u64 suppress_pressure_reports:1;
+ __u64 reservedz:61;
} __packed;
/*
@@ -377,6 +387,27 @@ struct dm_hot_add_response {
__u32 result;
} __packed;
+struct dm_hot_remove {
+ struct dm_header hdr;
+ __u32 virtual_node;
+ __u32 page_count;
+ __u32 qos_flags;
+ __u32 reservedZ;
+} __packed;
+
+struct dm_hot_remove_response {
+ struct dm_header hdr;
+ __u32 result;
+ __u32 range_count;
+ __u64 more_pages:1;
+ __u64 reservedz:63;
+ union dm_mem_page_range range_array[];
+} __packed;
+
+#define DM_REMOVE_QOS_LARGE (1 << 0)
+#define DM_REMOVE_QOS_LOCAL (1 << 1)
+#define DM_REMOVE_QOS_MASK (0x3)
+
/*
* Types of information sent from host to the guest.
*/
@@ -455,6 +486,11 @@ union dm_msg_info {
union dm_mem_page_range ha_page_range;
union dm_mem_page_range ha_region_range;
} hot_add;
+ struct {
+ __u32 virtual_node;
+ __u32 page_count;
+ __u32 qos_flags;
+ } hot_remove;
};
struct dm_msg_wrk {
@@ -496,6 +532,7 @@ enum hv_dm_state {
DM_BALLOON_UP,
DM_BALLOON_DOWN,
DM_HOT_ADD,
+ DM_HOT_REMOVE,
DM_INIT_ERROR
};
@@ -571,6 +608,35 @@ static struct hv_dynmem_device dm_device;
static void post_status(struct hv_dynmem_device *dm);
+static int hv_send_hot_remove_response(
+ struct dm_hot_remove_response *resp,
+ unsigned long request_count, bool more_pages)
+{
+ struct hv_dynmem_device *dm = &dm_device;
+ int ret;
+
+ resp->hdr.type = DM_MEM_HOT_REMOVE_RESPONSE;
+ resp->range_count = request_count;
+ resp->more_pages = more_pages;
+ resp->hdr.size = sizeof(struct dm_hot_remove_response)
+ + sizeof(union dm_mem_page_range) * request_count;
+ resp->result = !!request_count;
+
+ do {
+ resp->hdr.trans_id = atomic_inc_return(&trans_id);
+ ret = vmbus_sendpacket(dm->dev->channel, resp,
+ resp->hdr.size,
+ (unsigned long)NULL,
+ VM_PKT_DATA_INBAND, 0);
+
+ if (ret == -EAGAIN)
+ msleep(20);
+ post_status(&dm_device);
+ } while (ret == -EAGAIN);
+
+ return ret;
+}
+
#ifdef CONFIG_MEMORY_HOTPLUG
static inline bool has_pfn_is_backed(struct hv_hotadd_state *has,
unsigned long pfn)
@@ -982,6 +1048,17 @@ static unsigned long process_hot_add(unsigned long pg_start,
#endif
+static void hot_remove_req(union dm_msg_info *msg_info)
+{
+ struct hv_dynmem_device *dm = &dm_device;
+
+ /* Add hot remove operation later and send failure response. */
+ hv_send_hot_remove_response((struct dm_hot_remove_response *)
+ balloon_up_send_buffer, 0, false);
+
+ dm->state = DM_INITIALIZED;
+}
+
static void hot_add_req(union dm_msg_info *msg_info)
{
struct dm_hot_add_response resp;
@@ -1366,6 +1443,9 @@ static void dm_msg_work(struct work_struct *dummy)
case DM_MEM_HOT_ADD_REQUEST:
hot_add_req(msg_info);
break;
+ case DM_MEM_HOT_REMOVE_REQUEST:
+ hot_remove_req(msg_info);
+ break;
default:
return;
}
@@ -1506,6 +1586,7 @@ static void balloon_onchannelcallback(void *context)
struct hv_dynmem_device *dm = hv_get_drvdata(dev);
struct dm_balloon *bal_msg;
struct dm_hot_add *ha_msg;
+ struct dm_hot_remove *hr_msg;
struct dm_msg_wrk *dm_wrk = &dm_device.dm_wrk;
union dm_msg_info *msg_info = &dm_wrk->dm_msg;
union dm_mem_page_range *ha_pg_range;
@@ -1587,6 +1668,22 @@ static void balloon_onchannelcallback(void *context)
schedule_work(&dm_wrk->wrk);
break;
+ case DM_MEM_HOT_REMOVE_REQUEST:
+ if (dm->state == DM_HOT_REMOVE)
+ pr_warn("Currently hot-removing.\n");
+
+ dm->state = DM_HOT_REMOVE;
+ hr_msg = (struct dm_hot_remove *)recv_buffer;
+
+ msg_info->hot_remove.virtual_node
+ = hr_msg->virtual_node;
+ msg_info->hot_remove.page_count = hr_msg->page_count;
+ msg_info->hot_remove.qos_flags = hr_msg->qos_flags;
+
+ dm_wrk->msg_type = DM_MEM_HOT_REMOVE_REQUEST;
+ schedule_work(&dm_wrk->wrk);
+ break;
+
case DM_INFO_MESSAGE:
process_info(dm, (struct dm_info_msg *)dm_msg);
break;
@@ -1665,6 +1762,7 @@ static int balloon_connect_vsp(struct hv_device *dev)
*/
cap_msg.caps.cap_bits.balloon = 1;
cap_msg.caps.cap_bits.hot_add = 1;
+ cap_msg.caps.cap_bits.hot_remove = 1;
/*
* Specify our alignment requirements as it relates
--
2.14.5
From: Tianyu Lan <[email protected]>
The Linux memory hot-plug unit is 128MB and the requested page
number may not be aligned with that unit. The non-aligned case is
handled in a later patch.
Handle the mem hot-remove request as follows:
First, search for memory in the ha region list. If a suitable memory
block is found, offline and remove the memory and create an ha region
"gap" struct for the range. A "gap" means the range in the hot-add
region has been offlined or removed. A following mem hot-add msg may
add memory in the gap range back.
If there is no suitable memory in the hot-add region, search the
system memory on the target node and perform the offline and remove
there.
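In pseudo-code, the request is handled roughly as (matching the code
below):

	hv_mem_hot_remove(nid, nr_pages)
	    remain = hv_hot_remove_from_ha_list(nid, nr_pages, ...);
	    if (remain > 0)
	        /* fall back to system memory on the target node */
	        hv_hot_remove_range(nid, node_start_pfn(nid),
	                            node_end_pfn(nid), remain, ...);
	    hv_send_hot_remove_response(resp, request_index, false);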
Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/hv_balloon.c | 188 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 184 insertions(+), 4 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 43e490f492d5..3d8c09fe148a 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -56,6 +56,10 @@
#define DYNMEM_MAJOR_VERSION(Version) ((__u32)(Version) >> 16)
#define DYNMEM_MINOR_VERSION(Version) ((__u32)(Version) & 0xff)
+#define MAX_HOT_REMOVE_ENTRIES \
+ ((PAGE_SIZE - sizeof(struct dm_hot_remove_response)) \
+ / sizeof(union dm_mem_page_range))
+
enum {
DYNMEM_PROTOCOL_VERSION_1 = DYNMEM_MAKE_VERSION(0, 3),
DYNMEM_PROTOCOL_VERSION_2 = DYNMEM_MAKE_VERSION(1, 0),
@@ -697,6 +701,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
{
struct memory_notify *mem = (struct memory_notify *)v;
unsigned long pfn_count;
+ int need_unlock = 0;
switch (val) {
case MEM_ONLINE:
@@ -708,7 +713,11 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
break;
case MEM_OFFLINE:
- mutex_lock(&dm_device.ha_lock);
+ if (dm_device.lock_thread != current) {
+ mutex_lock(&dm_device.ha_lock);
+ need_unlock = 1;
+ }
+
pfn_count = hv_page_offline_check(mem->start_pfn,
mem->nr_pages);
if (pfn_count <= dm_device.num_pages_onlined) {
@@ -722,7 +731,9 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
WARN_ON_ONCE(1);
dm_device.num_pages_onlined = 0;
}
- mutex_unlock(&dm_device.ha_lock);
+
+ if (need_unlock)
+ mutex_unlock(&dm_device.ha_lock);
break;
case MEM_GOING_ONLINE:
case MEM_GOING_OFFLINE:
@@ -1046,14 +1057,183 @@ static unsigned long process_hot_add(unsigned long pg_start,
return handle_pg_range(pg_start, pfn_cnt);
}
+static int hv_hot_remove_range(unsigned int nid, unsigned long start_pfn,
+ unsigned long end_pfn, unsigned long nr_pages,
+ unsigned long *request_index,
+ union dm_mem_page_range *range_array,
+ struct hv_hotadd_state *has)
+{
+ unsigned long block_pages = HA_CHUNK;
+ unsigned long rm_pages = nr_pages;
+ unsigned long pfn;
+ int ret;
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn += block_pages) {
+ struct hv_hotadd_gap *gap;
+ int in_gap = 0;
+
+ if (*request_index >= MAX_HOT_REMOVE_ENTRIES) {
+ struct dm_hot_remove_response *resp =
+ (struct dm_hot_remove_response *)
+ balloon_up_send_buffer;
+
+ /* Flush out all hot-remove ranges. */
+ ret = hv_send_hot_remove_response(resp, *request_index,
+ true);
+ if (ret)
+ return ret;
+
+ /* Reset request buffer. */
+ memset(resp, 0x00, PAGE_SIZE);
+ *request_index = 0;
+ }
+
+ /*
+ * Memory in hot-add region gaps has been offlined or removed
+ * and so skip it if the remove range overlaps with a gap.
+ */
+ if (has) {
+ list_for_each_entry(gap, &has->gap_list, list)
+ if (!(pfn >= gap->end_pfn ||
+ pfn + block_pages < gap->start_pfn)) {
+ in_gap = 1;
+ break;
+ }
+
+ if (in_gap)
+ continue;
+ }
+
+ if (online_section_nr(pfn_to_section_nr(pfn))
+ && is_mem_section_removable(pfn, block_pages)) {
+ ret = offline_and_remove_memory(nid, pfn << PAGE_SHIFT,
+ block_pages << PAGE_SHIFT);
+ if (ret)
+ continue;
+
+ range_array[*request_index].finfo.start_page = pfn;
+ range_array[*request_index].finfo.page_cnt
+ = block_pages;
+
+ (*request_index)++;
+ nr_pages -= block_pages;
+
+ if (!nr_pages)
+ break;
+ }
+ }
+
+ return rm_pages - nr_pages;
+}
+
+static int hv_hot_remove_from_ha_list(unsigned int nid, unsigned long nr_pages,
+ unsigned long *request_index,
+ union dm_mem_page_range *range_array)
+{
+ struct hv_hotadd_state *has;
+ unsigned long start_pfn, end_pfn;
+ int rm_pages;
+ int old_index;
+ int ret, i;
+
+ mutex_lock(&dm_device.ha_lock);
+ dm_device.lock_thread = current;
+ list_for_each_entry(has, &dm_device.ha_region_list, list) {
+ rm_pages = min(nr_pages,
+ has->covered_end_pfn - has->start_pfn);
+ start_pfn = has->start_pfn;
+ end_pfn = has->covered_end_pfn;
+ old_index = *request_index;
+
+ if (!rm_pages || pfn_to_nid(start_pfn) != nid)
+ continue;
+
+ rm_pages = hv_hot_remove_range(nid, start_pfn, end_pfn,
+ rm_pages, request_index, range_array, has);
+ if (rm_pages < 0) {
+ ret = rm_pages;
+ goto error;
+ } else if (!rm_pages) {
+ continue;
+ }
+
+ nr_pages -= rm_pages;
+ dm_device.num_pages_added -= rm_pages;
+
+ /* Create gaps for hot remove regions. */
+ for (i = old_index; i < *request_index; i++) {
+ struct hv_hotadd_gap *gap;
+
+ gap = kzalloc(sizeof(struct hv_hotadd_gap), GFP_ATOMIC);
+ if (!gap) {
+ /*
+ * Disable dm hot-plug when we fail to allocate
+ * memory for gaps.
+ */
+ ret = -ENOMEM;
+ do_hot_add = false;
+ goto error;
+ }
+
+ INIT_LIST_HEAD(&gap->list);
+ gap->start_pfn = range_array[i].finfo.start_page;
+ gap->end_pfn =
+ gap->start_pfn + range_array[i].finfo.page_cnt;
+ list_add_tail(&gap->list, &has->gap_list);
+ }
+
+ if (!nr_pages)
+ break;
+ }
+
+ ret = nr_pages;
+ error:
+ dm_device.lock_thread = NULL;
+ mutex_unlock(&dm_device.ha_lock);
+
+ return ret;
+}
+
+static void hv_mem_hot_remove(unsigned int nid, u64 nr_pages)
+{
+ struct dm_hot_remove_response *resp
+ = (struct dm_hot_remove_response *)balloon_up_send_buffer;
+ unsigned long start_pfn = node_start_pfn(nid);
+ unsigned long end_pfn = node_end_pfn(nid);
+ unsigned long request_index = 0;
+ int remain_pages;
+
+ /* Todo: Handle request of non-aligned page number later. */
+
+ /* Search hot-remove memory region from hot add list first.*/
+ memset(resp, 0x00, PAGE_SIZE);
+ remain_pages = hv_hot_remove_from_ha_list(nid, nr_pages,
+ &request_index,
+ resp->range_array);
+ if (remain_pages < 0) {
+ /* Send failure response msg. */
+ request_index = 0;
+ } else if (remain_pages) {
+ start_pfn = ALIGN(start_pfn, HA_CHUNK);
+ hv_hot_remove_range(nid, start_pfn, end_pfn, remain_pages,
+ &request_index, resp->range_array, NULL);
+ }
+
+ hv_send_hot_remove_response(resp, request_index, false);
+}
+
#endif
static void hot_remove_req(union dm_msg_info *msg_info)
{
struct hv_dynmem_device *dm = &dm_device;
+ unsigned int numa_node = msg_info->hot_remove.virtual_node;
+ unsigned int page_count = msg_info->hot_remove.page_count;
- /* Add hot remove operation later and send failure response. */
- hv_send_hot_remove_response((struct dm_hot_remove_response *)
+ if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) && do_hot_add)
+ hv_mem_hot_remove(numa_node, page_count);
+ else
+ hv_send_hot_remove_response((struct dm_hot_remove_response *)
balloon_up_send_buffer, 0, false);
dm->state = DM_INITIALIZED;
--
2.14.5
From: Tianyu Lan <[email protected]>
In some cases Hyper-V sends an unballoon msg instead of a mem hot-add
msg. The system has not received a balloon msg before the unballoon
msg, and this causes balloon_down() to warn about freeing non-existent
memory. Work around the issue by treating the unballoon msg as a mem
hot-add msg when mem hot-add is enabled.
Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/hv_balloon.c | 116 +++++++++++++++++++++++++++---------------------
1 file changed, 66 insertions(+), 50 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5aaae62955bf..7f3e7ab22d5d 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -486,6 +486,9 @@ union dm_msg_info {
struct {
__u32 num_pages;
} balloon_state;
+ struct {
+ struct dm_unballoon_request *request;
+ } unballoon_state;
struct {
union dm_mem_page_range ha_page_range;
union dm_mem_page_range ha_region_range;
@@ -1155,6 +1158,21 @@ static unsigned long process_hot_add(unsigned long pg_start,
return pfn_cnt;
}
+ if ((rg_start == 0) && (!dm_device.host_specified_ha_region)) {
+ /*
+ * The host has not specified the hot-add region.
+ * Based on the hot-add page range being specified,
+ * compute a hot-add region that can cover the pages
+ * that need to be hot-added while ensuring the alignment
+ * and size requirements of Linux as it relates to hot-add.
+ */
+ rg_size = (pfn_cnt / HA_CHUNK) * HA_CHUNK;
+ if (pfn_cnt % HA_CHUNK)
+ rg_size += HA_CHUNK;
+
+ rg_start = (pg_start / HA_CHUNK) * HA_CHUNK;
+ }
+
if (!dm_device.host_specified_ha_region) {
covered = pfn_covered(pg_start, pfn_cnt);
if (covered < 0)
@@ -1488,28 +1506,6 @@ static void hot_add_req(union dm_msg_info *msg_info)
rg_start = msg_info->hot_add.ha_region_range.finfo.start_page;
rg_sz = msg_info->hot_add.ha_region_range.finfo.page_cnt;
- if ((rg_start == 0) && (!dm->host_specified_ha_region)) {
- unsigned long region_size;
- unsigned long region_start;
-
- /*
- * The host has not specified the hot-add region.
- * Based on the hot-add page range being specified,
- * compute a hot-add region that can cover the pages
- * that need to be hot-added while ensuring the alignment
- * and size requirements of Linux as it relates to hot-add.
- */
- region_start = pg_start;
- region_size = (pfn_cnt / HA_CHUNK) * HA_CHUNK;
- if (pfn_cnt % HA_CHUNK)
- region_size += HA_CHUNK;
-
- region_start = (pg_start / HA_CHUNK) * HA_CHUNK;
-
- rg_start = region_start;
- rg_sz = region_size;
- }
-
if (do_hot_add)
resp.page_count = process_hot_add(pg_start, pfn_cnt,
rg_start, rg_sz);
@@ -1823,41 +1819,37 @@ static void balloon_up(union dm_msg_info *msg_info)
}
-static void dm_msg_work(struct work_struct *dummy)
-{
- union dm_msg_info *msg_info = &dm_device.dm_wrk.dm_msg;
-
- switch (dm_device.dm_wrk.msg_type) {
- case DM_BALLOON_REQUEST:
- balloon_up(msg_info);
- break;
- case DM_MEM_HOT_ADD_REQUEST:
- hot_add_req(msg_info);
- break;
- case DM_MEM_HOT_REMOVE_REQUEST:
- hot_remove_req(msg_info);
- break;
- default:
- return;
- }
-}
-
-static void balloon_down(struct hv_dynmem_device *dm,
- struct dm_unballoon_request *req)
+static void balloon_down(union dm_msg_info *msg_info)
{
+ struct dm_unballoon_request *req = msg_info->unballoon_state.request;
union dm_mem_page_range *range_array = req->range_array;
+ struct hv_dynmem_device *dm = &dm_device;
+ unsigned int prev_pages_ballooned = dm->num_pages_ballooned;
int range_count = req->range_count;
struct dm_unballoon_response resp;
int i;
- unsigned int prev_pages_ballooned = dm->num_pages_ballooned;
for (i = 0; i < range_count; i++) {
- free_balloon_pages(dm, &range_array[i]);
- complete(&dm_device.config_event);
+ /*
+ * Hyper-V has a bug where it sends an unballoon msg
+ * instead of a hot add msg even if no balloon msg was
+ * sent before. Treat all unballoon msgs as hot add msgs
+ * if the hot add capability is enabled.
+ */
+ if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) && do_hot_add) {
+ dm->host_specified_ha_region = false;
+ dm->num_pages_added +=
+ process_hot_add(range_array[i].finfo.start_page,
+ range_array[i].finfo.page_cnt, 0, 0);
+ } else {
+ free_balloon_pages(dm, &range_array[i]);
+ complete(&dm_device.config_event);
+ }
}
- pr_debug("Freed %u ballooned pages.\n",
- prev_pages_ballooned - dm->num_pages_ballooned);
+ if (!do_hot_add)
+ pr_debug("Freed %u ballooned pages.\n",
+ prev_pages_ballooned - dm->num_pages_ballooned);
if (req->more_pages == 1)
return;
@@ -1875,6 +1867,28 @@ static void balloon_down(struct hv_dynmem_device *dm,
dm->state = DM_INITIALIZED;
}
+static void dm_msg_work(struct work_struct *dummy)
+{
+ union dm_msg_info *msg_info = &dm_device.dm_wrk.dm_msg;
+
+ switch (dm_device.dm_wrk.msg_type) {
+ case DM_BALLOON_REQUEST:
+ balloon_up(msg_info);
+ break;
+ case DM_UNBALLOON_REQUEST:
+ balloon_down(msg_info);
+ break;
+ case DM_MEM_HOT_ADD_REQUEST:
+ hot_add_req(msg_info);
+ break;
+ case DM_MEM_HOT_REMOVE_REQUEST:
+ hot_remove_req(msg_info);
+ break;
+ default:
+ return;
+ }
+}
+
static void balloon_onchannelcallback(void *context);
static int dm_thread_func(void *dm_dev)
@@ -2024,8 +2038,10 @@ static void balloon_onchannelcallback(void *context)
}
dm->state = DM_BALLOON_DOWN;
- balloon_down(dm,
- (struct dm_unballoon_request *)recv_buffer);
+ dm_wrk->msg_type = DM_UNBALLOON_REQUEST;
+ msg_info->unballoon_state.request
+ = (struct dm_unballoon_request *)recv_buffer;
+ schedule_work(&dm_wrk->wrk);
break;
case DM_MEM_HOT_ADD_REQUEST:
--
2.14.5
From: Tianyu Lan <[email protected]>
The mem hot-remove operation and balloon down will be added to or
moved into work context. Add a common work to handle the operations
of mem hot-add/remove and balloon up/down.
Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/hv_balloon.c | 86 ++++++++++++++++++++++++++++++-------------------
1 file changed, 52 insertions(+), 34 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index b155d0052981..bdb6791e6de1 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -447,15 +447,20 @@ struct hv_hotadd_gap {
unsigned long end_pfn;
};
-struct balloon_state {
- __u32 num_pages;
- struct work_struct wrk;
+union dm_msg_info {
+ struct {
+ __u32 num_pages;
+ } balloon_state;
+ struct {
+ union dm_mem_page_range ha_page_range;
+ union dm_mem_page_range ha_region_range;
+ } hot_add;
};
-struct hot_add_wrk {
- union dm_mem_page_range ha_page_range;
- union dm_mem_page_range ha_region_range;
+struct dm_msg_wrk {
+ enum dm_message_type msg_type;
struct work_struct wrk;
+ union dm_msg_info dm_msg;
};
static bool allow_hibernation;
@@ -514,14 +519,9 @@ struct hv_dynmem_device {
unsigned int num_pages_added;
/*
- * State to manage the ballooning (up) operation.
+ * State to manage the ballooning (up) and "hot-add" operation.
*/
- struct balloon_state balloon_wrk;
-
- /*
- * State to execute the "hot-add" operation.
- */
- struct hot_add_wrk ha_wrk;
+ struct dm_msg_wrk dm_wrk;
/*
* This state tracks if the host has specified a hot-add
@@ -982,7 +982,7 @@ static unsigned long process_hot_add(unsigned long pg_start,
#endif
-static void hot_add_req(struct work_struct *dummy)
+static void hot_add_req(union dm_msg_info *msg_info)
{
struct dm_hot_add_response resp;
#ifdef CONFIG_MEMORY_HOTPLUG
@@ -996,11 +996,11 @@ static void hot_add_req(struct work_struct *dummy)
resp.hdr.size = sizeof(struct dm_hot_add_response);
#ifdef CONFIG_MEMORY_HOTPLUG
- pg_start = dm->ha_wrk.ha_page_range.finfo.start_page;
- pfn_cnt = dm->ha_wrk.ha_page_range.finfo.page_cnt;
+ pg_start = msg_info->hot_add.ha_page_range.finfo.start_page;
+ pfn_cnt = msg_info->hot_add.ha_page_range.finfo.page_cnt;
- rg_start = dm->ha_wrk.ha_region_range.finfo.start_page;
- rg_sz = dm->ha_wrk.ha_region_range.finfo.page_cnt;
+ rg_start = msg_info->hot_add.ha_region_range.finfo.start_page;
+ rg_sz = msg_info->hot_add.ha_region_range.finfo.page_cnt;
if ((rg_start == 0) && (!dm->host_specified_ha_region)) {
unsigned long region_size;
@@ -1261,9 +1261,9 @@ static unsigned int alloc_balloon_pages(struct hv_dynmem_device *dm,
return num_pages;
}
-static void balloon_up(struct work_struct *dummy)
+static void balloon_up(union dm_msg_info *msg_info)
{
- unsigned int num_pages = dm_device.balloon_wrk.num_pages;
+ unsigned int num_pages = msg_info->balloon_state.num_pages;
unsigned int num_ballooned = 0;
struct dm_balloon_response *bl_resp;
int alloc_unit;
@@ -1313,7 +1313,7 @@ static void balloon_up(struct work_struct *dummy)
if (num_ballooned == 0 || num_ballooned == num_pages) {
pr_debug("Ballooned %u out of %u requested pages.\n",
- num_pages, dm_device.balloon_wrk.num_pages);
+ num_pages, msg_info->balloon_state.num_pages);
bl_resp->more_pages = 0;
done = true;
@@ -1355,6 +1355,22 @@ static void balloon_up(struct work_struct *dummy)
}
+static void dm_msg_work(struct work_struct *dummy)
+{
+ union dm_msg_info *msg_info = &dm_device.dm_wrk.dm_msg;
+
+ switch (dm_device.dm_wrk.msg_type) {
+ case DM_BALLOON_REQUEST:
+ balloon_up(msg_info);
+ break;
+ case DM_MEM_HOT_ADD_REQUEST:
+ hot_add_req(msg_info);
+ break;
+ default:
+ return;
+ }
+}
+
static void balloon_down(struct hv_dynmem_device *dm,
struct dm_unballoon_request *req)
{
@@ -1490,6 +1506,8 @@ static void balloon_onchannelcallback(void *context)
struct hv_dynmem_device *dm = hv_get_drvdata(dev);
struct dm_balloon *bal_msg;
struct dm_hot_add *ha_msg;
+ struct dm_msg_wrk *dm_wrk = &dm_device.dm_wrk;
+ union dm_msg_info *msg_info = &dm_wrk->dm_msg;
union dm_mem_page_range *ha_pg_range;
union dm_mem_page_range *ha_region;
@@ -1522,8 +1540,9 @@ static void balloon_onchannelcallback(void *context)
pr_warn("Currently ballooning\n");
bal_msg = (struct dm_balloon *)recv_buffer;
dm->state = DM_BALLOON_UP;
- dm_device.balloon_wrk.num_pages = bal_msg->num_pages;
- schedule_work(&dm_device.balloon_wrk.wrk);
+ msg_info->balloon_state.num_pages = bal_msg->num_pages;
+ dm_wrk->msg_type = DM_BALLOON_REQUEST;
+ schedule_work(&dm_wrk->wrk);
break;
case DM_UNBALLOON_REQUEST:
@@ -1549,8 +1568,9 @@ static void balloon_onchannelcallback(void *context)
*/
dm->host_specified_ha_region = false;
ha_pg_range = &ha_msg->range;
- dm->ha_wrk.ha_page_range = *ha_pg_range;
- dm->ha_wrk.ha_region_range.page_range = 0;
+ msg_info->hot_add.ha_page_range = *ha_pg_range;
+ msg_info->hot_add.ha_region_range.page_range
+ = 0;
} else {
/*
* Host is specifying that we first hot-add
@@ -1560,10 +1580,11 @@ static void balloon_onchannelcallback(void *context)
dm->host_specified_ha_region = true;
ha_pg_range = &ha_msg->range;
ha_region = &ha_pg_range[1];
- dm->ha_wrk.ha_page_range = *ha_pg_range;
- dm->ha_wrk.ha_region_range = *ha_region;
+ msg_info->hot_add.ha_page_range = *ha_pg_range;
+ msg_info->hot_add.ha_region_range = *ha_region;
}
- schedule_work(&dm_device.ha_wrk.wrk);
+ dm_wrk->msg_type = DM_MEM_HOT_ADD_REQUEST;
+ schedule_work(&dm_wrk->wrk);
break;
case DM_INFO_MESSAGE:
@@ -1707,8 +1728,7 @@ static int balloon_probe(struct hv_device *dev,
init_completion(&dm_device.config_event);
INIT_LIST_HEAD(&dm_device.ha_region_list);
spin_lock_init(&dm_device.ha_lock);
- INIT_WORK(&dm_device.balloon_wrk.wrk, balloon_up);
- INIT_WORK(&dm_device.ha_wrk.wrk, hot_add_req);
+ INIT_WORK(&dm_device.dm_wrk.wrk, dm_msg_work);
dm_device.host_specified_ha_region = false;
#ifdef CONFIG_MEMORY_HOTPLUG
@@ -1754,8 +1774,7 @@ static int balloon_remove(struct hv_device *dev)
if (dm->num_pages_ballooned != 0)
pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
- cancel_work_sync(&dm->balloon_wrk.wrk);
- cancel_work_sync(&dm->ha_wrk.wrk);
+ cancel_work_sync(&dm->dm_wrk.wrk);
kthread_stop(dm->thread);
vmbus_close(dev->channel);
@@ -1783,8 +1802,7 @@ static int balloon_suspend(struct hv_device *hv_dev)
tasklet_disable(&hv_dev->channel->callback_event);
- cancel_work_sync(&dm->balloon_wrk.wrk);
- cancel_work_sync(&dm->ha_wrk.wrk);
+ cancel_work_sync(&dm->dm_wrk.wrk);
if (dm->thread) {
kthread_stop(dm->thread);
--
2.14.5
From: Tianyu Lan <[email protected]>
ha_lock protects ha_region_list. It is held in hv_online_page() and
handle_pg_range(). handle_pg_range() traverses the ha region list,
finds the associated hot-add region and adds memory into the system.
hv_online_page() is called inside add_memory(). The current code
releases ha_lock before calling add_memory() to avoid acquiring
ha_lock twice in hv_online_page().
To avoid releasing ha_lock, add "lock_thread" to struct
hv_dynmem_device to record the thread traversing the ha region list,
check "lock_thread" in hv_online_page() and only take ha_lock there
when the current thread is not "lock_thread".
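The resulting pattern in hv_online_page() is (sketch of the change
below):

	if (dm_device.lock_thread != current) {
		mutex_lock(&dm_device.ha_lock);
		need_unlock = 1;
	}
	...
	if (need_unlock)
		mutex_unlock(&dm_device.ha_lock);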
Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/hv_balloon.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 185146795122..729dc5551302 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -547,6 +547,7 @@ struct hv_dynmem_device {
* regions from ha_region_list.
*/
struct mutex ha_lock;
+ struct task_struct *lock_thread;
/*
* A list of hot-add regions.
@@ -711,7 +712,6 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
for (i = 0; i < (size/HA_CHUNK); i++) {
start_pfn = start + (i * HA_CHUNK);
- mutex_lock(&dm_device.ha_lock);
has->ha_end_pfn += HA_CHUNK;
if (total_pfn > HA_CHUNK) {
@@ -723,7 +723,6 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
}
has->covered_end_pfn += processed_pfn;
- mutex_unlock(&dm_device.ha_lock);
init_completion(&dm_device.ol_waitevent);
dm_device.ha_waiting = !memhp_auto_online;
@@ -744,10 +743,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
*/
do_hot_add = false;
}
- mutex_lock(&dm_device.ha_lock);
has->ha_end_pfn -= HA_CHUNK;
has->covered_end_pfn -= processed_pfn;
- mutex_unlock(&dm_device.ha_lock);
break;
}
@@ -769,8 +766,14 @@ static void hv_online_page(struct page *pg, unsigned int order)
{
struct hv_hotadd_state *has;
unsigned long pfn = page_to_pfn(pg);
+ int need_unlock = 0;
+
+ /* If the current thread doesn't hold ha_lock, take it here. */
+ if (dm_device.lock_thread != current) {
+ mutex_lock(&dm_device.ha_lock);
+ need_unlock = 1;
+ }
- mutex_lock(&dm_device.ha_lock);
list_for_each_entry(has, &dm_device.ha_region_list, list) {
/* The page belongs to a different HAS. */
if ((pfn < has->start_pfn) ||
@@ -780,7 +783,8 @@ static void hv_online_page(struct page *pg, unsigned int order)
hv_bring_pgs_online(has, pfn, 1UL << order);
break;
}
- mutex_unlock(&dm_device.ha_lock);
+ if (need_unlock)
+ mutex_unlock(&dm_device.ha_lock);
}
static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
@@ -857,6 +861,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
pg_start);
mutex_lock(&dm_device.ha_lock);
+ dm_device.lock_thread = current;
list_for_each_entry(has, &dm_device.ha_region_list, list) {
/*
* If the pfn range we are dealing with is not in the current
@@ -909,9 +914,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
} else {
pfn_cnt = size;
}
- mutex_unlock(&dm_device.ha_lock);
hv_mem_hot_add(has->ha_end_pfn, size, pfn_cnt, has);
- mutex_lock(&dm_device.ha_lock);
}
/*
* If we managed to online any pages that were given to us,
@@ -920,6 +923,7 @@ static unsigned long handle_pg_range(unsigned long pg_start,
res = has->covered_end_pfn - old_covered_state;
break;
}
+ dm_device.lock_thread = NULL;
mutex_unlock(&dm_device.ha_lock);
return res;
--
2.14.5
From: Tianyu Lan <[email protected]>
The mem hot-remove msg may request a non-aligned page number, while
the hot-plug unit is 128MB. Handle the remainder pages the balloon
way: allocate memory, mark the pages offline and return the memory
back to the host. To help check whether the memory range in a mem
hot-add msg was allocated by the balloon driver, set the page private
data to 1 when allocating a page. If the pages associated with the
mem range in a hot-add msg are present and their page private data is
non-zero, clear the offline flag and free the memory to give it back.
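In short, the remainder is handled first and the rest goes through
the regular hot-remove path (sketch, matching the code below):

	remainder = nr_pages % HA_CHUNK;
	if (remainder) {
		ret = hv_hot_remove_pages(resp, remainder, &request_index,
					  !!(nr_pages - remainder));
		if (ret)
			return;
		nr_pages -= remainder;
	}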
Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/hv_balloon.c | 149 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 129 insertions(+), 20 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 3d8c09fe148a..f76c9bd7fe2f 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -641,6 +641,32 @@ static int hv_send_hot_remove_response(
return ret;
}
+static void free_allocated_pages(__u64 start_frame, int num_pages)
+{
+ struct page *pg;
+ int i;
+
+ for (i = 0; i < num_pages; i++) {
+ pg = pfn_to_page(i + start_frame);
+
+ if (page_private(pg))
+ set_page_private(pg, 0);
+
+ __ClearPageOffline(pg);
+ __free_page(pg);
+ dm_device.num_pages_ballooned--;
+ }
+}
+
+static void free_balloon_pages(struct hv_dynmem_device *dm,
+ union dm_mem_page_range *range_array)
+{
+ int num_pages = range_array->finfo.page_cnt;
+ __u64 start_frame = range_array->finfo.start_page;
+
+ free_allocated_pages(start_frame, num_pages);
+}
+
#ifdef CONFIG_MEMORY_HOTPLUG
static inline bool has_pfn_is_backed(struct hv_hotadd_state *has,
unsigned long pfn)
@@ -1017,6 +1043,16 @@ static unsigned long process_hot_add(unsigned long pg_start,
if (pfn_cnt == 0)
return 0;
+ /*
+ * Check whether page is allocated by driver via page private
+ * data due to remainder pages.
+ */
+ if (present_section_nr(pfn_to_section_nr(pg_start))
+ && page_private(pfn_to_page(pg_start))) {
+ free_allocated_pages(pg_start, pfn_cnt);
+ return pfn_cnt;
+ }
+
if (!dm_device.host_specified_ha_region) {
covered = pfn_covered(pg_start, pfn_cnt);
if (covered < 0)
@@ -1194,6 +1230,82 @@ static int hv_hot_remove_from_ha_list(unsigned int nid, unsigned long nr_pages,
return ret;
}
+static int hv_hot_remove_pages(struct dm_hot_remove_response *resp,
+ u64 nr_pages, unsigned long *request_index,
+ bool more_pages)
+{
+ int i, j, alloc_unit = PAGES_IN_2M;
+ struct page *pg;
+ int ret;
+
+ for (i = 0; i < nr_pages; i += alloc_unit) {
+ if (*request_index >= MAX_HOT_REMOVE_ENTRIES) {
+ /* Flush out all hot-remove ranges. */
+ ret = hv_send_hot_remove_response(resp,
+ *request_index, true);
+ if (ret)
+ goto free_pages;
+
+ /*
+ * Continue to allocate memory for hot remove
+ * after resetting send buffer and array index.
+ */
+ memset(resp, 0x00, PAGE_SIZE);
+ *request_index = 0;
+ }
+retry:
+ pg = alloc_pages(GFP_HIGHUSER | __GFP_NORETRY |
+ __GFP_NOMEMALLOC | __GFP_NOWARN,
+ get_order(alloc_unit << PAGE_SHIFT));
+ if (!pg) {
+ if (alloc_unit == 1) {
+ ret = -ENOMEM;
+ goto free_pages;
+ }
+
+ alloc_unit = 1;
+ goto retry;
+ }
+
+ if (alloc_unit != 1)
+ split_page(pg, get_order(alloc_unit << PAGE_SHIFT));
+
+ for (j = 0; j < (1 << get_order(alloc_unit << PAGE_SHIFT));
+ j++) {
+ __SetPageOffline(pg + j);
+
+ /*
+ * Set page's private data to non-zero and use it
+ * to identify whether the page is allocated by the driver
+ * or new hot-add memory in process_hot_add().
+ */
+ set_page_private(pg + j, 1);
+ }
+
+ resp->range_array[*request_index].finfo.start_page
+ = page_to_pfn(pg);
+ resp->range_array[*request_index].finfo.page_cnt
+ = alloc_unit;
+ (*request_index)++;
+
+ dm_device.num_pages_ballooned += alloc_unit;
+ }
+
+ ret = hv_send_hot_remove_response(resp, *request_index, more_pages);
+ if (ret)
+ goto free_pages;
+
+ return 0;
+
+free_pages:
+ for (i = 0; i < *request_index; i++)
+ free_balloon_pages(&dm_device, &resp->range_array[i]);
+
+ /* Respond with hot-remove failure. */
+ hv_send_hot_remove_response(resp, 0, false);
+ return ret;
+}
+
static void hv_mem_hot_remove(unsigned int nid, u64 nr_pages)
{
struct dm_hot_remove_response *resp
@@ -1201,9 +1313,24 @@ static void hv_mem_hot_remove(unsigned int nid, u64 nr_pages)
unsigned long start_pfn = node_start_pfn(nid);
unsigned long end_pfn = node_end_pfn(nid);
unsigned long request_index = 0;
- int remain_pages;
+ unsigned long remainder = nr_pages % HA_CHUNK;
+ int remain_pages, ret;
- /* Todo: Handle request of non-aligned page number later. */
+ /*
+ * If page number isn't aligned with memory hot plug unit,
+ * handle remainder pages via balloon way.
+ */
+ if (remainder) {
+ memset(resp, 0x00, PAGE_SIZE);
+ ret = hv_hot_remove_pages(resp, remainder, &request_index,
+ !!(nr_pages - remainder));
+ if (ret)
+ return;
+
+ nr_pages -= remainder;
+ if (!nr_pages)
+ return;
+ }
/* Search hot-remove memory region from hot add list first.*/
memset(resp, 0x00, PAGE_SIZE);
@@ -1448,24 +1575,6 @@ static void post_status(struct hv_dynmem_device *dm)
}
-static void free_balloon_pages(struct hv_dynmem_device *dm,
- union dm_mem_page_range *range_array)
-{
- int num_pages = range_array->finfo.page_cnt;
- __u64 start_frame = range_array->finfo.start_page;
- struct page *pg;
- int i;
-
- for (i = 0; i < num_pages; i++) {
- pg = pfn_to_page(i + start_frame);
- __ClearPageOffline(pg);
- __free_page(pg);
- dm->num_pages_ballooned--;
- }
-}
-
-
-
static unsigned int alloc_balloon_pages(struct hv_dynmem_device *dm,
unsigned int num_pages,
struct dm_balloon_response *bl_resp,
--
2.14.5
From: Tianyu Lan <[email protected]>
The mem hot-remove operation may find memory in the hot-add region
and create gaps in the ha region list if hot-added memory is present
at that point. A following hot-add msg may contain a memory range in
these gaps. Handle such a request and adjust the gap range after
adding memory.
Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/hv_balloon.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 105 insertions(+), 3 deletions(-)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index f76c9bd7fe2f..5aaae62955bf 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -907,10 +907,11 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
continue;
/*
- * If the current start pfn is not where the covered_end
- * is, create a gap and update covered_end_pfn.
+ * If the current start pfn is greater than covered_end_pfn,
+ * create a gap and update covered_end_pfn. The start pfn may
+ * be located in a gap range created during mem hot remove.
*/
- if (has->covered_end_pfn != start_pfn) {
+ if (has->covered_end_pfn < start_pfn) {
gap = kzalloc(sizeof(struct hv_hotadd_gap), GFP_ATOMIC);
if (!gap) {
ret = -ENOMEM;
@@ -949,6 +950,91 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
return ret;
}
+static int handle_hot_add_in_gap(unsigned long start, unsigned long pg_cnt,
+ struct hv_hotadd_state *has)
+{
+ struct hv_hotadd_gap *gap, *new_gap, *tmp_gap;
+ unsigned long pfn_cnt = pg_cnt;
+ unsigned long start_pfn = start;
+ unsigned long end_pfn;
+ unsigned long pages;
+ unsigned long pgs_ol;
+ unsigned long block_pages = HA_CHUNK;
+ unsigned long pfn;
+ int nid;
+ int ret;
+
+ list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
+
+ if ((start_pfn < gap->start_pfn)
+ || (start_pfn >= gap->end_pfn))
+ continue;
+
+ end_pfn = min(gap->end_pfn, start_pfn + pfn_cnt);
+ pgs_ol = end_pfn - start_pfn;
+
+ /*
+ * hv_bring_pgs_online() identifies whether pfn
+ * should be online or not via checking pfn is in
+ * hot add covered range or gap range(Detail see
+ * has_pfn_is_backed()). So adjust gap before bringing
+ * online or add memory.
+ */
+ if (gap->end_pfn - gap->start_pfn == pgs_ol) {
+ list_del(&gap->list);
+ kfree(gap);
+ } else if (gap->start_pfn < start && gap->end_pfn == end_pfn) {
+ gap->end_pfn = start_pfn;
+ } else if (gap->end_pfn > end_pfn
+ && gap->start_pfn == start_pfn) {
+ gap->start_pfn = end_pfn;
+ } else {
+ new_gap = kzalloc(sizeof(struct hv_hotadd_gap),
+ GFP_ATOMIC);
+ if (!new_gap) {
+ do_hot_add = false;
+ return -ENOMEM;
+ }
+
+ /* Split the gap around the hot-added range. */
+ INIT_LIST_HEAD(&new_gap->list);
+ new_gap->start_pfn = end_pfn;
+ new_gap->end_pfn = gap->end_pfn;
+ gap->end_pfn = start_pfn;
+ list_add_tail(&new_gap->list, &has->gap_list);
+ }
+
+ /* Bring online or add memory in gaps. */
+ for (pfn = start_pfn; pfn < end_pfn;
+ pfn = round_up(pfn + 1, block_pages)) {
+ pages = min(round_up(pfn + 1, block_pages),
+ end_pfn) - pfn;
+
+ if (online_section_nr(pfn_to_section_nr(pfn))) {
+ hv_bring_pgs_online(has, pfn, pages);
+ } else {
+ nid = memory_add_physaddr_to_nid(PFN_PHYS(pfn));
+ ret = add_memory(nid, PFN_PHYS(pfn),
+ round_up(pages, block_pages)
+ << PAGE_SHIFT);
+ if (ret) {
+ pr_err("Fail to add memory in gaps(error=%d).\n",
+ ret);
+ do_hot_add = false;
+ return ret;
+ }
+ }
+ }
+
+ start_pfn += pgs_ol;
+ pfn_cnt -= pgs_ol;
+ if (!pfn_cnt)
+ break;
+ }
+
+ return pg_cnt - pfn_cnt;
+}
+
static unsigned long handle_pg_range(unsigned long pg_start,
unsigned long pg_count)
{
@@ -975,6 +1061,22 @@ static unsigned long handle_pg_range(unsigned long pg_start,
old_covered_state = has->covered_end_pfn;
+ /*
+ * If start_pfn is less than covered_end_pfn, the hot-add memory
+ * area is in the gap range.
+ */
+ if (start_pfn < has->covered_end_pfn) {
+ pgs_ol = handle_hot_add_in_gap(start_pfn, pfn_cnt, has);
+
+ pfn_cnt -= pgs_ol;
+ if (!pfn_cnt) {
+ res = pgs_ol;
+ break;
+ }
+
+ start_pfn += pgs_ol;
+ }
+
if (start_pfn < has->ha_end_pfn) {
/*
* This is the case where we are backing pages
--
2.14.5
On Tue 07-01-20 21:09:42, [email protected] wrote:
> From: Tianyu Lan <[email protected]>
>
> Hyper-V balloon driver will use is_mem_section_removable() to
> check whether memory block is removable or not when receive
> memory hot remove msg. Expose it.
I do not think this is a good idea. The check is inherently racy. Why
cannot the balloon driver simply hotremove the region when asked?
--
Michal Hocko
SUSE Labs
On 07.01.20 14:36, Michal Hocko wrote:
> On Tue 07-01-20 21:09:42, [email protected] wrote:
>> From: Tianyu Lan <[email protected]>
>>
>> Hyper-V balloon driver will use is_mem_section_removable() to
>> check whether memory block is removable or not when receive
>> memory hot remove msg. Expose it.
>
> I do not think this is a good idea. The check is inherently racy. Why
> cannot the balloon driver simply hotremove the region when asked?
>
It's not only racy, it also gives no guarantees. False positives and
false negatives are possible.
If you want to avoid having to loop forever trying to offline when
calling offline_and_remove_memory(), you could try to
alloc_contig_range() the memory first and then play the
PG_offline+notifier game like virtio-mem.
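Roughly (a very rough sketch only - the important part, which is not
shown here, is the memory notifier that gives up the allocated pages
again when offlining kicks in; see virtio-mem for the real thing):

	/* grab the range first, so offlining cannot get stuck on it */
	ret = alloc_contig_range(start_pfn, start_pfn + nr_pages,
				 MIGRATE_MOVABLE, GFP_KERNEL);
	if (ret)
		return ret;

	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++)
		__SetPageOffline(pfn_to_page(pfn));

	ret = offline_and_remove_memory(nid, PFN_PHYS(start_pfn),
					nr_pages << PAGE_SHIFT);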
I don't remember clearly, but I think pinned pages can make offlining
loop for a long time. And I remember there were other scenarios as well
(including out of memory conditions and similar).
I sent an RFC [1] for powerpc/memtrace that does the same (just error
handling is more complicated as it wants to offline and remove multiple
consecutive memory blocks) - if you want to try to go down that path.
[1] https://lkml.kernel.org/r/[email protected]
--
Thanks,
David / dhildenb
> From: David Hildenbrand <[email protected]>
> Sent: Friday, January 10, 2020 9:42 PM
> To: Michal Hocko <[email protected]>; [email protected]
> Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
> <[email protected]>; Stephen Hemminger <[email protected]>;
> [email protected]; [email protected]; Michael Kelley
> <[email protected]>; Tianyu Lan <[email protected]>; linux-
> [email protected]; [email protected]; [email protected];
> vkuznets <[email protected]>; [email protected]; [email protected];
> [email protected]; Pasha Tatashin <[email protected]>;
> [email protected]
> Subject: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose
> is_mem_section_removable() symbol
>
> On 07.01.20 14:36, Michal Hocko wrote:
> > On Tue 07-01-20 21:09:42, [email protected] wrote:
> >> From: Tianyu Lan <[email protected]>
> >>
> >> Hyper-V balloon driver will use is_mem_section_removable() to check
> >> whether memory block is removable or not when receive memory hot
> >> remove msg. Expose it.
> >
> > I do not think this is a good idea. The check is inherently racy. Why
> > cannot the balloon driver simply hotremove the region when asked?
> >
>
> It's not only racy, it also gives no guarantees. False positives and false negatives
> are possible.
>
> If you want to avoid having to loop forever trying to offline when calling
> offline_and_remove_memory(), you could try to
> alloc_contig_range() the memory first and then play the PG_offline+notifier
> game like virtio-mem.
>
> I don't remember clearly, but I think pinned pages can make offlining loop for a
> long time. And I remember there were other scenarios as well (including out of
> memory conditions and similar).
>
> I sent an RFC [1] for powerpc/memtrace that does the same (just error
> handling is more complicated as it wants to offline and remove multiple
> consecutive memory blocks) - if you want to try to go down that path.
>
Hi David & Michal:
Thanks for your review. Some memory blocks are not suitable for hot-plug.
If we don't check whether a memory block is removable, offline_pages() will report failure
errors, e.g. "failed due to memory holes" and "failure to isolate range". Maybe the check
could be added into offline_and_remove_memory()? That would avoid creating/exposing a new
interface just to do such a check from a module.
On 13.01.20 15:49, Tianyu Lan wrote:
>> From: David Hildenbrand <[email protected]>
>> Sent: Friday, January 10, 2020 9:42 PM
>> To: Michal Hocko <[email protected]>; [email protected]
>> Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
>> <[email protected]>; Stephen Hemminger <[email protected]>;
>> [email protected]; [email protected]; Michael Kelley
>> <[email protected]>; Tianyu Lan <[email protected]>; linux-
>> [email protected]; [email protected]; [email protected];
>> vkuznets <[email protected]>; [email protected]; [email protected];
>> [email protected]; Pasha Tatashin <[email protected]>;
>> [email protected]
>> Subject: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose
>> is_mem_section_removable() symbol
>>
>> On 07.01.20 14:36, Michal Hocko wrote:
>>> On Tue 07-01-20 21:09:42, [email protected] wrote:
>>>> From: Tianyu Lan <[email protected]>
>>>>
>>>> Hyper-V balloon driver will use is_mem_section_removable() to check
>>>> whether memory block is removable or not when receive memory hot
>>>> remove msg. Expose it.
>>>
>>> I do not think this is a good idea. The check is inherently racy. Why
>>> cannot the balloon driver simply hotremove the region when asked?
>>>
>>
>> It's not only racy, it also gives no guarantees. False positives and false negatives
>> are possible.
>>
>> If you want to avoid having to loop forever trying to offline when calling
>> offline_and_remove_memory(), you could try to
>> alloc_contig_range() the memory first and then play the PG_offline+notifier
>> game like virtio-mem.
>>
>> I don't remember clearly, but I think pinned pages can make offlining loop for a
>> long time. And I remember there were other scenarios as well (including out of
>> memory conditions and similar).
>>
>> I sent an RFC [1] for powerpc/memtrace that does the same (just error
>> handling is more complicated as it wants to offline and remove multiple
>> consecutive memory blocks) - if you want to try to go down that path.
>>
> Hi David & Michal:
> Thanks for your review. Some memory blocks are not suitable for hot-plug.
> If not check memory block's removable, offline_pages() will report some failure error
> e.g, "failed due to memory holes" and "failure to isolate range". I think the check maybe
> added into offline_and_remove_memory()? This may help to not create/expose a new
> interface to do such check in module.
So it's all about the logging output. Duplicating these checks feels
very wrong. And you will still get plenty of page dumps (read below), so
that won't help.
We have pr_debug() for these "failure ..." messages. That should
therefore not be an issue on production systems, right?
However, you will see dump_page()s quite often, which logs via pr_warn().
Of course, we could add a mechanism to temporarily disable logging
output for these call paths, but it might actually be helpful for
debugging. We might just want to convert everything that is not actually
a warning to pr_debug() - especially in dump_page().
--
Thanks,
David / dhildenb
On Mon 13-01-20 14:49:38, Tianyu Lan wrote:
> Hi David & Michal:
> Thanks for your review. Some memory blocks are not suitable for hot-plug.
> If not check memory block's removable, offline_pages() will report some failure error
> e.g, "failed due to memory holes" and "failure to isolate range". I think the check maybe
> added into offline_and_remove_memory()? This may help to not create/expose a new
> interface to do such check in module.
Why is a log message a problem in the first place? The operation has
failed after all. Does the driver try to offline arbitrary memory?
Could you describe your use case in more detail, please?
--
Michal Hocko
SUSE Labs
> From: Michal Hocko <[email protected]>
> Sent: Tuesday, January 14, 2020 5:51 PM
> To: Tianyu Lan <[email protected]>
> Cc: David Hildenbrand <[email protected]>; [email protected]; KY
> Srinivasan <[email protected]>; Haiyang Zhang <[email protected]>;
> Stephen Hemminger <[email protected]>; [email protected];
> [email protected]; Michael Kelley <[email protected]>; linux-
> [email protected]; [email protected]; [email protected];
> vkuznets <[email protected]>; [email protected]; [email protected];
> [email protected]; Pasha Tatashin <[email protected]>;
> [email protected]
> Subject: Re: [EXTERNAL] Re: [RFC PATCH V2 2/10] mm: expose
> is_mem_section_removable() symbol
>
> On Mon 13-01-20 14:49:38, Tianyu Lan wrote:
> > Hi David & Michal:
> > Thanks for your review. Some memory blocks are not suitable for hot-plug.
> > If we don't check whether a memory block is removable, offline_pages() will
> > report failure errors, e.g. "failed due to memory holes" and "failure to
> > isolate range". Maybe the check could be added into
> > offline_and_remove_memory()? This would help avoid creating/exposing a new
> > interface to do such a check in a module.
>
> Why is a log message a problem in the first place? The operation has failed
> after all. Does the driver try to offline arbitrary memory?
Yes.
> Could you describe your use case in more detail, please?
Hyper-V sends a hot-remove request message which only contains the requested
page count but does not provide a detailed range. So the Hyper-V driver needs to search
for suitable memory blocks in system memory to return to the host if no memory was
hot-added before. That is why I used is_mem_section_removable() to do such a check.
On Fri 17-01-20 16:35:03, Tianyu Lan wrote:
[...]
> > Could you describe your use case in more detail, please?
>
> Hyper-V sends a hot-remove request message which only contains the requested
> page count but does not provide a detailed range. So the Hyper-V driver needs to search
> for suitable memory blocks in system memory to return to the host if no memory was
> hot-added before. That is why I used is_mem_section_removable() to do such a check.
As David described, you would be much better off using
alloc_contig_range() to find memory that would be suitable for
hot-removing without any races.
--
Michal Hocko
SUSE Labs
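
For illustration, here is a very rough sketch of the direction David and Michal
suggest: probe a memory block with alloc_contig_range() before offlining it, so
that offline_and_remove_memory() is only attempted on ranges that can actually
be isolated and migrated. It deliberately leaves out the PG_offline +
memory-notifier handling that virtio-mem uses (virtio-mem keeps the allocated
pages instead of freeing them back), it assumes the
offline_and_remove_memory(nid, start, size) interface from the referenced RFC,
and the helper name try_hot_remove_block() is made up for this sketch.

#include <linux/gfp.h>
#include <linux/memory.h>
#include <linux/memory_hotplug.h>
#include <linux/pfn.h>

/*
 * Probe one memory block with alloc_contig_range() before offlining it.
 * Simplified: the pages are immediately freed back, so a small race
 * window remains; virtio-mem instead keeps them, marks them PG_offline
 * and releases them from its memory notifier when offlining happens.
 * Requires CONFIG_CONTIG_ALLOC; illustrative only.
 */
static int try_hot_remove_block(int nid, unsigned long start_pfn)
{
	const unsigned long nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
	int ret;

	/* Can the whole block be isolated and its pages migrated away? */
	ret = alloc_contig_range(start_pfn, start_pfn + nr_pages,
				 MIGRATE_MOVABLE, GFP_KERNEL);
	if (ret)
		return ret;	/* unmovable or pinned pages, holes, ... */

	/* Simplified: hand the pages back and offline right away. */
	free_contig_range(start_pfn, nr_pages);

	return offline_and_remove_memory(nid, PFN_PHYS(start_pfn),
					 memory_block_size_bytes());
}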
From: Tianyu Lan <[email protected]> Sent: Tuesday, January 7, 2020 5:10 AM
>
> When release mem region, old mem region may be splited to
> two regions. Current allocate new struct resource for high
> end mem region but not move child resources whose ranges are
> in the high end range to new resource. When adjust old mem
> region's range, adjust_resource() detects child region's range
> is out of new range and return error. Move child resources to
> high end resource before adjusting old mem range.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> kernel/resource.c | 38 ++++++++++++++++++++++++++++++++++----
> 1 file changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 76036a41143b..1c7362825134 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -181,6 +181,38 @@ static struct resource *alloc_resource(gfp_t flags)
> return res;
> }
>
> +static void move_child_to_newresource(struct resource *old,
> + struct resource *new)
> +{
> + struct resource *tmp, **p, **np;
> +
> + if (!old->child)
> + return;
I don't think the above test is needed. This case is handled by the first
three lines of the "for" loop.
> +
> + p = &old->child;
> + np = &new->child;
> +
> + for (;;) {
> + tmp = *p;
> + if (!tmp)
> + break;
> +
> + if (tmp->start >= new->start && tmp->end <= new->end) {
> + tmp->parent = new;
> + *np = tmp;
> + np = &tmp->sibling;
> + *p = tmp->sibling;
> +
> + if (!tmp->sibling)
> + *np = NULL;
I don't think the above two lines are right. They seem tautological. If the ! were
removed it would be clearing the sibling link for the child as it exists under its new
parent, which should be done. But the child that is moved to the new parent always
becomes the last entry in the new parent's child list. So could you just unconditionally
do tmp->sibling = NULL? That link will get fixed up if another child is moved.
Michael
> + continue;
> + }
> +
> + p = &tmp->sibling;
> + }
> +}
> +
> +
> /* Return the conflict entry if you can't request it */
> static struct resource * __request_resource(struct resource *root, struct resource *new)
> {
> @@ -1246,9 +1278,6 @@ EXPORT_SYMBOL(__release_region);
> * Note:
> * - Additional release conditions, such as overlapping region, can be
> * supported after they are confirmed as valid cases.
> - * - When a busy memory resource gets split into two entries, the code
> - * assumes that all children remain in the lower address entry for
> - * simplicity. Enhance this logic when necessary.
> */
> int release_mem_region_adjustable(struct resource *parent,
> resource_size_t start, resource_size_t size)
> @@ -1331,11 +1360,12 @@ int release_mem_region_adjustable(struct resource *parent,
> new_res->sibling = res->sibling;
> new_res->child = NULL;
>
> + move_child_to_newresource(res, new_res);
> + res->sibling = new_res;
> ret = __adjust_resource(res, res->start,
> start - res->start);
> if (ret)
> break;
> - res->sibling = new_res;
> new_res = NULL;
> }
>
> --
> 2.14.5
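
For reference, a sketch of how the child-moving loop might look with Michael's
comments folded in: the !old->child early return is dropped (the loop already
handles an empty list), and the moved child's sibling pointer is cleared
unconditionally, since a moved child always becomes the tail of the new
parent's child list. This is only an illustration of the suggested change,
not the final code.

#include <linux/ioport.h>

/* Assumes new->child == NULL on entry, as set up by the caller. */
static void move_child_to_newresource(struct resource *old,
				      struct resource *new)
{
	struct resource *tmp, **p = &old->child, **np = &new->child;

	while ((tmp = *p) != NULL) {
		if (tmp->start >= new->start && tmp->end <= new->end) {
			*p = tmp->sibling;	/* unlink from the old list */
			tmp->parent = new;
			tmp->sibling = NULL;	/* becomes the new tail */
			*np = tmp;		/* append to the new list */
			np = &tmp->sibling;
			continue;
		}
		p = &tmp->sibling;
	}
}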
From: Tianyu Lan <[email protected]> Sent: Tuesday, January 7, 2020 5:10 AM
>
> The mem hot-remove operation and balloon down will be added
> or moved into work context. Add a common work to handle
> opeations of mem hot-add/remove and balloon up/down.
Let me suggest some improved wording for this commit message:
The Hyper-V balloon driver currently has separate work contexts for memory
hot-add operations and for balloon up (i.e., remove memory from the guest)
operations. Hot-remove is being added and must be done in a work context.
And finally, balloon down is currently not done in a work context, but needs
to move to a work context because of a Hyper-V bug that confuses balloon
operations and hot-remove. Rather than add code for two additional work
contexts, consolidate the current hot-add and balloon up operations onto
a single work context. Subsequent patches will add hot-remove and
balloon down to that same work context.
My assumption is that the consolidation into a single work context is
just code simplification. If there's another reason, please describe it.
The code in this patch looks OK to me.
Michael
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> drivers/hv/hv_balloon.c | 86 ++++++++++++++++++++++++++++++-------------------
> 1 file changed, 52 insertions(+), 34 deletions(-)
>
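
The "single work context" pattern described in the suggested commit message can
be sketched roughly as below. All names here (dm_msg_kind, dm_work_ctx,
dm_msg_work) are hypothetical and do not correspond to the actual hv_balloon.c
structures; the point is simply that one work function dispatches on the
pending message type instead of having one work item per operation.

#include <linux/kernel.h>
#include <linux/workqueue.h>

enum dm_msg_kind {
	DM_MSG_HOT_ADD,
	DM_MSG_HOT_REMOVE,
	DM_MSG_BALLOON_UP,
	DM_MSG_BALLOON_DOWN,
};

struct dm_work_ctx {
	struct work_struct wrk;
	enum dm_msg_kind kind;		/* set by the message handler */
	/* per-message parameters (page counts, ranges, ...) go here */
};

/* One work function for all dynamic-memory messages. */
static void dm_msg_work(struct work_struct *work)
{
	struct dm_work_ctx *ctx = container_of(work, struct dm_work_ctx, wrk);

	switch (ctx->kind) {
	case DM_MSG_HOT_ADD:
		/* handle hot-add request */
		break;
	case DM_MSG_HOT_REMOVE:
		/* handle hot-remove request */
		break;
	case DM_MSG_BALLOON_UP:
		/* handle balloon up (remove memory from the guest) */
		break;
	case DM_MSG_BALLOON_DOWN:
		/* handle balloon down (give memory back to the guest) */
		break;
	}
}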
From: Tianyu Lan <[email protected]> Sent: Tuesday, January 7, 2020 5:10 AM
>
> When release mem region, old mem region may be splited to
> two regions. Current allocate new struct resource for high
> end mem region but not move child resources whose ranges are
> in the high end range to new resource. When adjust old mem
> region's range, adjust_resource() detects child region's range
> is out of new range and return error. Move child resources to
> high end resource before adjusting old mem range.
Let me also suggest some wording improvements to the commit message:
When releasing a mem region, the old mem region may need to be
split into two regions. In this case, the current code allocates the new
region and adjusts the original region to specify a smaller range. But child
regions that fall into the range of the new region are not moved to that
new region. Consequently, when running __adjust_resource() on the
original region, it detects that the child region's range is out of the new
range, and returns an error.
Fix this by moving appropriate child resources to the new region before
adjusting the original mem region range.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> kernel/resource.c | 38 ++++++++++++++++++++++++++++++++++----
> 1 file changed, 34 insertions(+), 4 deletions(-)
>