2013-04-10 17:29:35

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v3 0/3] Support memory hot-delete to boot memory

Memory hot-delete to a memory range present at boot causes an
error message in __release_region(), such as:

Trying to free nonexistent resource <0000000070000000-0000000077ffffff>

Hot-delete operation still continues since __release_region() is
a void function, but the target memory range is not freed from
iomem_resource as the result. This also leads a failure in a
subsequent hot-add operation to the same memory range since the
address range is still in-use in iomem_resource.

This problem happens because the granularity of memory resource ranges
may be different between boot and hot-delete. During bootup,
iomem_resource is set up from the boot descriptor table, such as EFI
Memory Table and e820. Each resource entry usually covers the whole
contiguous memory range. Hot-delete request, on the other hand, may
target to a particular range of memory resource, and its size can be
much smaller than the whole contiguous memory. Since the existing
release interfaces like __release_region() require a requested region
to be exactly matched to a resource entry, they do not allow a partial
resource to be released.

This patchset introduces release_mem_region_adjustable() for memory
hot-delete operations, which allows releasing a partial memory range
and adjusts remaining resource accordingly. This patchset makes no
changes to the existing interfaces since their restriction is still
valid for I/O resources.

---
v3:
- Added #ifdef CONFIG_MEMORY_HOTPLUG to release_mem_region_adjustable()
as suggested by Andrew Morton. This #ifdef will be changed to
CONFIG_MEMORY_HOTREMOVE after David Rientjes's patch gets accepted.
- Updated comments & change log of release_mem_region_adjustable()
per code reviews from Ram Pai and Andrew Morton.

v2:
- Updated release_mem_region_adjustable() per code reviews from
Yasuaki Ishimatsu, Ram Pai and Gu Zheng.

---
Toshi Kani (3):
resource: Add __adjust_resource() for internal use
resource: Add release_mem_region_adjustable()
mm: Change __remove_pages() to call release_mem_region_adjustable()

---
include/linux/ioport.h | 4 ++
kernel/resource.c | 135 ++++++++++++++++++++++++++++++++++++++++++++-----
mm/memory_hotplug.c | 11 +++-
3 files changed, 135 insertions(+), 15 deletions(-)


2013-04-10 17:29:42

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v3 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable()

Changed __remove_pages() to call release_mem_region_adjustable().
This allows a requested memory range to be released from
the iomem_resource table even if it does not match exactly to
an resource entry but still fits into. The resource entries
initialized at bootup usually cover the whole contiguous
memory ranges and may not necessarily match with the size of
memory hot-delete requests.

If release_mem_region_adjustable() failed, __remove_pages() emits
a warning message and continues to proceed as it was the case
with release_mem_region(). release_mem_region(), which is defined
to __release_region(), emits a warning message and returns no error
since a void function.

Signed-off-by: Toshi Kani <[email protected]>
Reviewed-by : Yasuaki Ishimatsu <[email protected]>
---
mm/memory_hotplug.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 57decb2..c916582 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -705,8 +705,10 @@ EXPORT_SYMBOL_GPL(__add_pages);
int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
unsigned long nr_pages)
{
- unsigned long i, ret = 0;
+ unsigned long i;
int sections_to_remove;
+ resource_size_t start, size;
+ int ret = 0;

/*
* We can only remove entire sections
@@ -714,7 +716,12 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
BUG_ON(nr_pages % PAGES_PER_SECTION);

- release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+ start = phys_start_pfn << PAGE_SHIFT;
+ size = nr_pages * PAGE_SIZE;
+ ret = release_mem_region_adjustable(&iomem_resource, start, size);
+ if (ret)
+ pr_warn("Unable to release resource <%016llx-%016llx> (%d)\n",
+ start, start + size - 1, ret);

sections_to_remove = nr_pages / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {

2013-04-10 17:29:40

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v3 1/3] resource: Add __adjust_resource() for internal use

Added __adjust_resource(), which is called by adjust_resource()
internally after the resource_lock is held. There is no interface
change to adjust_resource(). This change allows other functions
to call __adjust_resource() internally while the resource_lock is
held.

Signed-off-by: Toshi Kani <[email protected]>
Reviewed-by : Yasuaki Ishimatsu <[email protected]>
Acked-by: David Rientjes <[email protected]>
---
kernel/resource.c | 35 ++++++++++++++++++++++-------------
1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 73f35d4..ae246f9 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -706,24 +706,13 @@ void insert_resource_expand_to_fit(struct resource *root, struct resource *new)
write_unlock(&resource_lock);
}

-/**
- * adjust_resource - modify a resource's start and size
- * @res: resource to modify
- * @start: new start value
- * @size: new size
- *
- * Given an existing resource, change its start and size to match the
- * arguments. Returns 0 on success, -EBUSY if it can't fit.
- * Existing children of the resource are assumed to be immutable.
- */
-int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size)
+static int __adjust_resource(struct resource *res, resource_size_t start,
+ resource_size_t size)
{
struct resource *tmp, *parent = res->parent;
resource_size_t end = start + size - 1;
int result = -EBUSY;

- write_lock(&resource_lock);
-
if (!parent)
goto skip;

@@ -751,6 +740,26 @@ skip:
result = 0;

out:
+ return result;
+}
+
+/**
+ * adjust_resource - modify a resource's start and size
+ * @res: resource to modify
+ * @start: new start value
+ * @size: new size
+ *
+ * Given an existing resource, change its start and size to match the
+ * arguments. Returns 0 on success, -EBUSY if it can't fit.
+ * Existing children of the resource are assumed to be immutable.
+ */
+int adjust_resource(struct resource *res, resource_size_t start,
+ resource_size_t size)
+{
+ int result;
+
+ write_lock(&resource_lock);
+ result = __adjust_resource(res, start, size);
write_unlock(&resource_lock);
return result;
}

2013-04-10 17:30:11

by Toshi Kani

[permalink] [raw]
Subject: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()

Added release_mem_region_adjustable(), which releases a requested
region from a currently busy memory resource. This interface
adjusts the matched memory resource accordingly even if the
requested region does not match exactly but still fits into.

This new interface is intended for memory hot-delete. During
bootup, memory resources are inserted from the boot descriptor
table, such as EFI Memory Table and e820. Each memory resource
entry usually covers the whole contigous memory range. Memory
hot-delete request, on the other hand, may target to a particular
range of memory resource, and its size can be much smaller than
the whole contiguous memory. Since the existing release interfaces
like __release_region() require a requested region to be exactly
matched to a resource entry, they do not allow a partial resource
to be released.

This new interface is restrictive (i.e. release under certain
conditions), which is consistent with other release interfaces,
__release_region() and __release_resource(). Additional release
conditions, such as an overlapping region to a resource entry,
can be supported after they are confirmed as valid cases.

There is no change to the existing interfaces since their restriction
is valid for I/O resources.

Signed-off-by: Toshi Kani <[email protected]>
Reviewed-by : Yasuaki Ishimatsu <[email protected]>
---
include/linux/ioport.h | 4 ++
kernel/resource.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 104 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 85ac9b9b..961d4dc 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -192,6 +192,10 @@ extern struct resource * __request_region(struct resource *,
extern int __check_region(struct resource *, resource_size_t, resource_size_t);
extern void __release_region(struct resource *, resource_size_t,
resource_size_t);
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern int release_mem_region_adjustable(struct resource *, resource_size_t,
+ resource_size_t);
+#endif

static inline int __deprecated check_region(resource_size_t s,
resource_size_t n)
diff --git a/kernel/resource.c b/kernel/resource.c
index ae246f9..08791c8 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1021,6 +1021,106 @@ void __release_region(struct resource *parent, resource_size_t start,
}
EXPORT_SYMBOL(__release_region);

+#ifdef CONFIG_MEMORY_HOTPLUG
+/**
+ * release_mem_region_adjustable - release a previously reserved memory region
+ * @parent: parent resource descriptor
+ * @start: resource start address
+ * @size: resource region size
+ *
+ * This interface is intended for memory hot-delete. The requested region
+ * is released from a currently busy memory resource. The requested region
+ * must either match exactly or fit into a single busy resource entry. In
+ * the latter case, the remaining resource is adjusted accordingly.
+ * Existing children of the busy memory resource must be immutable in the
+ * request.
+ *
+ * Note:
+ * - Additional release conditions, such as overlapping region, can be
+ * supported after they are confirmed as valid cases.
+ * - When a busy memory resource gets split into two entries, the code
+ * assumes that all children remain in the lower address entry for
+ * simplicity. Enhance this logic when necessary.
+ */
+int release_mem_region_adjustable(struct resource *parent,
+ resource_size_t start, resource_size_t size)
+{
+ struct resource **p;
+ struct resource *res, *new;
+ resource_size_t end;
+ int ret = -EINVAL;
+
+ end = start + size - 1;
+ if ((start < parent->start) || (end > parent->end))
+ return ret;
+
+ p = &parent->child;
+ write_lock(&resource_lock);
+
+ while ((res = *p)) {
+ if (res->start >= end)
+ break;
+
+ /* look for the next resource if it does not fit into */
+ if (res->start > start || res->end < end) {
+ p = &res->sibling;
+ continue;
+ }
+
+ if (!(res->flags & IORESOURCE_MEM))
+ break;
+
+ if (!(res->flags & IORESOURCE_BUSY)) {
+ p = &res->child;
+ continue;
+ }
+
+ /* found the target resource; let's adjust accordingly */
+ if (res->start == start && res->end == end) {
+ /* free the whole entry */
+ *p = res->sibling;
+ kfree(res);
+ ret = 0;
+ } else if (res->start == start && res->end != end) {
+ /* adjust the start */
+ ret = __adjust_resource(res, end + 1,
+ res->end - end);
+ } else if (res->start != start && res->end == end) {
+ /* adjust the end */
+ ret = __adjust_resource(res, res->start,
+ start - res->start);
+ } else {
+ /* split into two entries */
+ new = kzalloc(sizeof(struct resource), GFP_KERNEL);
+ if (!new) {
+ ret = -ENOMEM;
+ break;
+ }
+ new->name = res->name;
+ new->start = end + 1;
+ new->end = res->end;
+ new->flags = res->flags;
+ new->parent = res->parent;
+ new->sibling = res->sibling;
+ new->child = NULL;
+
+ ret = __adjust_resource(res, res->start,
+ start - res->start);
+ if (ret) {
+ kfree(new);
+ break;
+ }
+ res->sibling = new;
+ }
+
+ break;
+ }
+
+ write_unlock(&resource_lock);
+ return ret;
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
/*
* Managed region resource
*/

2013-04-10 21:44:15

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()

On Wed, 10 Apr 2013 11:17:00 -0600 Toshi Kani <[email protected]> wrote:

> Added release_mem_region_adjustable(), which releases a requested
> region from a currently busy memory resource. This interface
> adjusts the matched memory resource accordingly even if the
> requested region does not match exactly but still fits into.
>
> This new interface is intended for memory hot-delete. During
> bootup, memory resources are inserted from the boot descriptor
> table, such as EFI Memory Table and e820. Each memory resource
> entry usually covers the whole contigous memory range. Memory
> hot-delete request, on the other hand, may target to a particular
> range of memory resource, and its size can be much smaller than
> the whole contiguous memory. Since the existing release interfaces
> like __release_region() require a requested region to be exactly
> matched to a resource entry, they do not allow a partial resource
> to be released.
>
> This new interface is restrictive (i.e. release under certain
> conditions), which is consistent with other release interfaces,
> __release_region() and __release_resource(). Additional release
> conditions, such as an overlapping region to a resource entry,
> can be supported after they are confirmed as valid cases.
>
> There is no change to the existing interfaces since their restriction
> is valid for I/O resources.
>
> ...
>
> +int release_mem_region_adjustable(struct resource *parent,
> + resource_size_t start, resource_size_t size)
> +{
> + struct resource **p;
> + struct resource *res, *new;
> + resource_size_t end;
> + int ret = -EINVAL;
> +
> + end = start + size - 1;
> + if ((start < parent->start) || (end > parent->end))
> + return ret;
> +
> + p = &parent->child;
> + write_lock(&resource_lock);
> +
> + while ((res = *p)) {
> + if (res->start >= end)
> + break;
> +
> + /* look for the next resource if it does not fit into */
> + if (res->start > start || res->end < end) {
> + p = &res->sibling;
> + continue;
> + }
> +
> + if (!(res->flags & IORESOURCE_MEM))
> + break;
> +
> + if (!(res->flags & IORESOURCE_BUSY)) {
> + p = &res->child;
> + continue;
> + }
> +
> + /* found the target resource; let's adjust accordingly */
> + if (res->start == start && res->end == end) {
> + /* free the whole entry */
> + *p = res->sibling;
> + kfree(res);
> + ret = 0;
> + } else if (res->start == start && res->end != end) {
> + /* adjust the start */
> + ret = __adjust_resource(res, end + 1,
> + res->end - end);
> + } else if (res->start != start && res->end == end) {
> + /* adjust the end */
> + ret = __adjust_resource(res, res->start,
> + start - res->start);
> + } else {
> + /* split into two entries */
> + new = kzalloc(sizeof(struct resource), GFP_KERNEL);

Nope, we can't perform a GFP_KERNEL allocation under write_lock().

Was this code path runtime tested? If no, please try
to find a way to test it. If yes, please see
Documentation/SubmitChecklist section 12 and use that in the future.

I'll switch it to GFP_ATOMIC. Which is horridly lame but the
allocation is small and alternatives are unobvious.

> + if (!new) {
> + ret = -ENOMEM;
> + break;
> + }
> + new->name = res->name;
> + new->start = end + 1;
> + new->end = res->end;
> + new->flags = res->flags;
> + new->parent = res->parent;
> + new->sibling = res->sibling;
> + new->child = NULL;
> +
> + ret = __adjust_resource(res, res->start,
> + start - res->start);
> + if (ret) {
> + kfree(new);
> + break;
> + }
> + res->sibling = new;
> + }
> +
> + break;
> + }
> +
> + write_unlock(&resource_lock);
> + return ret;
> +}
> +#endif /* CONFIG_MEMORY_HOTPLUG */

2013-04-10 22:02:07

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()

On Wed, 2013-04-10 at 14:44 -0700, Andrew Morton wrote:
> On Wed, 10 Apr 2013 11:17:00 -0600 Toshi Kani <[email protected]> wrote:
>
> > Added release_mem_region_adjustable(), which releases a requested
> > region from a currently busy memory resource. This interface
> > adjusts the matched memory resource accordingly even if the
> > requested region does not match exactly but still fits into.
> >
> > This new interface is intended for memory hot-delete. During
> > bootup, memory resources are inserted from the boot descriptor
> > table, such as EFI Memory Table and e820. Each memory resource
> > entry usually covers the whole contigous memory range. Memory
> > hot-delete request, on the other hand, may target to a particular
> > range of memory resource, and its size can be much smaller than
> > the whole contiguous memory. Since the existing release interfaces
> > like __release_region() require a requested region to be exactly
> > matched to a resource entry, they do not allow a partial resource
> > to be released.
> >
> > This new interface is restrictive (i.e. release under certain
> > conditions), which is consistent with other release interfaces,
> > __release_region() and __release_resource(). Additional release
> > conditions, such as an overlapping region to a resource entry,
> > can be supported after they are confirmed as valid cases.
> >
> > There is no change to the existing interfaces since their restriction
> > is valid for I/O resources.
> >
> > ...
> >
> > +int release_mem_region_adjustable(struct resource *parent,
> > + resource_size_t start, resource_size_t size)
> > +{
> > + struct resource **p;
> > + struct resource *res, *new;
> > + resource_size_t end;
> > + int ret = -EINVAL;
> > +
> > + end = start + size - 1;
> > + if ((start < parent->start) || (end > parent->end))
> > + return ret;
> > +
> > + p = &parent->child;
> > + write_lock(&resource_lock);
> > +
> > + while ((res = *p)) {
> > + if (res->start >= end)
> > + break;
> > +
> > + /* look for the next resource if it does not fit into */
> > + if (res->start > start || res->end < end) {
> > + p = &res->sibling;
> > + continue;
> > + }
> > +
> > + if (!(res->flags & IORESOURCE_MEM))
> > + break;
> > +
> > + if (!(res->flags & IORESOURCE_BUSY)) {
> > + p = &res->child;
> > + continue;
> > + }
> > +
> > + /* found the target resource; let's adjust accordingly */
> > + if (res->start == start && res->end == end) {
> > + /* free the whole entry */
> > + *p = res->sibling;
> > + kfree(res);
> > + ret = 0;
> > + } else if (res->start == start && res->end != end) {
> > + /* adjust the start */
> > + ret = __adjust_resource(res, end + 1,
> > + res->end - end);
> > + } else if (res->start != start && res->end == end) {
> > + /* adjust the end */
> > + ret = __adjust_resource(res, res->start,
> > + start - res->start);
> > + } else {
> > + /* split into two entries */
> > + new = kzalloc(sizeof(struct resource), GFP_KERNEL);
>
> Nope, we can't perform a GFP_KERNEL allocation under write_lock().
>
> Was this code path runtime tested? If no, please try
> to find a way to test it. If yes, please see
> Documentation/SubmitChecklist section 12 and use that in the future.

Yes, I tested all cases. But I did not test with all the config options
described in the document. I will make sure to test with the options
next time. Thanks a lot for the pointer!

> I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> allocation is small and alternatives are unobvious.

Great! Again, thanks for the update!
-Toshi

2013-04-10 22:08:33

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()

On Wed, 10 Apr 2013, Toshi Kani wrote:

> > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > allocation is small and alternatives are unobvious.
>
> Great! Again, thanks for the update!

release_mem_region_adjustable() allocates at most one struct resource, so
why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
resource_lock and then testing whether it's NULL or not when splitting?
It unnecessarily allocates memory when there's no split, but
__remove_pages() shouldn't be a hotpath.

2013-04-10 22:13:43

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable()

On Wed, 10 Apr 2013, Toshi Kani wrote:

> Changed __remove_pages() to call release_mem_region_adjustable().
> This allows a requested memory range to be released from
> the iomem_resource table even if it does not match exactly to
> an resource entry but still fits into. The resource entries
> initialized at bootup usually cover the whole contiguous
> memory ranges and may not necessarily match with the size of
> memory hot-delete requests.
>
> If release_mem_region_adjustable() failed, __remove_pages() emits
> a warning message and continues to proceed as it was the case
> with release_mem_region(). release_mem_region(), which is defined
> to __release_region(), emits a warning message and returns no error
> since a void function.
>
> Signed-off-by: Toshi Kani <[email protected]>
> Reviewed-by : Yasuaki Ishimatsu <[email protected]>

Acked-by: David Rientjes <[email protected]>

2013-04-10 22:24:08

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()

On Wed, 10 Apr 2013 15:08:29 -0700 (PDT) David Rientjes <[email protected]> wrote:

> On Wed, 10 Apr 2013, Toshi Kani wrote:
>
> > > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > > allocation is small and alternatives are unobvious.
> >
> > Great! Again, thanks for the update!
>
> release_mem_region_adjustable() allocates at most one struct resource, so
> why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
> resource_lock and then testing whether it's NULL or not when splitting?
> It unnecessarily allocates memory when there's no split, but
> __remove_pages() shouldn't be a hotpath.

yup.

--- a/kernel/resource.c~resource-add-release_mem_region_adjustable-fix-fix
+++ a/kernel/resource.c
@@ -1046,7 +1046,8 @@ int release_mem_region_adjustable(struct
resource_size_t start, resource_size_t size)
{
struct resource **p;
- struct resource *res, *new;
+ struct resource *res;
+ struct resource *new_res;
resource_size_t end;
int ret = -EINVAL;

@@ -1054,6 +1055,9 @@ int release_mem_region_adjustable(struct
if ((start < parent->start) || (end > parent->end))
return ret;

+ /* The kzalloc() result gets checked later */
+ new_res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+
p = &parent->child;
write_lock(&resource_lock);

@@ -1091,32 +1095,33 @@ int release_mem_region_adjustable(struct
start - res->start);
} else {
/* split into two entries */
- new = kzalloc(sizeof(struct resource), GFP_ATOMIC);
- if (!new) {
+ if (!new_res) {
ret = -ENOMEM;
break;
}
- new->name = res->name;
- new->start = end + 1;
- new->end = res->end;
- new->flags = res->flags;
- new->parent = res->parent;
- new->sibling = res->sibling;
- new->child = NULL;
+ new_res->name = res->name;
+ new_res->start = end + 1;
+ new_res->end = res->end;
+ new_res->flags = res->flags;
+ new_res->parent = res->parent;
+ new_res->sibling = res->sibling;
+ new_res->child = NULL;

ret = __adjust_resource(res, res->start,
start - res->start);
if (ret) {
- kfree(new);
+ kfree(new_res);
break;
}
- res->sibling = new;
+ res->sibling = new_res;
+ new_res = NULL;
}

break;
}

write_unlock(&resource_lock);
+ kfree(new_res);
return ret;
}
#endif /* CONFIG_MEMORY_HOTPLUG */
_

2013-04-11 16:42:27

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()

On Wed, 2013-04-10 at 15:24 -0700, Andrew Morton wrote:
> On Wed, 10 Apr 2013 15:08:29 -0700 (PDT) David Rientjes <[email protected]> wrote:
>
> > On Wed, 10 Apr 2013, Toshi Kani wrote:
> >
> > > > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > > > allocation is small and alternatives are unobvious.
> > >
> > > Great! Again, thanks for the update!
> >
> > release_mem_region_adjustable() allocates at most one struct resource, so
> > why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
> > resource_lock and then testing whether it's NULL or not when splitting?
> > It unnecessarily allocates memory when there's no split, but
> > __remove_pages() shouldn't be a hotpath.
>
> yup.
>
> --- a/kernel/resource.c~resource-add-release_mem_region_adjustable-fix-fix
> +++ a/kernel/resource.c
> @@ -1046,7 +1046,8 @@ int release_mem_region_adjustable(struct
> resource_size_t start, resource_size_t size)
> {
> struct resource **p;
> - struct resource *res, *new;
> + struct resource *res;
> + struct resource *new_res;
> resource_size_t end;
> int ret = -EINVAL;
>
> @@ -1054,6 +1055,9 @@ int release_mem_region_adjustable(struct
> if ((start < parent->start) || (end > parent->end))
> return ret;
>
> + /* The kzalloc() result gets checked later */
> + new_res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> +
> p = &parent->child;
> write_lock(&resource_lock);
>
> @@ -1091,32 +1095,33 @@ int release_mem_region_adjustable(struct
> start - res->start);
> } else {
> /* split into two entries */
> - new = kzalloc(sizeof(struct resource), GFP_ATOMIC);
> - if (!new) {
> + if (!new_res) {
> ret = -ENOMEM;
> break;
> }
> - new->name = res->name;
> - new->start = end + 1;
> - new->end = res->end;
> - new->flags = res->flags;
> - new->parent = res->parent;
> - new->sibling = res->sibling;
> - new->child = NULL;
> + new_res->name = res->name;
> + new_res->start = end + 1;
> + new_res->end = res->end;
> + new_res->flags = res->flags;
> + new_res->parent = res->parent;
> + new_res->sibling = res->sibling;
> + new_res->child = NULL;
>
> ret = __adjust_resource(res, res->start,
> start - res->start);
> if (ret) {
> - kfree(new);
> + kfree(new_res);
> break;
> }

The kfree() in the if-statement above is not necessary since kfree() is
called before the return at the end. That is, the if-statement needs to
be:
if (ret)
break;

With this change, I confirmed that all my test cases passed (with all
the config debug options this time :). With the change:

Reviewed-by: Toshi Kani <[email protected]>
Tested-by: Toshi Kani <[email protected]>

Thanks!
-Toshi


> - res->sibling = new;
> + res->sibling = new_res;
> + new_res = NULL;
> }
>
> break;
> }
>
> write_unlock(&resource_lock);
> + kfree(new_res);
> return ret;
> }
> #endif /* CONFIG_MEMORY_HOTPLUG */
> _
>

2013-04-11 20:42:28

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable()

On Wed, 2013-04-10 at 15:13 -0700, David Rientjes wrote:
> On Wed, 10 Apr 2013, Toshi Kani wrote:
>
> > Changed __remove_pages() to call release_mem_region_adjustable().
> > This allows a requested memory range to be released from
> > the iomem_resource table even if it does not match exactly to
> > an resource entry but still fits into. The resource entries
> > initialized at bootup usually cover the whole contiguous
> > memory ranges and may not necessarily match with the size of
> > memory hot-delete requests.
> >
> > If release_mem_region_adjustable() failed, __remove_pages() emits
> > a warning message and continues to proceed as it was the case
> > with release_mem_region(). release_mem_region(), which is defined
> > to __release_region(), emits a warning message and returns no error
> > since a void function.
> >
> > Signed-off-by: Toshi Kani <[email protected]>
> > Reviewed-by : Yasuaki Ishimatsu <[email protected]>
>
> Acked-by: David Rientjes <[email protected]>

Thanks David!
-Toshi

2013-04-24 08:42:53

by Ram Pai

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()

On Thu, Apr 11, 2013 at 10:30:02AM -0600, Toshi Kani wrote:
> On Wed, 2013-04-10 at 15:24 -0700, Andrew Morton wrote:
> > On Wed, 10 Apr 2013 15:08:29 -0700 (PDT) David Rientjes <[email protected]> wrote:
> >
> > > On Wed, 10 Apr 2013, Toshi Kani wrote:
> > >
> > > > > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > > > > allocation is small and alternatives are unobvious.
> > > >
> > > > Great! Again, thanks for the update!
> > >
> > > release_mem_region_adjustable() allocates at most one struct resource, so
> > > why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
> > > resource_lock and then testing whether it's NULL or not when splitting?
> > > It unnecessarily allocates memory when there's no split, but
> > > __remove_pages() shouldn't be a hotpath.
> >
> > yup.
> >
> > --- a/kernel/resource.c~resource-add-release_mem_region_adjustable-fix-fix
> > +++ a/kernel/resource.c
> > @@ -1046,7 +1046,8 @@ int release_mem_region_adjustable(struct
> > resource_size_t start, resource_size_t size)
> > {
> > struct resource **p;
> > - struct resource *res, *new;
> > + struct resource *res;
> > + struct resource *new_res;
> > resource_size_t end;
> > int ret = -EINVAL;
> >
> > @@ -1054,6 +1055,9 @@ int release_mem_region_adjustable(struct
> > if ((start < parent->start) || (end > parent->end))
> > return ret;
> >
> > + /* The kzalloc() result gets checked later */
> > + new_res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > +
> > p = &parent->child;
> > write_lock(&resource_lock);
> >
> > @@ -1091,32 +1095,33 @@ int release_mem_region_adjustable(struct
> > start - res->start);
> > } else {
> > /* split into two entries */
> > - new = kzalloc(sizeof(struct resource), GFP_ATOMIC);
> > - if (!new) {
> > + if (!new_res) {
> > ret = -ENOMEM;
> > break;
> > }
> > - new->name = res->name;
> > - new->start = end + 1;
> > - new->end = res->end;
> > - new->flags = res->flags;
> > - new->parent = res->parent;
> > - new->sibling = res->sibling;
> > - new->child = NULL;
> > + new_res->name = res->name;
> > + new_res->start = end + 1;
> > + new_res->end = res->end;
> > + new_res->flags = res->flags;
> > + new_res->parent = res->parent;
> > + new_res->sibling = res->sibling;
> > + new_res->child = NULL;
> >
> > ret = __adjust_resource(res, res->start,
> > start - res->start);
> > if (ret) {
> > - kfree(new);
> > + kfree(new_res);
> > break;
> > }
>
> The kfree() in the if-statement above is not necessary since kfree() is
> called before the return at the end. That is, the if-statement needs to
> be:
> if (ret)
> break;
>
> With this change, I confirmed that all my test cases passed (with all
> the config debug options this time :). With the change:
>
> Reviewed-by: Toshi Kani <[email protected]>

I am not confortable witht the assumption, that when a split takes
place, the children are assumed to be in the lower entry. Probably a
warning to that effect, would help quickly
nail down the problem, if such a case does encounter ?

Otherwise this looks fine. Sorry for the delayed reply. Was out.

Reviewed-by: Ram Pai <[email protected]>

2013-04-24 14:56:19

by Toshi Kani

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()

On Wed, 2013-04-24 at 16:42 +0800, Ram Pai wrote:
> On Thu, Apr 11, 2013 at 10:30:02AM -0600, Toshi Kani wrote:
> > On Wed, 2013-04-10 at 15:24 -0700, Andrew Morton wrote:
> > > On Wed, 10 Apr 2013 15:08:29 -0700 (PDT) David Rientjes <[email protected]> wrote:
> > >
> > > > On Wed, 10 Apr 2013, Toshi Kani wrote:
> > > >
> > > > > > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > > > > > allocation is small and alternatives are unobvious.
> > > > >
> > > > > Great! Again, thanks for the update!
> > > >
> > > > release_mem_region_adjustable() allocates at most one struct resource, so
> > > > why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
> > > > resource_lock and then testing whether it's NULL or not when splitting?
> > > > It unnecessarily allocates memory when there's no split, but
> > > > __remove_pages() shouldn't be a hotpath.
> > >
> > > yup.
> > >
> > > --- a/kernel/resource.c~resource-add-release_mem_region_adjustable-fix-fix
> > > +++ a/kernel/resource.c
> > > @@ -1046,7 +1046,8 @@ int release_mem_region_adjustable(struct
> > > resource_size_t start, resource_size_t size)
> > > {
> > > struct resource **p;
> > > - struct resource *res, *new;
> > > + struct resource *res;
> > > + struct resource *new_res;
> > > resource_size_t end;
> > > int ret = -EINVAL;
> > >
> > > @@ -1054,6 +1055,9 @@ int release_mem_region_adjustable(struct
> > > if ((start < parent->start) || (end > parent->end))
> > > return ret;
> > >
> > > + /* The kzalloc() result gets checked later */
> > > + new_res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > > +
> > > p = &parent->child;
> > > write_lock(&resource_lock);
> > >
> > > @@ -1091,32 +1095,33 @@ int release_mem_region_adjustable(struct
> > > start - res->start);
> > > } else {
> > > /* split into two entries */
> > > - new = kzalloc(sizeof(struct resource), GFP_ATOMIC);
> > > - if (!new) {
> > > + if (!new_res) {
> > > ret = -ENOMEM;
> > > break;
> > > }
> > > - new->name = res->name;
> > > - new->start = end + 1;
> > > - new->end = res->end;
> > > - new->flags = res->flags;
> > > - new->parent = res->parent;
> > > - new->sibling = res->sibling;
> > > - new->child = NULL;
> > > + new_res->name = res->name;
> > > + new_res->start = end + 1;
> > > + new_res->end = res->end;
> > > + new_res->flags = res->flags;
> > > + new_res->parent = res->parent;
> > > + new_res->sibling = res->sibling;
> > > + new_res->child = NULL;
> > >
> > > ret = __adjust_resource(res, res->start,
> > > start - res->start);
> > > if (ret) {
> > > - kfree(new);
> > > + kfree(new_res);
> > > break;
> > > }
> >
> > The kfree() in the if-statement above is not necessary since kfree() is
> > called before the return at the end. That is, the if-statement needs to
> > be:
> > if (ret)
> > break;
> >
> > With this change, I confirmed that all my test cases passed (with all
> > the config debug options this time :). With the change:
> >
> > Reviewed-by: Toshi Kani <[email protected]>
>
> I am not confortable witht the assumption, that when a split takes
> place, the children are assumed to be in the lower entry. Probably a
> warning to that effect, would help quickly
> nail down the problem, if such a case does encounter ?

Yes, __adjust_resource() fails with -EBUSY when such condition happens.
Hence, release_mem_region_adjustable() returns with -EBUSY, and
__remove_pages() emits a warning message per patch 3/3. So, it can be
quickly nailed down as this restriction is documented in the comment as
well.

At this point, the children are only used for Kernel code/data/bss as
follows. Hot-removable memory ranges are located at higher ranges than
them. So, I decided to simplify the implementation for this initial
version. We can always enhance it when needed.

# cat /proc/iomem
:
00100000-defa57ff : System RAM
01000000-0162f8d1 : Kernel code
0162f8d2-01ce52bf : Kernel data
01df1000-01fa5fff : Kernel bss
:
100000000-31fffffff : System RAM


> Otherwise this looks fine. Sorry for the delayed reply. Was out.
>
> Reviewed-by: Ram Pai <[email protected]>

No problem. Thanks for reviewing!
-Toshi