2008-02-11 17:20:36

by Badari Pulavarty

[permalink] [raw]
Subject: [-mm PATCH] register_memory/unregister_memory clean ups

Hi Andrew,

While testing hotplug memory remove against -mm, I noticed
that unregister_memory() is not cleaning up /sysfs entries
correctly. It also de-references structures after destroying
them (luckily in the code which never gets used). So, I cleaned
up the code and fixed the extra reference issue.

Could you please include it in -mm ?

Thanks,
Badari

register_memory()/unregister_memory() never gets called with
"root". unregister_memory() is accessing kobject_name of
the object just freed up. Since no one uses the code,
lets take the code out. And also, make register_memory() static.

Another bug fix - before calling unregister_memory()
remove_memory_block() gets a ref on kobject. unregister_memory()
need to drop that ref before calling sysdev_unregister().

Signed-off-by: Badari Pulavarty <[email protected]>
---
drivers/base/memory.c | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)

Index: linux-2.6.24/drivers/base/memory.c
===================================================================
--- linux-2.6.24.orig/drivers/base/memory.c 2008-02-07 16:59:52.000000000 -0800
+++ linux-2.6.24/drivers/base/memory.c 2008-02-08 15:54:45.000000000 -0800
@@ -62,8 +62,8 @@ void unregister_memory_notifier(struct n
/*
* register_memory - Setup a sysfs device for a memory block
*/
-int register_memory(struct memory_block *memory, struct mem_section *section,
- struct node *root)
+static
+int register_memory(struct memory_block *memory, struct mem_section *section)
{
int error;

@@ -71,26 +71,18 @@ int register_memory(struct memory_block
memory->sysdev.id = __section_nr(section);

error = sysdev_register(&memory->sysdev);
-
- if (root && !error)
- error = sysfs_create_link(&root->sysdev.kobj,
- &memory->sysdev.kobj,
- kobject_name(&memory->sysdev.kobj));
-
return error;
}

static void
-unregister_memory(struct memory_block *memory, struct mem_section *section,
- struct node *root)
+unregister_memory(struct memory_block *memory, struct mem_section *section)
{
BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
BUG_ON(memory->sysdev.id != __section_nr(section));

+ /* drop the ref. we got in remove_memory_block() */
+ kobject_put(&memory->sysdev.kobj);
sysdev_unregister(&memory->sysdev);
- if (root)
- sysfs_remove_link(&root->sysdev.kobj,
- kobject_name(&memory->sysdev.kobj));
}

/*
@@ -361,7 +353,7 @@ static int add_memory_block(unsigned lon
mutex_init(&mem->state_mutex);
mem->phys_device = phys_device;

- ret = register_memory(mem, section, NULL);
+ ret = register_memory(mem, section);
if (!ret)
ret = mem_create_simple_file(mem, phys_index);
if (!ret)
@@ -415,7 +407,7 @@ int remove_memory_block(unsigned long no
mem_remove_simple_file(mem, state);
mem_remove_simple_file(mem, phys_device);
mem_remove_simple_file(mem, removable);
- unregister_memory(mem, section, NULL);
+ unregister_memory(mem, section);

return 0;
}


2008-02-11 17:56:32

by Greg KH

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Mon, Feb 11, 2008 at 09:23:18AM -0800, Badari Pulavarty wrote:
> Hi Andrew,
>
> While testing hotplug memory remove against -mm, I noticed
> that unregister_memory() is not cleaning up /sysfs entries
> correctly. It also de-references structures after destroying
> them (luckily in the code which never gets used). So, I cleaned
> up the code and fixed the extra reference issue.
>
> Could you please include it in -mm ?

Want me to add this to my tree and send it in my next update for the
driver core to Linus?

I'll be glad to do that.

thanks,

greg k-h

2008-02-11 18:02:59

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Mon, 2008-02-11 at 09:54 -0800, Greg KH wrote:
> On Mon, Feb 11, 2008 at 09:23:18AM -0800, Badari Pulavarty wrote:
> > Hi Andrew,
> >
> > While testing hotplug memory remove against -mm, I noticed
> > that unregister_memory() is not cleaning up /sysfs entries
> > correctly. It also de-references structures after destroying
> > them (luckily in the code which never gets used). So, I cleaned
> > up the code and fixed the extra reference issue.
> >
> > Could you please include it in -mm ?
>
> Want me to add this to my tree and send it in my next update for the
> driver core to Linus?
>
> I'll be glad to do that.
>
> thanks,
>
> greg k-h

Please do. Only reason I wanted to push through -mm is, I didn't
test this with mainline (since I have patches in -mm for hotplug
memory remove).

Thanks,
Badari

2008-02-11 20:15:32

by Andrew Morton

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Mon, 11 Feb 2008 09:23:18 -0800
Badari Pulavarty <[email protected]> wrote:

> Hi Andrew,
>
> While testing hotplug memory remove against -mm, I noticed
> that unregister_memory() is not cleaning up /sysfs entries
> correctly. It also de-references structures after destroying
> them (luckily in the code which never gets used). So, I cleaned
> up the code and fixed the extra reference issue.
>
> Could you please include it in -mm ?
>
> Thanks,
> Badari
>
> register_memory()/unregister_memory() never gets called with
> "root". unregister_memory() is accessing kobject_name of
> the object just freed up. Since no one uses the code,
> lets take the code out. And also, make register_memory() static.
>
> Another bug fix - before calling unregister_memory()
> remove_memory_block() gets a ref on kobject. unregister_memory()
> need to drop that ref before calling sysdev_unregister().
>

I'd say this:

> Subject: [-mm PATCH] register_memory/unregister_memory clean ups

is rather tame. These are more than cleanups! These sound like
machine-crashing bugs. Do they crash machines? How come nobody noticed
it?

All very strange...

2008-02-11 20:40:20

by Greg KH

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Mon, Feb 11, 2008 at 11:48:18AM -0800, Andrew Morton wrote:
> On Mon, 11 Feb 2008 09:23:18 -0800
> Badari Pulavarty <[email protected]> wrote:
>
> > Hi Andrew,
> >
> > While testing hotplug memory remove against -mm, I noticed
> > that unregister_memory() is not cleaning up /sysfs entries
> > correctly. It also de-references structures after destroying
> > them (luckily in the code which never gets used). So, I cleaned
> > up the code and fixed the extra reference issue.
> >
> > Could you please include it in -mm ?
> >
> > Thanks,
> > Badari
> >
> > register_memory()/unregister_memory() never gets called with
> > "root". unregister_memory() is accessing kobject_name of
> > the object just freed up. Since no one uses the code,
> > lets take the code out. And also, make register_memory() static.
> >
> > Another bug fix - before calling unregister_memory()
> > remove_memory_block() gets a ref on kobject. unregister_memory()
> > need to drop that ref before calling sysdev_unregister().
> >
>
> I'd say this:
>
> > Subject: [-mm PATCH] register_memory/unregister_memory clean ups
>
> is rather tame. These are more than cleanups! These sound like
> machine-crashing bugs. Do they crash machines? How come nobody noticed
> it?
>
> All very strange...

No one has ever run the 'remove memory' codepath before, that's why they
were never seen before :)

thanks,

greg k-h

2008-02-11 21:29:51

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Mon, 2008-02-11 at 11:48 -0800, Andrew Morton wrote:
> On Mon, 11 Feb 2008 09:23:18 -0800
> Badari Pulavarty <[email protected]> wrote:
>
> > Hi Andrew,
> >
> > While testing hotplug memory remove against -mm, I noticed
> > that unregister_memory() is not cleaning up /sysfs entries
> > correctly. It also de-references structures after destroying
> > them (luckily in the code which never gets used). So, I cleaned
> > up the code and fixed the extra reference issue.
> >
> > Could you please include it in -mm ?
> >
> > Thanks,
> > Badari
> >
> > register_memory()/unregister_memory() never gets called with
> > "root". unregister_memory() is accessing kobject_name of
> > the object just freed up. Since no one uses the code,
> > lets take the code out. And also, make register_memory() static.
> >
> > Another bug fix - before calling unregister_memory()
> > remove_memory_block() gets a ref on kobject. unregister_memory()
> > need to drop that ref before calling sysdev_unregister().
> >
>
> I'd say this:
>
> > Subject: [-mm PATCH] register_memory/unregister_memory clean ups
>
> is rather tame. These are more than cleanups! These sound like
> machine-crashing bugs. Do they crash machines? How come nobody noticed
> it?
>

No they don't crash machine - mainly because, they never get called
with "root" argument (where we have the bug). They were never tested
before, since we don't have memory remove work yet. All it does
is, it leave /sysfs directory laying around and causing next
memory add failure.

Thanks,
Badari

2008-02-12 08:12:30

by Yasunori Goto

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

> On Mon, 2008-02-11 at 11:48 -0800, Andrew Morton wrote:
> > On Mon, 11 Feb 2008 09:23:18 -0800
> > Badari Pulavarty <[email protected]> wrote:
> >
> > > Hi Andrew,
> > >
> > > While testing hotplug memory remove against -mm, I noticed
> > > that unregister_memory() is not cleaning up /sysfs entries
> > > correctly. It also de-references structures after destroying
> > > them (luckily in the code which never gets used). So, I cleaned
> > > up the code and fixed the extra reference issue.
> > >
> > > Could you please include it in -mm ?
> > >
> > > Thanks,
> > > Badari
> > >
> > > register_memory()/unregister_memory() never gets called with
> > > "root". unregister_memory() is accessing kobject_name of
> > > the object just freed up. Since no one uses the code,
> > > lets take the code out. And also, make register_memory() static.
> > >
> > > Another bug fix - before calling unregister_memory()
> > > remove_memory_block() gets a ref on kobject. unregister_memory()
> > > need to drop that ref before calling sysdev_unregister().
> > >
> >
> > I'd say this:
> >
> > > Subject: [-mm PATCH] register_memory/unregister_memory clean ups
> >
> > is rather tame. These are more than cleanups! These sound like
> > machine-crashing bugs. Do they crash machines? How come nobody noticed
> > it?
> >
>
> No they don't crash machine - mainly because, they never get called
> with "root" argument (where we have the bug). They were never tested
> before, since we don't have memory remove work yet. All it does
> is, it leave /sysfs directory laying around and causing next
> memory add failure.

Badari-san.

Which function does call unregister_memory() or unregister_memory_section()?
I can't find its caller in current 2.6.24-mm1.


???????()
|
|nothing calls?
|
+-->unregister_memory_section()
|
|call
|
+---> remove_memory_block()
|
|call
|
+----> unregister_memory()

unregister_memory_section() is only externed in linux/memory.h.

Do you have any another patch to call it?
I think it is necessary for physical memory removing.

If you have not posted it or it is not merged to -mm,
I can understand why this bug remains.
If you posted it, could you point it to me?

Or do I misunderstand something?


Thanks.

--
Yasunori Goto

2008-02-12 17:19:56

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Tue, 2008-02-12 at 17:06 +0900, Yasunori Goto wrote:
> > On Mon, 2008-02-11 at 11:48 -0800, Andrew Morton wrote:
> > > On Mon, 11 Feb 2008 09:23:18 -0800
> > > Badari Pulavarty <[email protected]> wrote:
> > >
> > > > Hi Andrew,
> > > >
> > > > While testing hotplug memory remove against -mm, I noticed
> > > > that unregister_memory() is not cleaning up /sysfs entries
> > > > correctly. It also de-references structures after destroying
> > > > them (luckily in the code which never gets used). So, I cleaned
> > > > up the code and fixed the extra reference issue.
> > > >
> > > > Could you please include it in -mm ?
> > > >
> > > > Thanks,
> > > > Badari
> > > >
> > > > register_memory()/unregister_memory() never gets called with
> > > > "root". unregister_memory() is accessing kobject_name of
> > > > the object just freed up. Since no one uses the code,
> > > > lets take the code out. And also, make register_memory() static.
> > > >
> > > > Another bug fix - before calling unregister_memory()
> > > > remove_memory_block() gets a ref on kobject. unregister_memory()
> > > > need to drop that ref before calling sysdev_unregister().
> > > >
> > >
> > > I'd say this:
> > >
> > > > Subject: [-mm PATCH] register_memory/unregister_memory clean ups
> > >
> > > is rather tame. These are more than cleanups! These sound like
> > > machine-crashing bugs. Do they crash machines? How come nobody noticed
> > > it?
> > >
> >
> > No they don't crash machine - mainly because, they never get called
> > with "root" argument (where we have the bug). They were never tested
> > before, since we don't have memory remove work yet. All it does
> > is, it leave /sysfs directory laying around and causing next
> > memory add failure.
>
> Badari-san.
>
> Which function does call unregister_memory() or unregister_memory_section()?
> I can't find its caller in current 2.6.24-mm1.
>
>
> ???????()
> |
> |nothing calls?
> |
> +-->unregister_memory_section()
> |
> |call
> |
> +---> remove_memory_block()
> |
> |call
> |
> +----> unregister_memory()
>
> unregister_memory_section() is only externed in linux/memory.h.
>
> Do you have any another patch to call it?
> I think it is necessary for physical memory removing.
>
> If you have not posted it or it is not merged to -mm,
> I can understand why this bug remains.
> If you posted it, could you point it to me?

Yes. I am trying to complete the hotplug memory remove
support, so that I can use it for supporting it on ppc64
DLPAR environment.

Here is the patch to finish up some of the generic work
left. As you can see, I still need to finish up some work :(
Any help is appreciated :)

Thanks,
Badari

---
include/linux/memory_hotplug.h | 4 ++++
mm/memory_hotplug.c | 33 +++++++++++++++++++++++++++++++++
mm/sparse.c | 40 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 77 insertions(+)

Index: linux-2.6.24/mm/memory_hotplug.c
===================================================================
--- linux-2.6.24.orig/mm/memory_hotplug.c 2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/mm/memory_hotplug.c 2008-02-07 17:17:57.000000000 -0800
@@ -81,6 +81,14 @@ static int __add_zone(struct zone *zone,
return 0;
}

+static void __remove_zone(struct zone *zone, unsigned long phys_start_pfn)
+{
+ /*
+ * TODO - Check to see if the zone is correctly adjusted
+ * Need to mark pages reserved ?
+ */
+}
+
static int __add_section(struct zone *zone, unsigned long phys_start_pfn)
{
int nr_pages = PAGES_PER_SECTION;
@@ -102,6 +110,16 @@ static int __add_section(struct zone *zo
return register_new_memory(__pfn_to_section(phys_start_pfn));
}

+static void __remove_section(struct zone *zone, unsigned long phys_start_pfn)
+{
+ if (!pfn_valid(phys_start_pfn))
+ return;
+
+ unregister_memory_section(__pfn_to_section(phys_start_pfn));
+ __remove_zone(zone, phys_start_pfn);
+ sparse_remove_one_section(zone, phys_start_pfn, PAGES_PER_SECTION);
+}
+
/*
* Reasonably generic function for adding memory. It is
* expected that archs that support memory hotplug will
@@ -135,6 +153,21 @@ int __add_pages(struct zone *zone, unsig
}
EXPORT_SYMBOL_GPL(__add_pages);

+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+ unsigned long nr_pages)
+{
+ unsigned long i;
+ int start_sec, end_sec;
+
+ start_sec = pfn_to_section_nr(phys_start_pfn);
+ end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
+
+ for (i = start_sec; i <= end_sec; i++)
+ __remove_section(zone, i << PFN_SECTION_SHIFT);
+
+}
+EXPORT_SYMBOL_GPL(__remove_pages);
+
static void grow_zone_span(struct zone *zone,
unsigned long start_pfn, unsigned long end_pfn)
{
Index: linux-2.6.24/mm/sparse.c
===================================================================
--- linux-2.6.24.orig/mm/sparse.c 2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/mm/sparse.c 2008-02-11 14:12:28.000000000 -0800
@@ -415,4 +415,44 @@ out:
}
return ret;
}
+
+void sparse_remove_one_section(struct zone *zone, unsigned long start_pfn,
+ int nr_pages)
+{
+ unsigned long section_nr = pfn_to_section_nr(start_pfn);
+ struct pglist_data *pgdat = zone->zone_pgdat;
+ struct mem_section *ms;
+ struct page *memmap = NULL;
+ unsigned long *usemap = NULL;
+ unsigned long flags;
+
+ pgdat_resize_lock(pgdat, &flags);
+ ms = __pfn_to_section(start_pfn);
+ if (ms->section_mem_map) {
+ memmap = ms->section_mem_map & SECTION_MAP_MASK;
+ usemap = ms->pageblock_flags;
+ memmap = sparse_decode_mem_map((unsigned long)memmap,
+ section_nr);
+ ms->section_mem_map = 0;
+ ms->pageblock_flags = NULL;
+ }
+ pgdat_resize_unlock(pgdat, &flags);
+
+ /*
+ * Its ugly, but this is the best I can do - HELP !!
+ * We don't know where the allocations for section memmap and usemap
+ * came from. If they are allocated at the boot time, they would come
+ * from bootmem. If they are added through hot-memory-add they could be
+ * from sla or vmalloc. If they are allocated as part of hot-mem-add
+ * free them up properly. If they are allocated at boot, no easy way
+ * to correctly free them :(
+ */
+ if (usemap) {
+ if (PageSlab(virt_to_page(usemap))) {
+ kfree(usemap);
+ if (memmap)
+ __kfree_section_memmap(memmap, nr_pages);
+ }
+ }
+}
#endif
Index: linux-2.6.24/include/linux/memory_hotplug.h
===================================================================
--- linux-2.6.24.orig/include/linux/memory_hotplug.h 2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/include/linux/memory_hotplug.h 2008-02-07 17:17:57.000000000 -0800
@@ -64,6 +64,8 @@ extern int offline_pages(unsigned long,
/* reasonably generic interface to expand the physical pages in a zone */
extern int __add_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
+extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
+ unsigned long nr_pages);

/*
* Walk thorugh all memory which is registered as resource.
@@ -188,5 +190,7 @@ extern int arch_add_memory(int nid, u64
extern int remove_memory(u64 start, u64 size);
extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
+extern void sparse_remove_one_section(struct zone *zone,
+ unsigned long start_pfn, int nr_pages);

#endif /* __LINUX_MEMORY_HOTPLUG_H */

2008-02-12 20:59:49

by Dave Hansen

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Tue, 2008-02-12 at 09:22 -0800, Badari Pulavarty wrote:
> +static void __remove_section(struct zone *zone, unsigned long phys_start_pfn)
> +{
> + if (!pfn_valid(phys_start_pfn))
> + return;

I think you need at least a WARN_ON() there.

I'd probably also not use pfn_valid(), personally.

> + unregister_memory_section(__pfn_to_section(phys_start_pfn));
> + __remove_zone(zone, phys_start_pfn);
> + sparse_remove_one_section(zone, phys_start_pfn, PAGES_PER_SECTION);
> +}

Can none of this ever fail?

I also think having a function called __remove_section() that takes a
pfn is a bad idea. How about passing an actual 'struct mem_section *'
into it? One of the reasons I even made that structure was so that you
could hand it around to things and never be confused about pfn vs. paddr
vs. vaddr vs. section_nr. Please use it.

> /*
> * Reasonably generic function for adding memory. It is
> * expected that archs that support memory hotplug will
> @@ -135,6 +153,21 @@ int __add_pages(struct zone *zone, unsig
> }
> EXPORT_SYMBOL_GPL(__add_pages);
>
> +void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> + unsigned long nr_pages)
> +{
> + unsigned long i;
> + int start_sec, end_sec;
> +
> + start_sec = pfn_to_section_nr(phys_start_pfn);
> + end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> +
> + for (i = start_sec; i <= end_sec; i++)
> + __remove_section(zone, i << PFN_SECTION_SHIFT);
> +
> +}
> +EXPORT_SYMBOL_GPL(__remove_pages);

I'd like to see some warnings in there if nr_pages or phys_start_pfn are
not section-aligned and some other sanity checks. If someone is trying
to remove non-section-aligned areas, we either have something wrong, or
some other work to do, first keeping track of what section portions are
"removed".

I'd probably do:

void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
unsigned long nr_pages)
{
int i;
int sections_to_remove;

/*
* We can only remove entire sections.
*/
if (phys_start_pfn & ~PAGE_SECTION_MASK)
return ...;
if (nr_pages % PAGES_PER_SECTION)
return ...;
sections_to_remove = nr_pages / PAGES_PER_SECTION;

for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
__remove_section(zone, __pfn_to_section(pfn));
}
}


> static void grow_zone_span(struct zone *zone,
> unsigned long start_pfn, unsigned long end_pfn)
> {
> Index: linux-2.6.24/mm/sparse.c
> ===================================================================
> --- linux-2.6.24.orig/mm/sparse.c 2008-02-07 17:16:52.000000000 -0800
> +++ linux-2.6.24/mm/sparse.c 2008-02-11 14:12:28.000000000 -0800
> @@ -415,4 +415,44 @@ out:
> }
> return ret;
> }
> +
> +void sparse_remove_one_section(struct zone *zone, unsigned long start_pfn,
> + int nr_pages)
> +{
> + unsigned long section_nr = pfn_to_section_nr(start_pfn);
> + struct pglist_data *pgdat = zone->zone_pgdat;
> + struct mem_section *ms;
> + struct page *memmap = NULL;
> + unsigned long *usemap = NULL;
> + unsigned long flags;
> +
> + pgdat_resize_lock(pgdat, &flags);
> + ms = __pfn_to_section(start_pfn);
> + if (ms->section_mem_map) {
> + memmap = ms->section_mem_map & SECTION_MAP_MASK;

You're abusing this memmap variable. Please make another variable
that's unsigned long and has a nice name if you're actually going to
store an 'unsigned long' in it. That cast below should be a big red
flag.

> + usemap = ms->pageblock_flags;
> + memmap = sparse_decode_mem_map((unsigned long)memmap,
> + section_nr);
> + ms->section_mem_map = 0;
> + ms->pageblock_flags = NULL;
> + }
> + pgdat_resize_unlock(pgdat, &flags);

Ugh. Please put this in its own helper. Also, sparse_decode_mem_map()
has absolutely no other users. Please modify it so that you don't have
to do this gunk, like put the '& SECTION_MAP_MASK' in there. You
probably just need:

struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
{
/*
* mask off the extra low bits of information
*/
coded_mem_map &= SECTION_MAP_MASK;
return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
}

Then, you can just do this:

memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);

No casting, no temp variables. *PLEASE* look around at things and feel
free to modify to modify them. Otherwise, it'll just become a mess.
(oh, and get rid of the unused attribute on it).

> +
> + /*
> + * Its ugly, but this is the best I can do - HELP !!
> + * We don't know where the allocations for section memmap and usemap
> + * came from. If they are allocated at the boot time, they would come
> + * from bootmem. If they are added through hot-memory-add they could be
> + * from sla or vmalloc. If they are allocated as part of hot-mem-add
> + * free them up properly. If they are allocated at boot, no easy way
> + * to correctly free them :(
> + */
> + if (usemap) {
> + if (PageSlab(virt_to_page(usemap))) {
> + kfree(usemap);
> + if (memmap)
> + __kfree_section_memmap(memmap, nr_pages);
> + }
> + }
> +}

Do what we did with the memmap and store some of its origination
information in the low bits.

-- Dave

2008-02-12 21:54:23

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Tue, 2008-02-12 at 12:59 -0800, Dave Hansen wrote:
> On Tue, 2008-02-12 at 09:22 -0800, Badari Pulavarty wrote:
> > +static void __remove_section(struct zone *zone, unsigned long phys_start_pfn)
> > +{
> > + if (!pfn_valid(phys_start_pfn))
> > + return;
>
> I think you need at least a WARN_ON() there.
>
> I'd probably also not use pfn_valid(), personally.
>
> > + unregister_memory_section(__pfn_to_section(phys_start_pfn));
> > + __remove_zone(zone, phys_start_pfn);
> > + sparse_remove_one_section(zone, phys_start_pfn, PAGES_PER_SECTION);
> > +}
>
> Can none of this ever fail?
>
> I also think having a function called __remove_section() that takes a
> pfn is a bad idea. How about passing an actual 'struct mem_section *'
> into it? One of the reasons I even made that structure was so that you
> could hand it around to things and never be confused about pfn vs. paddr
> vs. vaddr vs. section_nr. Please use it.

Yes. I got similar feedback from Andy. I was closely trying to mimic
__add_pages() for easy review/understanding.

I have an updated version (not fully tested) which takes section_nr as
argument instead of playing with pfns. Please review this one and see if
it matches your taste :)

>
> > /*
> > * Reasonably generic function for adding memory. It is
> > * expected that archs that support memory hotplug will
> > @@ -135,6 +153,21 @@ int __add_pages(struct zone *zone, unsig
> > }
> > EXPORT_SYMBOL_GPL(__add_pages);
> >
> > +void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> > + unsigned long nr_pages)
> > +{
> > + unsigned long i;
> > + int start_sec, end_sec;
> > +
> > + start_sec = pfn_to_section_nr(phys_start_pfn);
> > + end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> > +
> > + for (i = start_sec; i <= end_sec; i++)
> > + __remove_section(zone, i << PFN_SECTION_SHIFT);
> > +
> > +}
> > +EXPORT_SYMBOL_GPL(__remove_pages);
>
> I'd like to see some warnings in there if nr_pages or phys_start_pfn are
> not section-aligned and some other sanity checks. If someone is trying
> to remove non-section-aligned areas, we either have something wrong, or
> some other work to do, first keeping track of what section portions are
> "removed".
>

Yes. I did most of this already (thanks for pointing out again).


> > +void sparse_remove_one_section(struct zone *zone, unsigned long start_pfn,
> > + int nr_pages)
..
>
> > + usemap = ms->pageblock_flags;
> > + memmap = sparse_decode_mem_map((unsigned long)memmap,
> > + section_nr);
> > + ms->section_mem_map = 0;
> > + ms->pageblock_flags = NULL;
> > + }
> > + pgdat_resize_unlock(pgdat, &flags);
>
> Ugh. Please put this in its own helper. Also, sparse_decode_mem_map()
> has absolutely no other users. Please modify it so that you don't have
> to do this gunk, like put the '& SECTION_MAP_MASK' in there. You
> probably just need:
>
> struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
> {
> /*
> * mask off the extra low bits of information
> */
> coded_mem_map &= SECTION_MAP_MASK;
> return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
> }
>
> Then, you can just do this:
>
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>
> No casting, no temp variables. *PLEASE* look around at things and feel
> free to modify to modify them. Otherwise, it'll just become a mess.
> (oh, and get rid of the unused attribute on it).

Good suggestion.

>
> > +
> > + /*
> > + * Its ugly, but this is the best I can do - HELP !!
> > + * We don't know where the allocations for section memmap and usemap
> > + * came from. If they are allocated at the boot time, they would come
> > + * from bootmem. If they are added through hot-memory-add they could be
> > + * from sla or vmalloc. If they are allocated as part of hot-mem-add
> > + * free them up properly. If they are allocated at boot, no easy way
> > + * to correctly free them :(
> > + */
> > + if (usemap) {
> > + if (PageSlab(virt_to_page(usemap))) {
> > + kfree(usemap);
> > + if (memmap)
> > + __kfree_section_memmap(memmap, nr_pages);
> > + }
> > + }
> > +}
>
> Do what we did with the memmap and store some of its origination
> information in the low bits.

Hmm. my understand of memmap is limited. Can you help me out here ?
I was trying to use free_bootmem_node() to free up the allocations,
but I need nodeid from which allocation came from :(

Here is the updated (currently testing) patch.

Thanks,
Badari

Generic helper function to remove section mappings and sysfs entries
for the section of the memory we are removing. offline_pages() correctly
adjusted zone and marked the pages reserved.

Issue: If mem_map, usemap allocation could come from different places -
kmalloc, vmalloc, alloc_pages or bootmem. There is no easy way
to find and free up properly. Especially for bootmem, we need to
know which node the allocation came from.

---
include/linux/memory_hotplug.h | 4 +++
mm/memory_hotplug.c | 30 ++++++++++++++++++++++++++++
mm/sparse.c | 43 ++++++++++++++++++++++++++++++++++++++---
3 files changed, 74 insertions(+), 3 deletions(-)

Index: linux-2.6.24/mm/memory_hotplug.c
===================================================================
--- linux-2.6.24.orig/mm/memory_hotplug.c 2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/mm/memory_hotplug.c 2008-02-12 13:35:52.000000000 -0800
@@ -102,6 +102,15 @@ static int __add_section(struct zone *zo
return register_new_memory(__pfn_to_section(phys_start_pfn));
}

+static void __remove_section(struct zone *zone, unsigned long section_nr)
+{
+ if (!valid_section_nr(section_nr))
+ return;
+
+ unregister_memory_section(__nr_to_section(section_nr));
+ sparse_remove_one_section(zone, section_nr);
+}
+
/*
* Reasonably generic function for adding memory. It is
* expected that archs that support memory hotplug will
@@ -135,6 +144,27 @@ int __add_pages(struct zone *zone, unsig
}
EXPORT_SYMBOL_GPL(__add_pages);

+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+ unsigned long nr_pages)
+{
+ unsigned long i;
+ int sections_to_remove;
+
+ /*
+ * We can only remove entire sections
+ */
+ BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
+ BUG_ON(nr_pages % PAGES_PER_SECTION);
+
+ sections_to_remove = nr_pages / PAGES_PER_SECTION;
+
+ for (i = 0; i < sections_to_remove; i++) {
+ unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+ __remove_section(zone, pfn_to_section_nr(pfn));
+ }
+}
+EXPORT_SYMBOL_GPL(__remove_pages);
+
static void grow_zone_span(struct zone *zone,
unsigned long start_pfn, unsigned long end_pfn)
{
Index: linux-2.6.24/mm/sparse.c
===================================================================
--- linux-2.6.24.orig/mm/sparse.c 2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/mm/sparse.c 2008-02-12 13:40:58.000000000 -0800
@@ -198,12 +198,13 @@ static unsigned long sparse_encode_mem_m
}

/*
- * We need this if we ever free the mem_maps. While not implemented yet,
- * this function is included for parity with its sibling.
+ * Decode mem_map from the coded memmap
*/
-static __attribute((unused))
+static
struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
{
+ /* mask off the extra low bits of information */
+ coded_mem_map &= SECTION_MAP_MASK;
return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
}

@@ -415,4 +416,40 @@ out:
}
return ret;
}
+
+void sparse_remove_one_section(struct zone *zone, unsigned long section_nr)
+{
+ struct pglist_data *pgdat = zone->zone_pgdat;
+ struct mem_section *ms;
+ struct page *memmap = NULL;
+ unsigned long *usemap = NULL;
+ unsigned long flags;
+
+ pgdat_resize_lock(pgdat, &flags);
+ ms = __nr_to_section(section_nr);
+ if (ms->section_mem_map) {
+ usemap = ms->pageblock_flags;
+ memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+ ms->section_mem_map = 0;
+ ms->pageblock_flags = NULL;
+ }
+ pgdat_resize_unlock(pgdat, &flags);
+
+ /*
+ * Its ugly, but this is the best I can do - HELP !!
+ * We don't know where the allocations for section memmap and usemap
+ * came from. If they are allocated at the boot time, they would come
+ * from bootmem. If they are added through hot-memory-add they could be
+ * from slab, vmalloc. If they are allocated as part of hot-mem-add
+ * free them up properly. If they are allocated at boot, no easy way
+ * to correctly free them :(
+ */
+ if (usemap) {
+ if (PageSlab(virt_to_page(usemap))) {
+ kfree(usemap);
+ if (memmap)
+ __kfree_section_memmap(memmap, PAGES_PER_SECTION);
+ }
+ }
+}
#endif
Index: linux-2.6.24/include/linux/memory_hotplug.h
===================================================================
--- linux-2.6.24.orig/include/linux/memory_hotplug.h 2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/include/linux/memory_hotplug.h 2008-02-12 13:37:47.000000000 -0800
@@ -64,6 +64,8 @@ extern int offline_pages(unsigned long,
/* reasonably generic interface to expand the physical pages in a zone */
extern int __add_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
+extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
+ unsigned long nr_pages);

/*
* Walk thorugh all memory which is registered as resource.
@@ -188,5 +190,7 @@ extern int arch_add_memory(int nid, u64
extern int remove_memory(u64 start, u64 size);
extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
+extern void sparse_remove_one_section(struct zone *zone,
+ unsigned long section_nr);

#endif /* __LINUX_MEMORY_HOTPLUG_H */

2008-02-12 21:57:30

by Dave Hansen

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups


On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
>
> +static void __remove_section(struct zone *zone, unsigned long
> section_nr)
> +{
> + if (!valid_section_nr(section_nr))
> + return;
> +
> + unregister_memory_section(__nr_to_section(section_nr));
> + sparse_remove_one_section(zone, section_nr);
> +}

I do think passing in a mem_section* here is highly superior. It makes
it impossible to pass a pfn in and not get a warning.

-- Dave

2008-02-12 22:05:10

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Tue, 2008-02-12 at 13:57 -0800, Dave Hansen wrote:
> On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> >
> > +static void __remove_section(struct zone *zone, unsigned long
> > section_nr)
> > +{
> > + if (!valid_section_nr(section_nr))
> > + return;
> > +
> > + unregister_memory_section(__nr_to_section(section_nr));
> > + sparse_remove_one_section(zone, section_nr);
> > +}
>
> I do think passing in a mem_section* here is highly superior. It makes
> it impossible to pass a pfn in and not get a warning.
>

Only problem is, I need to hold pgdat_resize_lock() if pass *ms.
If I don't hold the resize_lock, I have to re-evaluate. And also,
I need to pass section_nr for decoding the mem_map anyway :(

Thanks,
Badari

2008-02-12 22:06:38

by Dave Hansen

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > > + /*
> > > + * Its ugly, but this is the best I can do - HELP !!
> > > + * We don't know where the allocations for section memmap and usemap
> > > + * came from. If they are allocated at the boot time, they would come
> > > + * from bootmem. If they are added through hot-memory-add they could be
> > > + * from sla or vmalloc. If they are allocated as part of hot-mem-add
> > > + * free them up properly. If they are allocated at boot, no easy way
> > > + * to correctly free them :(
> > > + */
> > > + if (usemap) {
> > > + if (PageSlab(virt_to_page(usemap))) {
> > > + kfree(usemap);
> > > + if (memmap)
> > > + __kfree_section_memmap(memmap, nr_pages);
> > > + }
> > > + }
> > > +}
> >
> > Do what we did with the memmap and store some of its origination
> > information in the low bits.
>
> Hmm. my understand of memmap is limited. Can you help me out here ?

Never mind. That was a bad suggestion. I do think it would be a good
idea to mark the 'struct page' of ever page we use as bootmem in some
way. Perhaps page->private? Otherwise, you can simply try all of the
possibilities and consider the remainder bootmem. Did you ever find out
if we properly initialize the bootmem 'struct page's?

Please have mercy and put this in a helper, first of all.

static void free_usemap(unsigned long *usemap)
{
if (!usemap_
return;

if (PageSlab(virt_to_page(usemap))) {
kfree(usemap)
} else if (is_vmalloc_addr(usemap)) {
vfree(usemap);
} else {
int nid = page_to_nid(virt_to_page(usemap));
bootmem_fun_here(NODE_DATA(nid), usemap);
}
}

right?

> I was trying to use free_bootmem_node() to free up the allocations,
> but I need nodeid from which allocation came from :(

How is this any different from pfn_to_nid() on the thing? Or, can you
not use that because we never init'd the bootmem 'struct page's?

If so, I think the *CORRECT* fix is to give the bootmem areas real
struct pages, probably at boot-time.

-- Dave

2008-02-12 22:15:33

by Dave Hansen

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Tue, 2008-02-12 at 14:07 -0800, Badari Pulavarty wrote:
> On Tue, 2008-02-12 at 13:57 -0800, Dave Hansen wrote:
> > On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > >
> > > +static void __remove_section(struct zone *zone, unsigned long
> > > section_nr)
> > > +{
> > > + if (!valid_section_nr(section_nr))
> > > + return;
> > > +
> > > + unregister_memory_section(__nr_to_section(section_nr));
> > > + sparse_remove_one_section(zone, section_nr);
> > > +}
> >
> > I do think passing in a mem_section* here is highly superior. It makes
> > it impossible to pass a pfn in and not get a warning.
> >
>
> Only problem is, I need to hold pgdat_resize_lock() if pass *ms.
> If I don't hold the resize_lock, I have to re-evaluate.

What's wrong with holding the resize lock? What races, precisely, are
you trying to avoid?

> And also,
> I need to pass section_nr for decoding the mem_map anyway :(

See sparse.c::__section_nr(). It takes a mem_section* and returns a
section_nr.

-- Dave

2008-02-12 22:49:20

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Tue, 2008-02-12 at 14:15 -0800, Dave Hansen wrote:
> On Tue, 2008-02-12 at 14:07 -0800, Badari Pulavarty wrote:
> > On Tue, 2008-02-12 at 13:57 -0800, Dave Hansen wrote:
> > > On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > > >
> > > > +static void __remove_section(struct zone *zone, unsigned long
> > > > section_nr)
> > > > +{
> > > > + if (!valid_section_nr(section_nr))
> > > > + return;
> > > > +
> > > > + unregister_memory_section(__nr_to_section(section_nr));
> > > > + sparse_remove_one_section(zone, section_nr);
> > > > +}
> > >
> > > I do think passing in a mem_section* here is highly superior. It makes
> > > it impossible to pass a pfn in and not get a warning.
> > >
> >
> > Only problem is, I need to hold pgdat_resize_lock() if pass *ms.
> > If I don't hold the resize_lock, I have to re-evaluate.
>
> What's wrong with holding the resize lock? What races, precisely, are
> you trying to avoid?

I was trying to avoid holding resize lock for entire duration of
remove_section(), which includes removing sysfs entries etc. Its
needed only to decode and clear out sectionmap. (I am no longer
passing pfns).

Whats wrong with passing section_nr ? It simply checks if that
section exists and if so removes sysfs entries and corresponding
sectionmap. What wrong thing can happen ?

>
> > And also,
> > I need to pass section_nr for decoding the mem_map anyway :(
>
> See sparse.c::__section_nr(). It takes a mem_section* and returns a
> section_nr.

I know. It looked like a round about of getting section_nr while
we have that information easily available.

If you are really passionate about passing mem_section*, sure I
can do that :)

Thanks,
Badari

2008-02-12 23:00:57

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Tue, 2008-02-12 at 14:15 -0800, Dave Hansen wrote:
> On Tue, 2008-02-12 at 14:07 -0800, Badari Pulavarty wrote:
> > On Tue, 2008-02-12 at 13:57 -0800, Dave Hansen wrote:
> > > On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > > >
> > > > +static void __remove_section(struct zone *zone, unsigned long
> > > > section_nr)
> > > > +{
> > > > + if (!valid_section_nr(section_nr))
> > > > + return;
> > > > +
> > > > + unregister_memory_section(__nr_to_section(section_nr));
> > > > + sparse_remove_one_section(zone, section_nr);
> > > > +}
> > >
> > > I do think passing in a mem_section* here is highly superior. It makes
> > > it impossible to pass a pfn in and not get a warning.
> > >
> >
> > Only problem is, I need to hold pgdat_resize_lock() if pass *ms.
> > If I don't hold the resize_lock, I have to re-evaluate.
>
> What's wrong with holding the resize lock? What races, precisely, are
> you trying to avoid?
>
> > And also,
> > I need to pass section_nr for decoding the mem_map anyway :(
>
> See sparse.c::__section_nr(). It takes a mem_section* and returns a
> section_nr.

Here is the version with your suggestion. Do you like this better ?

Thanks,
Badari

Generic helper function to remove section mappings and sysfs entries
for the section of the memory we are removing. offline_pages() correctly
adjusted zone and marked the pages reserved.

Issue: If mem_map, usemap allocation could come from different places -
kmalloc, vmalloc, alloc_pages or bootmem. There is no easy way
to find and free up properly. Especially for bootmem, we need to
know which node the allocation came from.

Signed-off-by: Badari Pulavarty <[email protected]>

---
include/linux/memory_hotplug.h | 4 ++++
mm/memory_hotplug.c | 34 ++++++++++++++++++++++++++++++++++
mm/sparse.c | 39 ++++++++++++++++++++++++++++++++++++---
3 files changed, 74 insertions(+), 3 deletions(-)

Index: linux-2.6.24/mm/memory_hotplug.c
===================================================================
--- linux-2.6.24.orig/mm/memory_hotplug.c 2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/mm/memory_hotplug.c 2008-02-12 14:49:07.000000000 -0800
@@ -102,6 +102,15 @@ static int __add_section(struct zone *zo
return register_new_memory(__pfn_to_section(phys_start_pfn));
}

+static void __remove_section(struct zone *zone, struct mem_section *ms)
+{
+ if (!valid_section(ms))
+ return;
+
+ unregister_memory_section(ms);
+ sparse_remove_one_section(zone, ms);
+}
+
/*
* Reasonably generic function for adding memory. It is
* expected that archs that support memory hotplug will
@@ -135,6 +144,31 @@ int __add_pages(struct zone *zone, unsig
}
EXPORT_SYMBOL_GPL(__add_pages);

+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+ unsigned long nr_pages)
+{
+ unsigned long i;
+ int sections_to_remove;
+ unsigned long flags;
+ struct pglist_data *pgdat = zone->zone_pgdat;
+
+ /*
+ * We can only remove entire sections
+ */
+ BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
+ BUG_ON(nr_pages % PAGES_PER_SECTION);
+
+ sections_to_remove = nr_pages / PAGES_PER_SECTION;
+
+ for (i = 0; i < sections_to_remove; i++) {
+ unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+ pgdat_resize_lock(pgdat, &flags);
+ __remove_section(zone, pfn_to_section(pfn));
+ pgdat_resize_unlock(pgdat, &flags);
+ }
+}
+EXPORT_SYMBOL_GPL(__remove_pages);
+
static void grow_zone_span(struct zone *zone,
unsigned long start_pfn, unsigned long end_pfn)
{
Index: linux-2.6.24/mm/sparse.c
===================================================================
--- linux-2.6.24.orig/mm/sparse.c 2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/mm/sparse.c 2008-02-12 14:51:07.000000000 -0800
@@ -198,12 +198,13 @@ static unsigned long sparse_encode_mem_m
}

/*
- * We need this if we ever free the mem_maps. While not implemented yet,
- * this function is included for parity with its sibling.
+ * Decode mem_map from the coded memmap
*/
-static __attribute((unused))
+static
struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
{
+ /* mask off the extra low bits of information */
+ coded_mem_map &= SECTION_MAP_MASK;
return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
}

@@ -415,4 +416,36 @@ out:
}
return ret;
}
+
+void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
+{
+ struct page *memmap = NULL;
+ unsigned long *usemap = NULL;
+ unsigned long flags;
+
+ if (ms->section_mem_map) {
+ usemap = ms->pageblock_flags;
+ memmap = sparse_decode_mem_map(ms->section_mem_map,
+ __section_nr(ms));
+ ms->section_mem_map = 0;
+ ms->pageblock_flags = NULL;
+ }
+
+ /*
+ * Its ugly, but this is the best I can do - HELP !!
+ * We don't know where the allocations for section memmap and usemap
+ * came from. If they are allocated at the boot time, they would come
+ * from bootmem. If they are added through hot-memory-add they could be
+ * from slab, vmalloc. If they are allocated as part of hot-mem-add
+ * free them up properly. If they are allocated at boot, no easy way
+ * to correctly free them :(
+ */
+ if (usemap) {
+ if (PageSlab(virt_to_page(usemap))) {
+ kfree(usemap);
+ if (memmap)
+ __kfree_section_memmap(memmap, PAGES_PER_SECTION);
+ }
+ }
+}
#endif
Index: linux-2.6.24/include/linux/memory_hotplug.h
===================================================================
--- linux-2.6.24.orig/include/linux/memory_hotplug.h 2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/include/linux/memory_hotplug.h 2008-02-12 14:54:40.000000000 -0800
@@ -8,6 +8,7 @@
struct page;
struct zone;
struct pglist_data;
+struct mem_section;

#ifdef CONFIG_MEMORY_HOTPLUG
/*
@@ -64,6 +65,8 @@ extern int offline_pages(unsigned long,
/* reasonably generic interface to expand the physical pages in a zone */
extern int __add_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
+extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
+ unsigned long nr_pages);

/*
* Walk thorugh all memory which is registered as resource.
@@ -188,5 +191,6 @@ extern int arch_add_memory(int nid, u64
extern int remove_memory(u64 start, u64 size);
extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
+extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);

#endif /* __LINUX_MEMORY_HOTPLUG_H */

2008-02-12 23:21:17

by Dave Hansen

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Tue, 2008-02-12 at 15:03 -0800, Badari Pulavarty wrote:
> Here is the version with your suggestion. Do you like this better ?

I do like how it looks, better, thanks.

2008-02-13 05:14:20

by Yasunori Goto

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

Thanks Badari-san.

I understand what was occured. :-)

> On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > > > + /*
> > > > + * Its ugly, but this is the best I can do - HELP !!
> > > > + * We don't know where the allocations for section memmap and usemap
> > > > + * came from. If they are allocated at the boot time, they would come
> > > > + * from bootmem. If they are added through hot-memory-add they could be
> > > > + * from sla or vmalloc. If they are allocated as part of hot-mem-add
> > > > + * free them up properly. If they are allocated at boot, no easy way
> > > > + * to correctly free them :(
> > > > + */
> > > > + if (usemap) {
> > > > + if (PageSlab(virt_to_page(usemap))) {
> > > > + kfree(usemap);
> > > > + if (memmap)
> > > > + __kfree_section_memmap(memmap, nr_pages);
> > > > + }
> > > > + }
> > > > +}
> > >
> > > Do what we did with the memmap and store some of its origination
> > > information in the low bits.
> >
> > Hmm. my understand of memmap is limited. Can you help me out here ?
>
> Never mind. That was a bad suggestion. I do think it would be a good
> idea to mark the 'struct page' of ever page we use as bootmem in some
> way. Perhaps page->private?

I agree. page->private is not used by bootmem allocator.

I would like to mark not only memmap but also pgdat (and so on)
for next step. It will be necessary for removing whole node. :-)


> Otherwise, you can simply try all of the
> possibilities and consider the remainder bootmem. Did you ever find out
> if we properly initialize the bootmem 'struct page's?
>
> Please have mercy and put this in a helper, first of all.
>
> static void free_usemap(unsigned long *usemap)
> {
> if (!usemap_
> return;
>
> if (PageSlab(virt_to_page(usemap))) {
> kfree(usemap)
> } else if (is_vmalloc_addr(usemap)) {
> vfree(usemap);
> } else {
> int nid = page_to_nid(virt_to_page(usemap));
> bootmem_fun_here(NODE_DATA(nid), usemap);
> }
> }
>
> right?

It may work. But, to be honest, I feel there are TOO MANY allocation/free
way for memmap (usemap and so on). If possible, I would like to
unify some of them. I would like to try it.

Bye.

--
Yasunori Goto

2008-02-13 17:29:18

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [-mm PATCH] register_memory/unregister_memory clean ups

On Wed, 2008-02-13 at 14:09 +0900, Yasunori Goto wrote:
> Thanks Badari-san.
>
> I understand what was occured. :-)
>
> > On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > > > > + /*
> > > > > + * Its ugly, but this is the best I can do - HELP !!
> > > > > + * We don't know where the allocations for section memmap and usemap
> > > > > + * came from. If they are allocated at the boot time, they would come
> > > > > + * from bootmem. If they are added through hot-memory-add they could be
> > > > > + * from sla or vmalloc. If they are allocated as part of hot-mem-add
> > > > > + * free them up properly. If they are allocated at boot, no easy way
> > > > > + * to correctly free them :(
> > > > > + */
> > > > > + if (usemap) {
> > > > > + if (PageSlab(virt_to_page(usemap))) {
> > > > > + kfree(usemap);
> > > > > + if (memmap)
> > > > > + __kfree_section_memmap(memmap, nr_pages);
> > > > > + }
> > > > > + }
> > > > > +}
> > > >
> > > > Do what we did with the memmap and store some of its origination
> > > > information in the low bits.
> > >
> > > Hmm. my understand of memmap is limited. Can you help me out here ?
> >
> > Never mind. That was a bad suggestion. I do think it would be a good
> > idea to mark the 'struct page' of ever page we use as bootmem in some
> > way. Perhaps page->private?
>
> I agree. page->private is not used by bootmem allocator.
>
> I would like to mark not only memmap but also pgdat (and so on)
> for next step. It will be necessary for removing whole node. :-)
>
>
> > Otherwise, you can simply try all of the
> > possibilities and consider the remainder bootmem. Did you ever find out
> > if we properly initialize the bootmem 'struct page's?
> >
> > Please have mercy and put this in a helper, first of all.
> >
> > static void free_usemap(unsigned long *usemap)
> > {
> > if (!usemap_
> > return;
> >
> > if (PageSlab(virt_to_page(usemap))) {
> > kfree(usemap)
> > } else if (is_vmalloc_addr(usemap)) {
> > vfree(usemap);
> > } else {
> > int nid = page_to_nid(virt_to_page(usemap));
> > bootmem_fun_here(NODE_DATA(nid), usemap);
> > }
> > }
> >
> > right?
>
> It may work. But, to be honest, I feel there are TOO MANY allocation/free
> way for memmap (usemap and so on). If possible, I would like to
> unify some of them. I would like to try it.

Thank you for the offer. Here is the latest patch, feel free to
rip it out.

Thanks,
Badari

Generic helper function to remove section mappings and sysfs entries
for the section of the memory we are removing. offline_pages() correctly
adjusted zone and marked the pages reserved.

Issue: Need help on freeing up allocation made from bootmem.

Signed-off-by: Badari Pulavarty <[email protected]>

---
include/linux/memory_hotplug.h | 4 +++
mm/memory_hotplug.c | 34 +++++++++++++++++++++++++++++++
mm/sparse.c | 44 ++++++++++++++++++++++++++++++++++++++---
3 files changed, 79 insertions(+), 3 deletions(-)

Index: linux-2.6.24/mm/memory_hotplug.c
===================================================================
--- linux-2.6.24.orig/mm/memory_hotplug.c 2008-02-12 15:07:09.000000000 -0800
+++ linux-2.6.24/mm/memory_hotplug.c 2008-02-12 15:08:50.000000000 -0800
@@ -102,6 +102,15 @@ static int __add_section(struct zone *zo
return register_new_memory(__pfn_to_section(phys_start_pfn));
}

+static void __remove_section(struct zone *zone, struct mem_section *ms)
+{
+ if (!valid_section(ms))
+ return;
+
+ unregister_memory_section(ms);
+ sparse_remove_one_section(zone, ms);
+}
+
/*
* Reasonably generic function for adding memory. It is
* expected that archs that support memory hotplug will
@@ -135,6 +144,31 @@ int __add_pages(struct zone *zone, unsig
}
EXPORT_SYMBOL_GPL(__add_pages);

+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+ unsigned long nr_pages)
+{
+ unsigned long i;
+ int sections_to_remove;
+ unsigned long flags;
+ struct pglist_data *pgdat = zone->zone_pgdat;
+
+ /*
+ * We can only remove entire sections
+ */
+ BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
+ BUG_ON(nr_pages % PAGES_PER_SECTION);
+
+ sections_to_remove = nr_pages / PAGES_PER_SECTION;
+
+ for (i = 0; i < sections_to_remove; i++) {
+ unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+ pgdat_resize_lock(pgdat, &flags);
+ __remove_section(zone, __pfn_to_section(pfn));
+ pgdat_resize_unlock(pgdat, &flags);
+ }
+}
+EXPORT_SYMBOL_GPL(__remove_pages);
+
static void grow_zone_span(struct zone *zone,
unsigned long start_pfn, unsigned long end_pfn)
{
Index: linux-2.6.24/mm/sparse.c
===================================================================
--- linux-2.6.24.orig/mm/sparse.c 2008-02-12 15:07:09.000000000 -0800
+++ linux-2.6.24/mm/sparse.c 2008-02-12 17:10:16.000000000 -0800
@@ -198,12 +198,13 @@ static unsigned long sparse_encode_mem_m
}

/*
- * We need this if we ever free the mem_maps. While not implemented yet,
- * this function is included for parity with its sibling.
+ * Decode mem_map from the coded memmap
*/
-static __attribute((unused))
+static
struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
{
+ /* mask off the extra low bits of information */
+ coded_mem_map &= SECTION_MAP_MASK;
return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
}

@@ -363,6 +364,26 @@ static void __kfree_section_memmap(struc
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */

+static void free_section_usemap(struct page *memmap, unsigned long *usemap)
+{
+ if (!usemap)
+ return;
+
+ /*
+ * Check to see if allocation came from hot-plug-add
+ */
+ if (PageSlab(virt_to_page(usemap))) {
+ kfree(usemap);
+ if (memmap)
+ __kfree_section_memmap(memmap, PAGES_PER_SECTION);
+ return;
+ }
+
+ /*
+ * Allocations came from bootmem - how do I free up ?
+ */
+}
+
/*
* returns the number of sections whose mem_maps were properly
* set. If this is <=0, then that means that the passed-in
@@ -415,4 +436,21 @@ out:
}
return ret;
}
+
+void sparse_remove_one_section(struct zone *zone, struct mem_section *ms)
+{
+ struct page *memmap = NULL;
+ unsigned long *usemap = NULL;
+ unsigned long flags;
+
+ if (ms->section_mem_map) {
+ usemap = ms->pageblock_flags;
+ memmap = sparse_decode_mem_map(ms->section_mem_map,
+ __section_nr(ms));
+ ms->section_mem_map = 0;
+ ms->pageblock_flags = NULL;
+ }
+
+ free_section_usemap(memmap, usemap);
+}
#endif
Index: linux-2.6.24/include/linux/memory_hotplug.h
===================================================================
--- linux-2.6.24.orig/include/linux/memory_hotplug.h 2008-02-12 15:07:09.000000000 -0800
+++ linux-2.6.24/include/linux/memory_hotplug.h 2008-02-12 15:08:50.000000000 -0800
@@ -8,6 +8,7 @@
struct page;
struct zone;
struct pglist_data;
+struct mem_section;

#ifdef CONFIG_MEMORY_HOTPLUG
/*
@@ -64,6 +65,8 @@ extern int offline_pages(unsigned long,
/* reasonably generic interface to expand the physical pages in a zone */
extern int __add_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
+extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
+ unsigned long nr_pages);

/*
* Walk thorugh all memory which is registered as resource.
@@ -188,5 +191,6 @@ extern int arch_add_memory(int nid, u64
extern int remove_memory(u64 start, u64 size);
extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
+extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);

#endif /* __LINUX_MEMORY_HOTPLUG_H */

2008-02-19 19:46:12

by Greg KH

[permalink] [raw]
Subject: patch driver-core-register_memory-unregister_memory-clean-ups-and-bugfix.patch added to gregkh-2.6 tree


This is a note to let you know that I've just added the patch titled

Subject: driver core: register_memory/unregister_memory clean ups and bugfix

to my gregkh-2.6 tree. Its filename is

driver-core-register_memory-unregister_memory-clean-ups-and-bugfix.patch

This tree can be found at
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From [email protected] Mon Feb 11 09:20:30 2008
From: Badari Pulavarty <[email protected]>
Date: Mon, 11 Feb 2008 09:23:18 -0800
Subject: driver core: register_memory/unregister_memory clean ups and bugfix
To: Andrew Morton <[email protected]>, lkml <[email protected]>
Cc: Greg KH <[email protected]>, [email protected], linux-mm <[email protected]>
Message-ID: <[email protected]>

register_memory()/unregister_memory() never gets called with
"root". unregister_memory() is accessing kobject_name of
the object just freed up. Since no one uses the code,
lets take the code out. And also, make register_memory() static.

Another bug fix - before calling unregister_memory()
remove_memory_block() gets a ref on kobject. unregister_memory()
need to drop that ref before calling sysdev_unregister().

Signed-off-by: Badari Pulavarty <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/base/memory.c | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)

--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -62,8 +62,8 @@ void unregister_memory_notifier(struct n
/*
* register_memory - Setup a sysfs device for a memory block
*/
-int register_memory(struct memory_block *memory, struct mem_section *section,
- struct node *root)
+static
+int register_memory(struct memory_block *memory, struct mem_section *section)
{
int error;

@@ -71,26 +71,18 @@ int register_memory(struct memory_block
memory->sysdev.id = __section_nr(section);

error = sysdev_register(&memory->sysdev);
-
- if (root && !error)
- error = sysfs_create_link(&root->sysdev.kobj,
- &memory->sysdev.kobj,
- kobject_name(&memory->sysdev.kobj));
-
return error;
}

static void
-unregister_memory(struct memory_block *memory, struct mem_section *section,
- struct node *root)
+unregister_memory(struct memory_block *memory, struct mem_section *section)
{
BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
BUG_ON(memory->sysdev.id != __section_nr(section));

+ /* drop the ref. we got in remove_memory_block() */
+ kobject_put(&memory->sysdev.kobj);
sysdev_unregister(&memory->sysdev);
- if (root)
- sysfs_remove_link(&root->sysdev.kobj,
- kobject_name(&memory->sysdev.kobj));
}

/*
@@ -345,7 +337,7 @@ static int add_memory_block(unsigned lon
mutex_init(&mem->state_mutex);
mem->phys_device = phys_device;

- ret = register_memory(mem, section, NULL);
+ ret = register_memory(mem, section);
if (!ret)
ret = mem_create_simple_file(mem, phys_index);
if (!ret)
@@ -396,7 +388,7 @@ int remove_memory_block(unsigned long no
mem_remove_simple_file(mem, phys_index);
mem_remove_simple_file(mem, state);
mem_remove_simple_file(mem, phys_device);
- unregister_memory(mem, section, NULL);
+ unregister_memory(mem, section);

return 0;
}


Patches currently in gregkh-2.6 which might be from [email protected] are

driver/driver-core-register_memory-unregister_memory-clean-ups-and-bugfix.patch