2006-03-08 13:41:09

by Yasunori Goto

Subject: [PATCH: 000/017] (RFC)Memory hotplug for new nodes v.3.

Hello.

I'm posting the newest patches for memory hot-add with pgdat allocation as V3.
There are many changes that move more of the code into common code.

This may be too many patches, but I would like to show
the complete feature set of this code now.

These patches are for 2.6.16-rc5-mm3.
So far I have tested them only on Tiger4 (ia64) with emulation,
but I'll test them on x86-64 too after this post.

Please comment.

---------------------------------------

This is the memory hot-add code for when a new node is added.
In these patches, pgdat is allocated when a new node arrives.
To initialize pgdat and zones, a set of patches is necessary:
- to allocate and initialize pgdat, zone, and zonelist.
- to create a new kswapd.
- to initialize the node_data[] array (ia64).
- to register the sysfs file for the new node.
- to call the memory_hotplug code from the ACPI container driver.
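The order of these steps matters. Below is a minimal userspace sketch of the sequence, assuming heavily simplified stand-ins: `struct pgdat`, `node_data[]`, `node_online_map[]`, `hotadd_node()` and `online_node_pages()` are hypothetical names for illustration, not the patchset's real functions, and the actual patches may order some steps differently. It follows two points stated in this thread: the node is marked online only after its pgdat is ready, and zonelists are built after the pages are onlined.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_NUMNODES 4 /* hypothetical limit, kept small for the sketch */

/* Hypothetical, heavily simplified stand-in for the kernel's pg_data_t. */
struct pgdat {
	int node_id;
	bool zones_initialized;
	bool kswapd_started;
	bool sysfs_registered;
	bool zonelists_built;
};

static struct pgdat *node_data[MAX_NUMNODES]; /* models NODE_DATA(nid) */
static bool node_online_map[MAX_NUMNODES];    /* models node_online_map */

/* Phase 1 (hypothetical): set up a new node when its memory arrives. */
static int hotadd_node(int nid)
{
	struct pgdat *pgdat;

	if (nid < 0 || nid >= MAX_NUMNODES || node_online_map[nid])
		return -1;

	/* Allocate a zeroed pgdat (the patches use kzalloc for this). */
	pgdat = calloc(1, sizeof(*pgdat));
	if (!pgdat)
		return -1;

	pgdat->node_id = nid;
	pgdat->zones_initialized = true; /* initialize pgdat and zones */
	node_data[nid] = pgdat;          /* make NODE_DATA(nid) point at it */
	pgdat->kswapd_started = true;    /* start the node's kswapd */
	pgdat->sysfs_registered = true;  /* register the sysfs node file */

	/* Mark the node online last, after everything above is ready. */
	node_online_map[nid] = true;
	return 0;
}

/* Phase 2 (hypothetical): after the new pages are onlined, build zonelists. */
static int online_node_pages(int nid)
{
	if (nid < 0 || nid >= MAX_NUMNODES || !node_online_map[nid])
		return -1;
	node_data[nid]->zonelists_built = true;
	return 0;
}
```

A caller would run `hotadd_node(nid)` from the ACPI hot-add path and `online_node_pages(nid)` later, when the memory is onlined.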

Note:
- kzalloc is used for pgdat allocation in this version.
So the pgdat is not allocated on the new node itself, but on some
other node. This is only to simplify the patches a bit. :-P

-----------------------------
The following are the updates.

Updates from V2 to V3.
- update for 2.6.16-rc5-mm3.
- The callers of pgdat allocation and related initialization have become common code.
- The node id is now passed to add_memory().
- build_zonelists() is called after the pages are onlined.
- Updating NODE_DATA() for ia64 has become simpler, and the code is now
common between boot and hot-add.
(Other considerations will still be necessary for hot-remove.)
- kswapd is started by kthread_run().
- The node id is found via ACPI using the handle of its memory device.
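How a memory device's ACPI handle can be turned into a node id for add_memory() is sketched below. This models the common approach of evaluating _PXM on the device or its nearest ancestor and then mapping the proximity domain to a Linux node id; that the patch does exactly this is an assumption. `struct acpi_object`, `pxm_to_node_map[]`, `lookup_pxm()` and `memory_device_to_nid()` are hypothetical userspace stand-ins, not ACPICA or kernel APIs.

```c
#include <stddef.h>

#define NO_PXM (-1)
#define MAX_PXM_DOMAINS 8 /* hypothetical table size */

/* Hypothetical model of an ACPI namespace object that may carry _PXM. */
struct acpi_object {
	const struct acpi_object *parent;
	int pxm; /* NO_PXM if this object has no _PXM */
};

/* Hypothetical proximity-domain -> node id table (normally built from SRAT). */
static int pxm_to_node_map[MAX_PXM_DOMAINS] = { 0, 1, 2, 3, -1, -1, -1, -1 };

/* Walk from the memory device up through its parents until a _PXM is found. */
static int lookup_pxm(const struct acpi_object *obj)
{
	for (; obj != NULL; obj = obj->parent)
		if (obj->pxm != NO_PXM)
			return obj->pxm;
	return NO_PXM;
}

/* Translate the memory device into the node id that add_memory() receives. */
static int memory_device_to_nid(const struct acpi_object *mem_dev)
{
	int pxm = lookup_pxm(mem_dev);

	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS)
		return -1;
	return pxm_to_node_map[pxm];
}
```

Walking up to the parent matters because _PXM is often defined on the container device rather than on each memory device inside it.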

Updates from V1 to V2.
- update for 2.6.16-rc5-mm2.
- Tested not only on ia64 but also on x86_64 with NUMA emulation. :-)
- The wait_table_size() allocation is changed:
  - Take the maximum size wherever possible.
  - Use GFP_ATOMIC, because it runs inside zone_init_lock.
    (The might_sleep() warning message caught this nicely.)
- stop_machine_run(build_zonelists) is moved outside of the lock.
- pgdat_insert() is moved to generic code so that x86_64 can use it.
- Add the choice between ZONE_DMA32 and ZONE_NORMAL to x86_64's add_memory().
- Make a separate patch to change __init to __meminit.
- Fix some typos.
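The wait_table_size() change can be illustrated as follows. The boot-time function mirrors, as far as we recall, the sizing in mm/page_alloc.c of this era (about one wait queue per 256 pages, rounded up to a power of two and clamped to 4..4096 entries); `PAGES_PER_WAITQUEUE = 256` is assumed. The hot-add variant is a hypothetical reading of "take the max size": a newly added zone may start nearly empty and grow later, so it gets the maximum table up front.

```c
#define PAGES_PER_WAITQUEUE 256 /* assumed value from mm/page_alloc.c */

/* Boot-time sizing: round pages/256 up to a power of two, clamp to [4, 4096]. */
static unsigned long wait_table_size_boot(unsigned long pages)
{
	unsigned long size = 1;

	pages /= PAGES_PER_WAITQUEUE;
	while (size < pages)
		size <<= 1;
	if (size > 4096UL)
		size = 4096UL;
	if (size < 4UL)
		size = 4UL;
	return size;
}

/* Hypothetical hot-add sizing: the zone's final size is unknown when it is
 * created, so always take the maximum instead of sizing from current pages. */
static unsigned long wait_table_size_hotadd(unsigned long pages)
{
	(void)pages; /* deliberately ignored */
	return 4096UL;
}
```

The trade-off is a few extra kilobytes per hot-added zone in exchange for never having to resize the hash table while waiters may be sleeping on it.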


--
Yasunori Goto



2006-03-09 12:02:30

by Andrew Morton

Subject: Re: [PATCH: 000/017] (RFC)Memory hotplug for new nodes v.3.

Yasunori Goto wrote:
>
> I'll post newest patches for memory hotadd with pgdat allocation as V3.
> There are many changes to make more common code.

General comments:

- Thanks for working against -mm. It can be a bit of a pain, but it
eases staging and integration later on.

- Please review all the code to check that all those functions which can
be made static are indeed made static. I see quite a few global
functions there.

- Make sure that all functions which can be tagged __meminit are so tagged.

- It would be useful to build a CONFIG_MEMORY_HOTPLUG=n kernel both with
and without the patchsets and to publish and maintain the increase in
code size. Ideally that increase will be zero. Probably it won't be,
and it'd be nice to understand why, and to minimise it.

- Arch issues:

- Which architectures is this patchset aimed at and tested on?

- Which other architectures might be able to use this code in the
future? Because we should ask the maintainers of those other
architectures to take a look at the changes.

- What locking does node hot-add use? There are quite a few places in
the kernel which cheerfully iterate across node lists while assuming that
they won't change. The usage of stop_machine_run() is supposed to cover
all that, I assume?
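The __meminit tagging and the CONFIG_MEMORY_HOTPLUG=n code-size goal are related: with hotplug disabled, __meminit should fall back to __init so the code is discarded after boot and adds nothing to the resident kernel. The sketch below is roughly how include/linux/init.h handles this (details such as module handling are omitted). `__init` is stubbed out for userspace, and `MEMINIT_IS_DISCARDED` and `init_one_zone()` are hypothetical names for illustration.

```c
/* Userspace stub: in the kernel, __init places code in .init.text,
 * which is freed after boot. Here it expands to nothing. */
#define __init

#ifdef CONFIG_MEMORY_HOTPLUG
/* Hot-add can run long after boot, so this code must stay resident. */
#define __meminit
#define MEMINIT_IS_DISCARDED 0
#else
/* Without hotplug, memory init code is only needed during boot. */
#define __meminit __init
#define MEMINIT_IS_DISCARDED 1
#endif

/* Hypothetical example of a helper shared by the boot and hot-add paths. */
static int __meminit init_one_zone(int zone_id)
{
	return zone_id >= 0 ? 0 : -1;
}
```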

2006-03-10 07:13:16

by Yasunori Goto

Subject: Re: [PATCH: 000/017] (RFC)Memory hotplug for new nodes v.3.

Thank you for your comment.
I'm very glad. :-)

> Yasunori Goto wrote:
> >
> > I'll post newest patches for memory hotadd with pgdat allocation as V3.
> > There are many changes to make more common code.
>
> General comments:
>
> - Thanks for working against -mm. It can be a bit of a pain, but it
> eases staging and integration later on.
>
> - Please review all the code to check that all those functions which can
> be made static are indeed made static. I see quite a few global
> functions there.
> - Make sure that all functions which can be tagged __meminit are so tagged.
>
> - It would be useful to build a CONFIG_MEMORY_HOTPLUG=n kernel both with
> and without the patchsets and to publish and maintain the increase in
> code size. Ideally that increase will be zero. Probably it won't be,
> and it'd be nice to understand why, and to minimise it.

OK. I'll check and fix these.

>
> - Arch issues:
>
> - Which architectures is this patchset aimed at and tested on?

IA64.
At least, Fujitsu is building this style of hot-add feature
into an ia64 box named PrimeQuest.
(SGI or HP might be waiting for it too.)

> - Which other architectures might be able to use this code in the
> future? Because we should ask the maintainers of those other
> architectures to take a look at the changes.

I heard from Andi-san that x86-64 will need this.
And ppc64 might use some of my patches.

It depends on whether:
- the architecture has NUMA boxes, and
- a whole NUMA node can be hot-added.

> - What locking does node hot-add use? There are quite a few places in
> the kernel which cheerfully iterate across node lists while assuming that
> they won't change. The usage of stop_machine_run() is supposed to cover
> all that, I assume?

If my understanding is correct, there are two critical points.
- One is the zonelist update, indeed. stop_machine_run() can
cover it.
- The other is node_online_map and NODE_DATA().
If a node is marked in node_online_map before NODE_DATA() is
updated, or before its pgdat is initialized, the kernel might
touch an uninitialized pgdat.
So node_set_online() is called at the very end.
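The ordering rule above is a publish/subscribe pattern. Here is a minimal userspace sketch using C11 atomics as a stand-in for the kernel's own primitives; whether the patches add explicit barriers is not stated in this thread, and `publish_node()` and `total_online_pages()` are hypothetical names. The writer fills in the pgdat and NODE_DATA() first and sets the online bit with a release store; readers only dereference NODE_DATA() for nodes they see online via an acquire load.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_NUMNODES 4 /* hypothetical */

struct pgdat {
	int node_id;
	unsigned long node_present_pages;
};

static struct pgdat *node_data[MAX_NUMNODES];     /* models NODE_DATA() */
static atomic_bool node_online_map[MAX_NUMNODES]; /* models node_online_map */

/* Writer: initialize everything, then set the online bit last. The release
 * store keeps the earlier writes from becoming visible after the bit. */
static void publish_node(int nid, struct pgdat *pgdat, unsigned long pages)
{
	pgdat->node_id = nid;
	pgdat->node_present_pages = pages;
	node_data[nid] = pgdat;
	atomic_store_explicit(&node_online_map[nid], true, memory_order_release);
}

/* Reader: skip offline nodes. The acquire load pairs with the release store,
 * so an online node's pgdat is guaranteed to be fully initialized here. */
static unsigned long total_online_pages(void)
{
	unsigned long total = 0;

	for (int nid = 0; nid < MAX_NUMNODES; nid++) {
		if (!atomic_load_explicit(&node_online_map[nid], memory_order_acquire))
			continue;
		total += node_data[nid]->node_present_pages;
	}
	return total;
}
```

Setting the bit first and NODE_DATA() afterwards would let a concurrent reader see an online node whose `node_data[nid]` is still NULL or half-initialized, which is exactly the hazard described above.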

The old kernel had the pgdat->next linked list, which was also a
critical point for hot-add. But current -mm has removed it, so it is
not an issue now. :-)
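Without a pgdat->next list, code that walks all nodes can derive the set from node_online_map plus NODE_DATA(), so hot-add has no list to splice. The sketch below assumes an iterator resembling the online-map-based pgdat iteration in -mm; the macro `for_each_online_pgdat_sketch()` and the helpers are hypothetical userspace names, not the kernel's definitions.

```c
#include <stdbool.h>

#define MAX_NUMNODES 4 /* hypothetical */

struct pgdat {
	int node_id;
};

static struct pgdat *node_data[MAX_NUMNODES]; /* models NODE_DATA() */
static bool node_online_map[MAX_NUMNODES];    /* models node_online_map */

/* Return the first online node id >= nid, or MAX_NUMNODES if none. */
static int next_online_node_from(int nid)
{
	while (nid < MAX_NUMNODES && !node_online_map[nid])
		nid++;
	return nid;
}

/* Visit each online node's pgdat by scanning the online map,
 * instead of following a pgdat->next linked list. */
#define for_each_online_pgdat_sketch(pgdat, nid)                      \
	for ((nid) = next_online_node_from(0);                        \
	     (nid) < MAX_NUMNODES && ((pgdat) = node_data[(nid)], 1); \
	     (nid) = next_online_node_from((nid) + 1))

/* Example user: add up the ids of all online nodes. */
static int sum_online_node_ids(void)
{
	struct pgdat *pgdat;
	int nid, sum = 0;

	for_each_online_pgdat_sketch(pgdat, nid)
		sum += pgdat->node_id;
	return sum;
}
```

With this shape, adding a node reduces to filling NODE_DATA() and setting one bit, which is why the ordering of those two steps is the remaining concern.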

Thanks.

--
Yasunori Goto