The attached patch, largely written by Andy Whitcroft, implements a
feature which is similar to DISCONTIGMEM, but has some added features.
Instead of splitting up the mem_map for each NUMA node, this splits it
up into areas that represent fixed blocks of memory. This allows
individual pieces of that memory to be easily added and removed.
Because it is so similar to DISCONTIGMEM, it can actually be used in
place of it on NUMA systems such as the NUMAQ or Summit architectures.
This patch includes an i386 and ppc64 implementation, but there are
x86_64 and ia64 implementations as well.
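To illustrate the idea of a mem_map split into fixed-size blocks, here is a minimal userspace sketch. It is not taken from the patch: the `toy_` helpers, the table size, and the 256MB section size are assumptions chosen for illustration, and `struct page` is only a stand-in for the kernel structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for illustration; real values are per-arch. */
#define PAGE_SHIFT         12                      /* 4KB pages */
#define SECTION_SIZE_BITS  28                      /* 256MB per section (assumed) */
#define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION  (1UL << PFN_SECTION_SHIFT)
#define NR_SECTIONS        16                      /* covers 4GB in this toy model */

struct page { unsigned long flags; };              /* stand-in for the kernel's struct page */

/* One entry per fixed-size block; a NULL mem_map means "not present". */
struct mem_section { struct page *mem_map; };

static struct mem_section sections[NR_SECTIONS];

static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

/* Look up the struct page for a pfn; NULL if its section is absent. */
static struct page *toy_pfn_to_page(unsigned long pfn)
{
	unsigned long nr = pfn_to_section_nr(pfn);

	if (nr >= NR_SECTIONS || sections[nr].mem_map == NULL)
		return NULL;
	return &sections[nr].mem_map[pfn & (PAGES_PER_SECTION - 1)];
}

/* "Hot-add": attach a mem_map to one section. Returns 0 on success. */
static int toy_add_section(unsigned long nr, struct page *map)
{
	assert(map != NULL);
	if (nr >= NR_SECTIONS || sections[nr].mem_map != NULL)
		return -1;
	sections[nr].mem_map = map;
	return 0;
}

/* "Hot-remove": detach the mem_map of one section. Returns 0 on success. */
static int toy_remove_section(unsigned long nr)
{
	if (nr >= NR_SECTIONS || sections[nr].mem_map == NULL)
		return -1;
	sections[nr].mem_map = NULL;
	return 0;
}
```

Because each section carries its own mem_map pointer, adding or removing one block only touches one table entry; the other sections are unaffected.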
There are a number of individual patches (with descriptions) which are
rolled up in the attached patch: all of the files up to and including
"G2-no-memory-at-high_memory-ppc64.patch" from this directory:
http://www.sr71.net/patches/2.6.11/2.6.11-rc3-mhp1/broken-out/
I can post individual patches if anyone would like to comment on them.
-- Dave
The attached patch is a prototype implementation of memory hot-add. It
allows you to boot your system, and add memory to it later. Why would
you want to do this? Well, it's a step toward memory removal, which can
help cope with things like bad RAM. Hot-add itself is primarily useful
for a machine that you don't want to reboot during an upgrade.
For instance, on my 1GB laptop, I booted with mem=512M on the kernel
command-line. Once I had booted, I did the following:
	cd /sys/devices/system/memory
	echo 0x20000000 > probe
	echo 0x30000000 > probe
	echo online > memory2/state
	echo online > memory3/state
and the last 512MB of my laptop's memory was onlined. The onlining
operations can occur from an /etc/hotplug script if desired.
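The block numbers in the commands above are consistent with a simple mapping: probing 0x20000000 produced memory2 and 0x30000000 produced memory3, which suggests 256MB blocks numbered by physical address. The sketch below works through that arithmetic. The 256MB block size is inferred from this one example rather than stated in the patch description, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 256MB blocks, inferred from 0x20000000 -> memory2. */
#define ASSUMED_SECTION_SIZE_BITS 28
#define ASSUMED_SECTION_SIZE (UINT64_C(1) << ASSUMED_SECTION_SIZE_BITS)

/* A probe address should sit on a block boundary. */
static int is_block_aligned(uint64_t phys)
{
	return (phys & (ASSUMED_SECTION_SIZE - 1)) == 0;
}

/* Physical address -> index used in the memoryN directory name. */
static unsigned long phys_to_block_nr(uint64_t phys)
{
	return (unsigned long)(phys >> ASSUMED_SECTION_SIZE_BITS);
}

/* Build the sysfs path of the state file for the block at phys. */
static int block_state_path(char *buf, size_t len, uint64_t phys)
{
	assert(is_block_aligned(phys));
	return snprintf(buf, len,
			"/sys/devices/system/memory/memory%lu/state",
			phys_to_block_nr(phys));
}
```

Under this assumption, the two probed blocks cover 0x20000000 through 0x3fffffff, which is the upper 512MB of a 1GB machine booted with mem=512M.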
Here's the config file that I used:
http://www.sr71.net/patches/2.6.11/2.6.11-rc3-mhp1/configs/config-i386-T41-laptop
The important config options are:
	CONFIG_MEMORY_HOTPLUG=y
	CONFIG_SPARSEMEM=y
	CONFIG_SIMULATED_MEM_HOTPLUG=y
This patch depends on the previously posted "Sparse Memory Handling
(hot-add foundation)" patch.
There are a number of individual patches (with descriptions) which are
rolled up in the attached patch: all of the files listed after
"G2-no-memory-at-high_memory-ppc64.patch" from this directory:
http://www.sr71.net/patches/2.6.11/2.6.11-rc3-mhp1/broken-out/
I can post individual patches if anyone would like to comment on them.
-- Dave
On Thu, Feb 17, 2005 at 04:03:53PM -0800, Dave Hansen wrote:
> The attached patch
Just tried to compile this and noticed that there is no definition
of valid_section_nr(), referenced in sparse_init.
--
Mike
Dave Hansen writes:
> The attached patch, largely written by Andy Whitcroft, implements a
> feature which is similar to DISCONTIGMEM, but has some added features.
> Instead of splitting up the mem_map for each NUMA node, this splits it
> up into areas that represent fixed blocks of memory. This allows
> individual pieces of that memory to be easily added and removed.
[...]
I'm curious - how does this affect .text size for an i386 or x86-64 NUMA
kernel? One area I wanted to improve on x86-64 for a long time was
to shrink the big virt_to_page() etc. inline macros. Your new code
actually looks a bit smaller.
-Andi
On Thu, 2005-02-17 at 21:16 -0800, Mike Kravetz wrote:
> On Thu, Feb 17, 2005 at 04:03:53PM -0800, Dave Hansen wrote:
> > The attached patch
>
> Just tried to compile this and noticed that there is no definition
> of valid_section_nr(), referenced in sparse_init.
What's your .config? I didn't actually try it on ppc64, and I may have
missed one of the necessary patches. I trimmed it down to very near the
minimum set on x86.
-- Dave
On Fri, 2005-02-18 at 11:04 +0100, Andi Kleen wrote:
> Dave Hansen writes:
>
> > The attached patch, largely written by Andy Whitcroft, implements a
> > feature which is similar to DISCONTIGMEM, but has some added features.
> > Instead of splitting up the mem_map for each NUMA node, this splits it
> > up into areas that represent fixed blocks of memory. This allows
> > individual pieces of that memory to be easily added and removed.
>
> I'm curious - how does this affect .text size for an i386 or x86-64 NUMA
> kernel? One area I wanted to improve on x86-64 for a long time was
> to shrink the big virt_to_page() etc. inline macros. Your new code
> actually looks a bit smaller.
On x86, it looks like about a 3k increase in text size (2965 bytes in
the comparison below). I know Matt Tolentino has been testing it on
x86_64; he might have a comparison there for you.
$ size i386-T41-laptop*/vmlinux
   text    data     bss     dec     hex filename
2897131  580592  204252 3681975  382eb7 i386-T41-laptop.sparse/vmlinux
2894166  581832  203228 3679226  3823fa i386-T41-laptop/vmlinux
BTW, this has PAE on and uses 36 bits of physaddr space.
-- Dave
On Thu, 17 Feb 2005, Dave Hansen wrote:
> The attached patch is a prototype implementation of memory hot-add. It
> allows you to boot your system, and add memory to it later. Why would
> you want to do this?
I want it so I can grow Xen guests after they have been booted
up. Being able to hot-add memory is essential for dynamically
resizing the memory of various guest OSes, to readjust them for
the workload.
Memory hot-remove isn't really needed with Xen, the balloon
driver takes care of that.
> I can post individual patches if anyone would like to comment on them.
I'm interested. I want to get this stuff working with Xen ;)
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
On Fri, 2005-02-18 at 16:52 -0500, Rik van Riel wrote:
> On Thu, 17 Feb 2005, Dave Hansen wrote:
> > The attached patch is a prototype implementation of memory hot-add. It
> > allows you to boot your system, and add memory to it later. Why would
> > you want to do this?
>
> I want it so I can grow Xen guests after they have been booted
> up. Being able to hot-add memory is essential for dynamically
> resizing the memory of various guest OSes, to readjust them for
> the workload.
That's the same thing we like about it on ppc64 partitions.
> Memory hot-remove isn't really needed with Xen, the balloon
> driver takes care of that.
You can free up individual pages back to the hypervisor, but you might
also want the opportunity to free up some unused mem_map if you shrink
the partition by a large amount.
> > I can post individual patches if anyone would like to comment on them.
>
> I'm interested. I want to get this stuff working with Xen ;)
You can either pull them from here:
http://www.sr71.net/patches/2.6.11/2.6.11-rc3-mhp1/broken-out/
or grab the whole tarball:
http://www.sr71.net/patches/2.6.11/2.6.11-rc3-mhp1/broken-out-2.6.11-rc3-mhp1.tar.gz
Or, I could always post the whole bunch to lhms. Nobody there should
mind too much. :)
The largest part of porting hot-add to a new architecture is usually the
sparsemem portion. You'll pretty much have to #ifdef pfn_to_page() and
friends, declare a few macros, and then do a bit of debugging. Here's
ppc64 as an example:
http://www.sr71.net/patches/2.6.11/2.6.11-rc3-mhp1/broken-out/B-sparse-170-sparsemem-ppc64.patch
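As a rough picture of what "#ifdef pfn_to_page() and friends, declare a few macros" can look like, here is a self-contained userspace sketch. It is not the ppc64 patch: every `TOY_`/`toy_` name is hypothetical, `TOY_SPARSEMEM` merely stands in for a config option, and the 36-bit and 28-bit values are illustrative choices.

```c
#include <stddef.h>

#define TOY_SPARSEMEM 1		/* hypothetical stand-in for a config option */

struct page { unsigned long flags; };	/* stand-in for the kernel's struct page */

#ifdef TOY_SPARSEMEM
/* The "few macros" an architecture would declare (values assumed). */
#define TOY_PAGE_SHIFT		12	/* 4KB pages */
#define TOY_SECTION_SIZE_BITS	28	/* 256MB sections */
#define TOY_MAX_PHYS_BITS	36	/* e.g. a 36-bit physaddr space */

/* Everything below is derived from the three values above. */
#define TOY_PFN_SECTION_SHIFT	(TOY_SECTION_SIZE_BITS - TOY_PAGE_SHIFT)
#define TOY_PAGES_PER_SECTION	(1UL << TOY_PFN_SECTION_SHIFT)
#define TOY_NR_SECTIONS \
	(1UL << (TOY_MAX_PHYS_BITS - TOY_SECTION_SIZE_BITS))

/* One mem_map pointer per section; NULL means the section is absent. */
static struct page *toy_section_map[TOY_NR_SECTIONS];

#define toy_pfn_to_page(pfn) \
	(toy_section_map[(pfn) >> TOY_PFN_SECTION_SHIFT] + \
	 ((pfn) & (TOY_PAGES_PER_SECTION - 1)))

static inline int toy_pfn_valid(unsigned long pfn)
{
	unsigned long nr = pfn >> TOY_PFN_SECTION_SHIFT;

	return nr < TOY_NR_SECTIONS && toy_section_map[nr] != NULL;
}
#else
/* Flat model: a single contiguous mem_map indexed directly by pfn. */
static struct page *toy_mem_map;
static unsigned long toy_max_pfn;

#define toy_pfn_to_page(pfn)	(toy_mem_map + (pfn))

static inline int toy_pfn_valid(unsigned long pfn)
{
	return pfn < toy_max_pfn;
}
#endif
```

Callers use `toy_pfn_to_page()` and `toy_pfn_valid()` the same way in both configurations, so only the header needs the conditional.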
-- Dave
On Fri, 18 Feb 2005, Dave Hansen wrote:
>> Memory hot-remove isn't really needed with Xen, the balloon
>> driver takes care of that.
>
> You can free up individual pages back to the hypervisor, but you might
> also want the opportunity to free up some unused mem_map if you shrink
> the partition by a large amount.
Agreed, though I rather like the fact that the code can
be introduced bit by bit, so the memory hot-remove code
(probably the most complex part) doesn't need to be
maintained out-of-tree for Xen, but can wait until it
is upstream.
>>> I can post individual patches if anyone would like to comment on them.
>>
>> I'm interested. I want to get this stuff working with Xen ;)
>
> You can either pull them from here:
>
> http://www.sr71.net/patches/2.6.11/2.6.11-rc3-mhp1/broken-out/
Thanks, I'll take a stab at porting this functionality to Xen.