2008-03-26 23:12:24

by Jeremy Fitzhardinge

Subject: Trying to make use of hotplug memory for xen balloon driver

Hi,

I'm trying to make use of hotplug memory in the Xen balloon driver. If
you want to expand a domain to be larger than its initial size, it must
add new page structures to describe the new memory.

The platform is x86-32, with CONFIG_SPARSEMEM and
CONFIG_HOTPLUG_MEMORY. Because the new memory is only pseudo-physical,
the physical address within the domain is arbitrary, and I added an
add_memory_resource() function so I could use allocate_resource() to
find an appropriate address to put the new memory at.

When I want to expand the domain's memory, I do (error checking edited
out for brevity):

	res = kzalloc(sizeof(*res), GFP_KERNEL);

	res->name = "Xen Balloon";
	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

	ret = allocate_resource(&iomem_resource, res, size, 0, -1,
				PAGE_SIZE, NULL, NULL);

	ret = add_memory_resource(0, res);

	start_pfn = res->start >> PAGE_SHIFT;
	end_pfn = (res->end + 1) >> PAGE_SHIFT;

	ret = xen_resize_phys_to_mach(end_pfn);

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_page(pfn);

		if (PageReserved(page))
			continue;

		set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
		balloon_append(page);
	}

At this point the pages have no underlying machine (physical) memory,
but they are on the list of potentially usable pages. This all works fine.
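
For reference, the PFN arithmetic in the first snippet relies on resource ranges having an inclusive end address. A minimal userspace sketch of that arithmetic, assuming 4 KiB pages; struct range and the helper names are made up for illustration and are not kernel APIs:

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on x86 */

/* Illustrative stand-in for the start/end fields of struct resource. */
struct range {
	uint64_t start;	/* first byte of the range */
	uint64_t end;	/* last byte of the range (inclusive) */
};

/* First page frame number covered by the range. */
static uint64_t range_start_pfn(const struct range *r)
{
	return r->start >> PAGE_SHIFT;
}

/* One past the last page frame number covered by the range. */
static uint64_t range_end_pfn(const struct range *r)
{
	return (r->end + 1) >> PAGE_SHIFT;
}

/* Size in bytes; the "+ 1" accounts for the inclusive end. */
static uint64_t range_size(const struct range *r)
{
	return r->end - r->start + 1;
}
```

With a 1 GiB range starting at 0x10000000, this gives a start PFN of 0x10000 and an end PFN of 0x50000, i.e. 262144 pages.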

However, when I actually want to use one of these pages, I do:

	page = balloon_retrieve();

	pfn = page_to_pfn(page);

	set_phys_to_machine(pfn, frame_list[i]);

	/* Relinquish the page back to the allocator. */
	online_page(page);

	/* Link back into the page tables if not highmem. */
	if (pfn < max_low_pfn) { /* !PageHighMem(page) ? */
		int ret;
		ret = HYPERVISOR_update_va_mapping(
			(unsigned long)__va(pfn << PAGE_SHIFT),
			mfn_pte(frame_list[i], PAGE_KERNEL),
			0);
		BUG_ON(ret);
	}

This has two problems:

1. the online_page() raises an error:

Bad page state in process 'events/0'
page:c16fa0cc flags:0x00000000 mapping:00000000 mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 9, comm: events/0 Not tainted 2.6.25-rc7-x86-latest.git-dirty #353
[<c015643a>] bad_page+0x55/0x82
[<c0156be6>] free_hot_cold_page+0x60/0x1f1
[<c0103069>] ? xen_restore_fl+0x2e/0x52
[<c0156dae>] free_hot_page+0xa/0xc
[<c0156dcb>] __free_pages+0x1b/0x26
[<c0466e8c>] free_new_highpage+0x11/0x19
[<c0466ea1>] online_page+0xd/0x1b
[<c02809ac>] balloon_process+0x1e6/0x4d3
[<c014671a>] ? lock_acquire+0x90/0x9d
[<c0137720>] run_workqueue+0xbb/0x186
[<c01376e5>] ? run_workqueue+0x80/0x186
[<c02807c6>] ? balloon_process+0x0/0x4d3
[<c0137fe6>] ? worker_thread+0x0/0xbe
[<c0138099>] worker_thread+0xb3/0xbe
[<c013a635>] ? autoremove_wake_function+0x0/0x33
[<c013a56a>] kthread+0x3b/0x61
[<c013a52f>] ? kthread+0x0/0x61
[<c0108b67>] kernel_thread_helper+0x7/0x10
=======================


I can solve this by putting an explicit reset_page_mapcount(page)
before online_page(), but I can't see any other hotplug memory
code which does this.

2. The new pages don't appear to be in the right zone. When I boot a
256M domain I get an initial setup of:

Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 65536
HighMem 65536 -> 65536
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0 -> 65536
On node 0 totalpages: 65536
DMA zone: 52 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 4044 pages, LIFO batch:0
Normal zone: 780 pages used for memmap
Normal zone: 60660 pages, LIFO batch:15
HighMem zone: 0 pages used for memmap
Movable zone: 0 pages used for memmap


which presumably means that new pages above pfn 65536 should be in
the highmem zone? But PageHighMem() returns false for those pages.

What am I missing here?

Thanks,
J


2008-03-27 00:09:33

by Dave Hansen

Subject: Re: Trying to make use of hotplug memory for xen balloon driver


On Wed, 2008-03-26 at 16:11 -0700, Jeremy Fitzhardinge wrote:
>
>
> I'm trying to make use of hotplug memory in the Xen balloon driver.
> If
> you want to expand a domain to be larger than its initial size, it
> must
> add new page structures to describe the new memory.
>
> The platform is x86-32, with CONFIG_SPARSEMEM and
> CONFIG_HOTPLUG_MEMORY. Because the new memory is only
> pseudo-physical,
> the physical address within the domain is arbitrary, and I added an
> add_memory_resource() function so I could use allocate_resource() to
> find an appropriate address to put the new memory at.
>
> When I want to expand the domain's memory, I do (error checking
> edited
> out for brevity):
>
> res = kzalloc(sizeof(*res), GFP_KERNEL);
>
> res->name = "Xen Balloon";
> res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>
> ret = allocate_resource(&iomem_resource, res, size, 0, -1,
> PAGE_SIZE, NULL, NULL);
>
> ret = add_memory_resource(0, res);

Yeah, this is your problem. You've only allocated the iomem *resource*
for the memory area, which means that you've basically claimed the
physical addresses.

But, you don't have any 'struct page's there.

We really screwed up the memory hotplug code and ended up with some
incredibly arcane function names. You might want to look at
add_memory(). It is hidden away in mm/memory_hotplug.c :)

You might also note that most of the ppc64 memory hotplug is driven by
userspace. The hypervisor actually contacts a daemon on the guest to
tell it where its new memory is. That daemon does the addition
through /sys/devices/system/memory/probe.
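
For illustration, driving the probe file from userspace amounts to writing a physical address into it. A minimal sketch, assuming the kernel exposes the probe file and accepts a hexadecimal, section-aligned start address; probe_memory() is a hypothetical helper, and the path is a parameter so it can be tried against a scratch file first:

```c
#include <stdio.h>

/*
 * Ask the kernel to probe (add) the memory section starting at phys_addr
 * by writing the address to a sysfs-style probe file.  On a real system
 * probe_path would be /sys/devices/system/memory/probe, which only exists
 * on kernels built with the probe interface.
 * Returns 0 on success, -1 on failure.
 */
static int probe_memory(const char *probe_path, unsigned long long phys_addr)
{
	FILE *f = fopen(probe_path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "0x%llx\n", phys_addr) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```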

-- Dave

2008-03-27 00:16:11

by Jeremy Fitzhardinge

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

Dave Hansen wrote:
> Yeah, this is your problem. You've only allocated the iomem *resource*
> for the memory area, which means that you've basically claimed the
> physical addresses.
>
> But, you don't have any 'struct page's there.
>
> We really screwed up the memory hotplug code and ended up with some
> incredibly arcane function names. You might want to look at
> add_memory(). It is hidden away in mm/memory_hotplug.c :)
>

Sorry, I should have been clearer. add_memory_resource() is a function
I added; it's effectively add_memory() with the resource-allocating part
factored out:

--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -171,7 +171,10 @@

 #endif /* ! CONFIG_MEMORY_HOTPLUG */

+struct resource;
+
 extern int add_memory(int nid, u64 start, u64 size);
+extern int add_memory_resource(int nid, struct resource *res);
 extern int arch_add_memory(int nid, u64 start, u64 size);
 extern int remove_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
===================================================================
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,14 +278,28 @@

 int add_memory(int nid, u64 start, u64 size)
 {
-	pg_data_t *pgdat = NULL;
-	int new_pgdat = 0;
 	struct resource *res;
 	int ret;

 	res = register_memory_resource(start, size);
 	if (!res)
 		return -EEXIST;
+
+	ret = add_memory_resource(nid, res);
+
+	if (ret)
+		release_memory_resource(res);
+
+	return ret;
+}
+
+int add_memory_resource(int nid, struct resource *res)
+{
+	pg_data_t *pgdat = NULL;
+	int new_pgdat = 0;
+	int ret;
+	u64 start = res->start;
+	u64 size = res->end - res->start + 1;

 	if (!node_online(nid)) {
 		pgdat = hotadd_new_pgdat(nid, start);
@@ -320,8 +334,6 @@
 	/* rollback pgdat allocation and others */
 	if (new_pgdat)
 		rollback_node_hotadd(nid, pgdat);
-	if (res)
-		release_memory_resource(res);

 	return ret;
 }


> You might also note that most of the ppc64 memory hotplug is driven by
> userspace. The hypervisor actually contacts a daemon on the guest to
> tell it where its new memory is. That daemon does the addition
> through /sys/devices/system/memory/probe.
>

X86 Xen does it with a combination of hypervisor and userspace. Mostly
it comes down to asking the hypervisor to provide a machine page to put
under a guest pseudo-physical page.

J

2008-03-27 00:26:55

by Dave Hansen

Subject: Re: Trying to make use of hotplug memory for xen balloon driver


On Wed, 2008-03-26 at 16:11 -0700, Jeremy Fitzhardinge wrote:
> Bad page state in process 'events/0'
> page:c16fa0cc flags:0x00000000 mapping:00000000 mapcount:1 count:0
> Trying to fix it up, but a reboot is needed

The flags being all null looks highly suspicious to me.

Once you've done an add_memory(), the new sections should show up
in /sys. Do you see them in there?

Once they show up, you can online them with:

echo online > /sys/devices/system/memory/memoryXXX/state

That's what actually goes and mucks with the 'struct zone's and the
pgdats to expand them. It will also call online_page() on the whole
range. I think you're trying to do this manually, and missing part of
it.

There's some documentation here:

http://kerneltrap.org/node/14009

But, think of it this way: "add" is what the hardware does. "online" is
what Linux does after the memory has been added so that it can be used.
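
As a sketch of how memoryXXX might be derived: assuming each section appears as memory<N>, with N being the physical address shifted right by SECTION_SIZE_BITS, the state file for a given address can be located as follows. memory_state_path() is a hypothetical helper, not an existing interface:

```c
#include <stdio.h>

/*
 * Build the sysfs "state" path for the memory section containing phys_addr.
 * Assumption: sections are numbered phys_addr >> section_size_bits and show
 * up as memory<N> directories under root.
 * Returns the snprintf() result (the length that would have been written).
 */
static int memory_state_path(char *buf, size_t len, const char *root,
			     unsigned long long phys_addr,
			     unsigned int section_size_bits)
{
	unsigned long long section = phys_addr >> section_size_bits;

	return snprintf(buf, len, "%s/memory%llu/state", root, section);
}
```

Writing the string "online" to the resulting path is what the echo command above does.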

-- Dave

2008-03-27 00:47:10

by Kamezawa Hiroyuki

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

On Wed, 26 Mar 2008 16:11:54 -0700
Jeremy Fitzhardinge <[email protected]> wrote:

> Hi,
>
> I'm trying to make use of hotplug memory in the Xen balloon driver. If
> you want to expand a domain to be larger than its initial size, it must
> add new page structures to describe the new memory.
>
> The platform is x86-32, with CONFIG_SPARSEMEM and
> CONFIG_HOTPLUG_MEMORY. Because the new memory is only pseudo-physical,
> the physical address within the domain is arbitrary, and I added an
> add_memory_resource() function so I could use allocate_resource() to
> find an appropriate address to put the new memory at.
>
Welcome to the chaos of memory hotplug :)

> 1. the online_page() raises an error:
>
> Bad page state in process 'events/0'
> page:c16fa0cc flags:0x00000000 mapping:00000000 mapcount:1 count:0
> Trying to fix it up, but a reboot is needed

Hmm, it seems the memmap is not initialized correctly...
page->flags == 0 means the page is in ZONE_DMA (only a 16MB range on x86).
I think the memmap is not initialized.

The call path to memmap initialization is:
==
add_memory()
-> arch_add_memory()
-> __add_pages()
-> __add_section()
-> __add_zone()
-> memmap_init_zone()
==
Please check first whether arch_add_memory() is called.
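
One possible reading of the report, as a toy model rather than kernel code: the mapcount is stored biased by -1, so a struct page that is still all zeroes reports mapcount:1, which the free path rejects. That would also explain why reset_page_mapcount() hides the symptom. The struct and function names below are illustrative only:

```c
#include <string.h>

/* Toy model of the struct page fields printed in the report. */
struct toy_page {
	unsigned long flags;
	void *mapping;
	int _mapcount;	/* stored biased by -1: -1 means "not mapped" */
	int _count;
};

/* The reported mapcount is the stored value plus one. */
static int toy_page_mapcount(const struct toy_page *p)
{
	return p->_mapcount + 1;
}

/* What an initialised, free page is assumed to look like in this model. */
static void toy_init_page(struct toy_page *p)
{
	memset(p, 0, sizeof(*p));
	p->_mapcount = -1;
}

/* Free-path sanity check: a nonzero mapcount, mapping or count is "bad". */
static int toy_page_is_bad(const struct toy_page *p)
{
	return toy_page_mapcount(p) != 0 || p->mapping != NULL ||
	       p->_count != 0;
}
```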



> 2. The new pages don't appear to be in the right zone. When I boot a
> 256M domain I get an initial setup of:
>
> Zone PFN ranges:
> DMA 0 -> 4096
> Normal 4096 -> 65536
> HighMem 65536 -> 65536
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
> 0: 0 -> 65536
> On node 0 totalpages: 65536
> DMA zone: 52 pages used for memmap
> DMA zone: 0 pages reserved
> DMA zone: 4044 pages, LIFO batch:0
> Normal zone: 780 pages used for memmap
> Normal zone: 60660 pages, LIFO batch:15
> HighMem zone: 0 pages used for memmap
> Movable zone: 0 pages used for memmap
>
>
> which presumably means that new pages above pfn 65536 should be in
> the highmem zone? But PageHighMem() returns false for those pages.
>
See x86-32's arch_add_memory(). It is currently designed so that "all new
memory will go into ZONE_HIGHMEM" (because added memory tends to be removed
later).

Thanks,
-Kame

2008-03-27 01:25:48

by Christoph Lameter

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

On Wed, 26 Mar 2008, Dave Hansen wrote:

> You might also note that most of the ppc64 memory hotplug is driven by
> userspace. The hypervisor actually contacts a daemon on the guest to
> tell it where its new memory is. That daemon does the addition
> through /sys/devices/system/memory/probe.

Would it be possible to have the balloon driver use the memory hotplug
interface instead? That would generalize the memory hotplug logic and you
will likely find that lots of issues have already been addressed in that
code.


2008-03-27 05:58:18

by Jeremy Fitzhardinge

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

KAMEZAWA Hiroyuki wrote:
> On Wed, 26 Mar 2008 16:11:54 -0700
> Jeremy Fitzhardinge <[email protected]> wrote:
>
>
>> Hi,
>>
>> I'm trying to make use of hotplug memory in the Xen balloon driver. If
>> you want to expand a domain to be larger than its initial size, it must
>> add new page structures to describe the new memory.
>>
>> The platform is x86-32, with CONFIG_SPARSEMEM and
>> CONFIG_HOTPLUG_MEMORY. Because the new memory is only pseudo-physical,
>> the physical address within the domain is arbitrary, and I added an
>> add_memory_resource() function so I could use allocate_resource() to
>> find an appropriate address to put the new memory at.
>>
>>
> Welcome to the chaos of memory hotplug :)
>
>
>> 1. the online_page() raises an error:
>>
>> Bad page state in process 'events/0'
>> page:c16fa0cc flags:0x00000000 mapping:00000000 mapcount:1 count:0
>> Trying to fix it up, but a reboot is needed
>>
>
> Hmm, it seems the memmap is not initialized correctly...
> page->flags == 0 means the page is in ZONE_DMA (only a 16MB range on x86).
> I think the memmap is not initialized.
>
> The call path to memmap initialization is:
> ==
> add_memory()
> -> arch_add_memory()
> -> __add_pages()
> -> __add_section()
> -> __add_zone()
> -> memmap_init_zone()
> ==
> Please check first whether arch_add_memory() is called.
>

Ah, I see what it is. I wasn't trying to add enough memory. Memory is
added in units of whole sections (1 << SECTION_SIZE_BITS bytes), which is
2^30 bytes on 32-bit PAE. When I increase the initial balloon extension
to PAGES_PER_SECTION pages, I make some more progress:

xen_balloon: Initialising balloon driver.
trying to reserve 262144 pages (1073741824 bytes) for balloon
bootmem alloc of 147456 bytes failed!
Kernel panic - not syncing: Out of memory
Pid: 1, comm: swapper Not tainted 2.6.25-rc7-x86-latest.git-dirty #361
[<c01299dc>] panic+0x49/0x102
[<c0647c3c>] __alloc_bootmem+0x24/0x29
[<c0647c6d>] __alloc_bootmem_node+0x2c/0x34
[<c044bd97>] zone_wait_table_init+0x45/0x95
[<c0467258>] init_currently_empty_zone+0x1d/0xaa
[<c01738ea>] __add_pages+0x88/0xdb
[<c011c1a5>] arch_add_memory+0x25/0x2b
[<c01737a9>] add_memory_resource+0x2f/0x36
[<c064e487>] balloon_init+0x1b8/0x2b9
[<c0635495>] kernel_init+0x137/0x292
[<c063535e>] ? kernel_init+0x0/0x292
[<c063535e>] ? kernel_init+0x0/0x292
[<c0108b67>] kernel_thread_helper+0x7/0x10
=======================


What's the rationale for setting SECTION_SIZE_BITS to 30? Seems like a
fairly large chunk.
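
The numbers in the log follow from the section size. A small sketch, assuming 4 KiB pages; the helpers are illustrative and are not the kernel's macros:

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Pages in one sparsemem section of 2^section_size_bits bytes. */
static uint64_t pages_per_section(unsigned int section_size_bits)
{
	return 1ULL << (section_size_bits - PAGE_SHIFT);
}

/* Round a page count up to a whole number of sections. */
static uint64_t round_up_to_section(uint64_t nr_pages,
				    unsigned int section_size_bits)
{
	uint64_t per = pages_per_section(section_size_bits);

	return (nr_pages + per - 1) / per * per;
}
```

With SECTION_SIZE_BITS at 30 this yields 262144 pages (1073741824 bytes) per section, matching the log.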

J

2008-03-27 06:06:54

by Kamezawa Hiroyuki

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

On Wed, 26 Mar 2008 22:57:57 -0700
Jeremy Fitzhardinge <[email protected]> wrote:

> Ah, I see what it is. I wasn't trying to add enough memory. It adds in
> units of SECTION_SIZE_BITS, which is 2^30 on 32-bit PAE. When I
> increase the initial balloon extension to PAGES_PER_SECTION pages, I
> make some more progress:
>
> xen_balloon: Initialising balloon driver.
> trying to reserve 262144 pages (1073741824 bytes) for balloon
> bootmem alloc of 147456 bytes failed!
> Kernel panic - not syncing: Out of memory
> Pid: 1, comm: swapper Not tainted 2.6.25-rc7-x86-latest.git-dirty #361
> [<c01299dc>] panic+0x49/0x102
> [<c0647c3c>] __alloc_bootmem+0x24/0x29
> [<c0647c6d>] __alloc_bootmem_node+0x2c/0x34
> [<c044bd97>] zone_wait_table_init+0x45/0x95
> [<c0467258>] init_currently_empty_zone+0x1d/0xaa
> [<c01738ea>] __add_pages+0x88/0xdb
> [<c011c1a5>] arch_add_memory+0x25/0x2b
> [<c01737a9>] add_memory_resource+0x2f/0x36
> [<c064e487>] balloon_init+0x1b8/0x2b9
> [<c0635495>] kernel_init+0x137/0x292
> [<c063535e>] ? kernel_init+0x0/0x292
> [<c063535e>] ? kernel_init+0x0/0x292
> [<c0108b67>] kernel_thread_helper+0x7/0x10
> =======================
>
>
> What's the rationale for setting SECTION_SIZE_BITS to 30? Seems like a
> fairly large chunk.
>
First, I believe the usual DIMM size is bigger than the section size
(1 << SECTION_SIZE_BITS bytes). This was designed for hardware-based hotplug.

If you want to use memory hotplug in a virtualized environment, it's good to
make this a smaller chunk. PowerPC/IBM LPAR uses 16MB chunks.

It's a trade-off between section maintenance cost and the size of plugged
memory. Please find the best value.
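
To put rough numbers on the trade-off: the count of sections needed to cover the physical address space doubles with every bit removed from the section size. A sketch, assuming a 36-bit physical address space (as with PAE); nr_sections() is an illustrative helper:

```c
/*
 * Number of sections needed to cover 2^max_physmem_bits bytes when each
 * section spans 2^section_size_bits bytes.  The caller supplies both
 * values; 36 physical address bits is an assumption for 32-bit PAE.
 */
static unsigned long long nr_sections(unsigned int max_physmem_bits,
				      unsigned int section_size_bits)
{
	return 1ULL << (max_physmem_bits - section_size_bits);
}
```

Under that assumption, 1GB sections give 64 sections, while 16MB sections give 4096.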

Thanks,
-Kame





2008-03-27 06:09:38

by Jeremy Fitzhardinge

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

KAMEZAWA Hiroyuki wrote:
> First, I believe the usual DIMM size is bigger than the section size
> (1 << SECTION_SIZE_BITS bytes). This was designed for hardware-based hotplug.
>
> If you want to use memory hotplug in a virtualized environment, it's good to
> make this a smaller chunk. PowerPC/IBM LPAR uses 16MB chunks.
>
> It's a trade-off between section maintenance cost and the size of plugged
> memory. Please find the best value.
>

Yes, that's what I thought. I'd been thinking of something around the
64-256MB mark. I'll experiment, but I've got some Xen-specific problems
to solve first.

Thanks,
J

2008-03-27 20:55:21

by Jeremy Fitzhardinge

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

KAMEZAWA Hiroyuki wrote:
> On Wed, 26 Mar 2008 22:57:57 -0700
> Jeremy Fitzhardinge <[email protected]> wrote:
>
>
>> Ah, I see what it is. I wasn't trying to add enough memory. It adds in
>> units of SECTION_SIZE_BITS, which is 2^30 on 32-bit PAE. When I
>> increase the initial balloon extension to PAGES_PER_SECTION pages, I
>> make some more progress:
>>
>> xen_balloon: Initialising balloon driver.
>> trying to reserve 262144 pages (1073741824 bytes) for balloon
>> bootmem alloc of 147456 bytes failed!
>> Kernel panic - not syncing: Out of memory
>> Pid: 1, comm: swapper Not tainted 2.6.25-rc7-x86-latest.git-dirty #361
>> [<c01299dc>] panic+0x49/0x102
>> [<c0647c3c>] __alloc_bootmem+0x24/0x29
>> [<c0647c6d>] __alloc_bootmem_node+0x2c/0x34
>> [<c044bd97>] zone_wait_table_init+0x45/0x95
>> [<c0467258>] init_currently_empty_zone+0x1d/0xaa
>> [<c01738ea>] __add_pages+0x88/0xdb
>> [<c011c1a5>] arch_add_memory+0x25/0x2b
>> [<c01737a9>] add_memory_resource+0x2f/0x36
>> [<c064e487>] balloon_init+0x1b8/0x2b9
>> [<c0635495>] kernel_init+0x137/0x292
>> [<c063535e>] ? kernel_init+0x0/0x292
>> [<c063535e>] ? kernel_init+0x0/0x292
>> [<c0108b67>] kernel_thread_helper+0x7/0x10
>> =======================
>>
>>
>> What's the rationale for setting SECTION_SIZE_BITS to 30? Seems like a
>> fairly large chunk.
>>
>>
> First, I believe the usual DIMM size is bigger than the section size
> (1 << SECTION_SIZE_BITS bytes). This was designed for hardware-based hotplug.
>
> If you want to use memory hotplug in a virtualized environment, it's good to
> make this a smaller chunk. PowerPC/IBM LPAR uses 16MB chunks.
>
> It's a trade-off between section maintenance cost and the size of plugged
> memory. Please find the best value.

Hm, I tried reducing it to 2^28 (=256M), but I get a compilation failure:

CC arch/x86/kernel/asm-offsets.s
In file included from /home/jeremy/hg/xen/paravirt/linux/include/linux/suspend.h:11,
from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets_32.c:11,
from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets.c:2:
/home/jeremy/hg/xen/paravirt/linux/include/linux/mm.h:458:2: error: #error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED
make[3]: *** [arch/x86/kernel/asm-offsets.s] Error 1


2^29 works.

J

2008-03-27 22:24:19

by Jeremy Fitzhardinge

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

Dave Hansen wrote:
> The flags being all null looks highly suspicious to me.
>
> Once you've done an add_memory(), the new sections should show up
> in /sys. Do you see them in there?
>
> Once they show up, you can online them with:
>
> echo online > /sys/devices/system/memory/memoryXXX/state
>
> That's what actually goes and mucks with the 'struct zone's and the
> pgdats to expand them. It will also call online_page() on the whole
> range. I think you're trying to do this manually, and missing part of
> it.

Hm, actually this is precisely the wrong thing to do in this case. When
the balloon driver adds a new section of hotplug memory, it's doing it to
get the page structures, but there's no actual memory backing those
pages. The memory only comes into existence on a page-by-page basis
when the balloon driver gets memory from the hypervisor and attaches it
to each page (the balloon driver uses online_page() on each page as it's
ready).

If the user does a mass online via /sys the system explodes because it
onlines a large number of pages which have no backing memory. Since
none of those pages can be mapped, the kernel explodes in a variety of
interesting ways.
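
The distinction between having page structures and having backing memory can be modelled in a few lines. This is a toy simulation, not driver code: the names loosely echo the snippets earlier in the thread, the 8-page section is hypothetical, and the frame number used with it is arbitrary:

```c
#include <stddef.h>

#define INVALID_P2M_ENTRY (~0UL)
#define NR_PAGES 8	/* tiny hypothetical section */

struct balloon_page {
	int online;			/* handed to the page allocator? */
	struct balloon_page *next;	/* balloon list linkage */
};

static struct balloon_page pages[NR_PAGES];
static unsigned long p2m[NR_PAGES];	/* pfn -> machine frame */
static struct balloon_page *balloon_list;

/* Add the section: page structures exist, but nothing backs them yet. */
static void toy_add_section(void)
{
	balloon_list = NULL;
	for (size_t pfn = 0; pfn < NR_PAGES; pfn++) {
		pages[pfn].online = 0;
		p2m[pfn] = INVALID_P2M_ENTRY;
		pages[pfn].next = balloon_list;
		balloon_list = &pages[pfn];
	}
}

/* Back one ballooned page with machine frame mfn, then online only it. */
static int toy_populate_one(unsigned long mfn)
{
	struct balloon_page *page = balloon_list;

	if (!page)
		return -1;
	balloon_list = page->next;
	p2m[page - pages] = mfn;
	page->online = 1;
	return (int)(page - pages);
}

/* Count pages that are online without a backing frame (should stay 0). */
static size_t toy_online_unbacked(void)
{
	size_t n = 0;

	for (size_t pfn = 0; pfn < NR_PAGES; pfn++)
		if (pages[pfn].online && p2m[pfn] == INVALID_P2M_ENTRY)
			n++;
	return n;
}
```

Populating pages one at a time keeps toy_online_unbacked() at zero; marking every page online at once, as a mass online through sysfs would, leaves all the still-ballooned pages online with no frame behind them.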

So I'd really like to inhibit the sysfs interface on these sections.
Thoughts?

Thanks,
J

2008-03-28 00:17:29

by Kamezawa Hiroyuki

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

On Thu, 27 Mar 2008 13:54:52 -0700
Jeremy Fitzhardinge <[email protected]> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Wed, 26 Mar 2008 22:57:57 -0700
> > Jeremy Fitzhardinge <[email protected]> wrote:
> >
> >
> >> Ah, I see what it is. I wasn't trying to add enough memory. It adds in
> >> units of SECTION_SIZE_BITS, which is 2^30 on 32-bit PAE. When I
> >> increase the initial balloon extension to PAGES_PER_SECTION pages, I
> >> make some more progress:
> >>
> >> xen_balloon: Initialising balloon driver.
> >> trying to reserve 262144 pages (1073741824 bytes) for balloon
> >> bootmem alloc of 147456 bytes failed!
> >> Kernel panic - not syncing: Out of memory
> >> Pid: 1, comm: swapper Not tainted 2.6.25-rc7-x86-latest.git-dirty #361
> >> [<c01299dc>] panic+0x49/0x102
> >> [<c0647c3c>] __alloc_bootmem+0x24/0x29
> >> [<c0647c6d>] __alloc_bootmem_node+0x2c/0x34
> >> [<c044bd97>] zone_wait_table_init+0x45/0x95
> >> [<c0467258>] init_currently_empty_zone+0x1d/0xaa
> >> [<c01738ea>] __add_pages+0x88/0xdb
> >> [<c011c1a5>] arch_add_memory+0x25/0x2b
> >> [<c01737a9>] add_memory_resource+0x2f/0x36
> >> [<c064e487>] balloon_init+0x1b8/0x2b9
> >> [<c0635495>] kernel_init+0x137/0x292
> >> [<c063535e>] ? kernel_init+0x0/0x292
> >> [<c063535e>] ? kernel_init+0x0/0x292
> >> [<c0108b67>] kernel_thread_helper+0x7/0x10
> >> =======================
> >>
> >>
> >> What's the rationale for setting SECTION_SIZE_BITS to 30? Seems like a
> >> fairly large chunk.
> >>
> >>
> > First, I believe the usual DIMM size is bigger than the section size
> > (1 << SECTION_SIZE_BITS bytes). This was designed for hardware-based hotplug.
> >
> > If you want to use memory hotplug in a virtualized environment, it's good to
> > make this a smaller chunk. PowerPC/IBM LPAR uses 16MB chunks.
> >
> > It's a trade-off between section maintenance cost and the size of plugged
> > memory. Please find the best value.
>
> Hm, I tried reducing it to 2^28 (=256M), but I get a compilation failure:
>
> CC arch/x86/kernel/asm-offsets.s
> In file included from /home/jeremy/hg/xen/paravirt/linux/include/linux/suspend.h:11,
> from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets_32.c:11,
> from /home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/asm-offsets.c:2:
> /home/jeremy/hg/xen/paravirt/linux/include/linux/mm.h:458:2: error: #error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED
> make[3]: *** [arch/x86/kernel/asm-offsets.s] Error 1
>
Ah, the section number of the page is now encoded in page->flags.
(Sorry, I usually work on 64-bit memory hotplug...)
See mm.h:
==
 * There are three possibilities for how page->flags get
 * laid out.  The first is for the normal case, without
 * sparsemem.  The second is for sparsemem when there is
 * plenty of space for node and section.  The last is when
 * we have run out of space and have to fall back to an
 * alternate (slower) way of determining the node.
 *
 *        No sparsemem: |       NODE     | ZONE | ... | FLAGS |
 * with space for node: | SECTION | NODE | ZONE | ... | FLAGS |
 *   no space for node: | SECTION |     ZONE    | ... | FLAGS |
==

Hmm, on other archs, sparsemem-vmemmap allows us to drop the section bits
(Christoph's recent work). But on x86-32, the kernel's NORMAL area does not
seem to be large enough to hold a vmemmap.

I have no good idea for this right now.
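
The compile-time check behind the error above can be mimicked in a few lines. This is a sketch under stated assumptions: 36 physical address bits, 2 zone bits, no node bits and 9 reserved flag bits are guesses at this 32-bit configuration, chosen because they reproduce the reported behaviour; the real values live in mm.h and page-flags.h.

```c
#include <stdbool.h>

/* Bits needed in page->flags to hold a section number. */
static unsigned int sections_width(unsigned int max_physmem_bits,
				   unsigned int section_size_bits)
{
	return max_physmem_bits - section_size_bits;
}

/*
 * Mirror of the check quoted in the error message: the build fails when
 * SECTIONS_WIDTH + NODES_WIDTH + ZONES_WIDTH exceeds FLAGS_RESERVED.
 * All four widths are passed in by the caller; none is hard-coded here
 * because the concrete values are assumptions.
 */
static bool flags_fit(unsigned int sections_w, unsigned int nodes_w,
		      unsigned int zones_w, unsigned int flags_reserved)
{
	return sections_w + nodes_w + zones_w <= flags_reserved;
}
```

With those assumed values, a 2^28 section size needs 8 + 0 + 2 = 10 bits and fails, while 2^29 needs 9 bits and fits.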

Thanks,
-Kame



2008-03-28 18:21:31

by Dave Hansen

Subject: Re: Trying to make use of hotplug memory for xen balloon driver

On Thu, 2008-03-27 at 15:23 -0700, Jeremy Fitzhardinge wrote:
> If the user does a mass online via /sys the system explodes because it
> onlines a large number of pages which have no backing memory. Since
> none of those pages can be mapped, the kernel explodes in a variety of
> interesting ways.

Yeah, it does look like you need some kind of partial onlining.

> So I'd really like to inhibit the sysfs interface on these sections.
> Thoughts?

The balloon driver isn't an exact fit for memory hotplug as it stands,
so there are going to be a few growing pains here. :)

I'm not sure just inhibiting sysfs is the best thing. What would you
think about adding partial sections, initializing the 'struct page's,
but just not touching the memory?

-- Dave