2006-03-08 13:42:42

by Yasunori Goto

[permalink] [raw]
Subject: [PATCH: 010/017](RFC) Memory hotplug for new nodes v.3. (allocate wait table)


Wait_table is initialized according to zone size at boot time.
But, we cannot know the maixmum zone size when memory hotplug is enabled.
It can be changed.... And resizing of wait_table is too hard.

So kernel allocate and initialzie wait_table as its maximum size.

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Yasunori Goto <[email protected]>

Index: pgdat6/mm/page_alloc.c
===================================================================
--- pgdat6.orig/mm/page_alloc.c 2006-03-06 20:27:16.000000000 +0900
+++ pgdat6/mm/page_alloc.c 2006-03-06 20:27:36.000000000 +0900
@@ -1784,6 +1784,7 @@ void __init build_all_zonelists(void)
*/
#define PAGES_PER_WAITQUEUE 256

+#ifdef CONFIG_MEMORY_HOTPLUG
static inline unsigned long wait_table_size(unsigned long pages)
{
unsigned long size = 1;
@@ -1802,6 +1803,17 @@ static inline unsigned long wait_table_s

return max(size, 4UL);
}
+#else
+/*
+ * Because zone size might be changed by hot-add,
+ * We can't determin suitable size for wait_table as traditional.
+ * So, we use maximum size.
+ */
+static inline unsigned long wait_table_size(unsigned long pages)
+{
+ return 4096UL:
+}
+#endif

/*
* This is an integer logarithm so that shifts can be used later
@@ -2070,7 +2082,7 @@ void __init setup_per_cpu_pageset(void)
#endif

static __meminit
-void zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
+int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
{
int i;
struct pglist_data *pgdat = zone->zone_pgdat;
@@ -2081,12 +2093,31 @@ void zone_wait_table_init(struct zone *z
*/
zone->wait_table_size = wait_table_size(zone_size_pages);
zone->wait_table_bits = wait_table_bits(zone->wait_table_size);
- zone->wait_table = (wait_queue_head_t *)
- alloc_bootmem_node(pgdat, zone->wait_table_size
- * sizeof(wait_queue_head_t));
+ if (system_state == SYSTEM_BOOTING) {
+ zone->wait_table = (wait_queue_head_t *)
+ alloc_bootmem_node(pgdat, zone->wait_table_size
+ * sizeof(wait_queue_head_t));
+ } else {
+ int table_size;
+ /* we can use kmalloc() in run time */
+ do {
+ table_size = zone->wait_table_size
+ * sizeof(wait_queue_head_t);
+ zone->wait_table = kmalloc(table_size, GFP_ATOMIC);
+ if (!zone->wait_table) {
+ /* try half size */
+ zone->wait_table_size >>= 1;
+ zone->wait_table_bits =
+ wait_table_bits(zone->wait_table_size);
+ }
+ } while (zone->wait_table_size && !zone->wait_table);
+ }
+ if (!zone->wait_table)
+ return -ENOMEM;

for(i = 0; i < zone->wait_table_size; ++i)
init_waitqueue_head(zone->wait_table + i);
+ return 0;
}

static __meminit void zone_pcp_init(struct zone *zone)
@@ -2112,8 +2143,10 @@ __meminit int init_currently_empty_zone(
unsigned long size)
{
struct pglist_data *pgdat = zone->zone_pgdat;
-
- zone_wait_table_init(zone, size);
+ int ret;
+ ret = zone_wait_table_init(zone, size);
+ if (ret)
+ return ret;
pgdat->nr_zones = zone_idx(zone) + 1;

zone->zone_start_pfn = zone_start_pfn;

--
Yasunori Goto



2006-03-09 12:03:13

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH: 010/017](RFC) Memory hotplug for new nodes v.3. (allocate wait table)

Yasunori Goto <[email protected]> wrote:
>
> + /* we can use kmalloc() in run time */
> + do {
> + table_size = zone->wait_table_size
> + * sizeof(wait_queue_head_t);
> + zone->wait_table = kmalloc(table_size, GFP_ATOMIC);

Again, GFP_KERNEL would be better is possible.

Won't this place the node's wait_table into a different node's memory?

2006-03-09 12:23:20

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH: 010/017](RFC) Memory hotplug for new nodes v.3. (allocate wait table)

On Thursday 09 March 2006 13:00, Andrew Morton wrote:
> Yasunori Goto <[email protected]> wrote:
> >
> > + /* we can use kmalloc() in run time */
> > + do {
> > + table_size = zone->wait_table_size
> > + * sizeof(wait_queue_head_t);
> > + zone->wait_table = kmalloc(table_size, GFP_ATOMIC);
>
> Again, GFP_KERNEL would be better is possible.
>
> Won't this place the node's wait_table into a different node's memory?

Yes, kmalloc_node would be better.

-Andi


2006-03-10 07:20:43

by Yasunori Goto

[permalink] [raw]
Subject: Re: [PATCH: 010/017](RFC) Memory hotplug for new nodes v.3. (allocate wait table)

> On Thursday 09 March 2006 13:00, Andrew Morton wrote:
> > Yasunori Goto <[email protected]> wrote:
> > >
> > > + /* we can use kmalloc() in run time */
> > > + do {
> > > + table_size = zone->wait_table_size
> > > + * sizeof(wait_queue_head_t);
> > > + zone->wait_table = kmalloc(table_size, GFP_ATOMIC);
> >
> > Again, GFP_KERNEL would be better is possible.

Oops.
This was inside of spin_lock in old my patch.
But, it is moved out from spin_lock as a result of refactoring
and I didn't notice that.
Yes. GFP_KERNEL is better.

> >
> > Won't this place the node's wait_table into a different node's memory?
>
> Yes, kmalloc_node would be better.

Kmalloc_node() will not work well at here,
because this patch is to initialize structures for new node -itself-.
It will work after that completion of initalize pgdat and wait_table.

To use new node's memory at here, other consideration will be necessary.
But, I would like to use kmalloc() to simplify my patch at this time.

Thanks.

--
Yasunori Goto