2008-03-28 01:09:29

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 0 of 4] [RFC] hotplug memory: minor updates

Hi all,

Here's a small set of patches for memory hotplug which I needed to get
the Xen balloon driver working.

1. add add_memory_resource(), which allow a caller to provide the
resource structure for memory being added,
2. add warnings when you try to add non-section-aligned and sized memory
since it has undesireable results when you do so, and
3. decrease the i386 PAE section size to 2^29 (=512MB), since 1G is a
bit too large for virtual memory ballooning
(4. don't steal all memory in paravirt_disable_iospace)

Still unresolved is how to prevent the user from onlining balloon
memory via sysfs, since that will almost certainly cause the kernel to
crash.

Thanks,
J


2008-03-28 00:37:56

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 3 of 4] sparsemem: reduce i386 PAE section size

On Thu, 27 Mar 2008, Jeremy Fitzhardinge wrote:

> A 1G section size makes memory hotplug too coarse in a virtual
> environment. Retuce it by a factor of 2 to 512M. I would have liked
> to make it smaller, but it runs out of reserved flags in the page flags.

There are patches in mm which changes that situation. You should have 5 or
6 more flags to play with.

2008-03-28 01:09:45

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 4 of 4] paravirt_ops: don't steal memory resources in paravirt_disable_iospace

The memory resource is also used for main memory, and we need it to
allocate physical addresses for memory hotplug. Knobbling io space is
enough to get the job done anyway.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Cc: Rusty Russell <[email protected]>
---
arch/x86/kernel/paravirt.c | 18 +-----------------
1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -206,13 +206,6 @@
.flags = IORESOURCE_IO | IORESOURCE_BUSY,
};

-static struct resource reserve_iomem = {
- .start = 0,
- .end = -1,
- .name = "paravirt-iomem",
- .flags = IORESOURCE_MEM | IORESOURCE_BUSY,
-};
-
/*
* Reserve the whole legacy IO space to prevent any legacy drivers
* from wasting time probing for their hardware. This is a fairly
@@ -222,16 +215,7 @@
*/
int paravirt_disable_iospace(void)
{
- int ret;
-
- ret = request_resource(&ioport_resource, &reserve_ioports);
- if (ret == 0) {
- ret = request_resource(&iomem_resource, &reserve_iomem);
- if (ret)
- release_resource(&reserve_ioports);
- }
-
- return ret;
+ return request_resource(&ioport_resource, &reserve_ioports);
}

static DEFINE_PER_CPU(enum paravirt_lazy_mode, paravirt_lazy_mode) = PARAVIRT_LAZY_NONE;

2008-03-28 01:10:00

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 1 of 4] hotplug-memory: add add_memory_resource

The Xen balloon driver uses hotplug memory to extend the amount of
pseudo-physical memory available to a virtual domain. Since this memory
is virtual, it can go anywhere within the kernel's pseudo-physical
address space.

This patch adds a new hotplug memory call, add_memory_resource(), which
adds memory corresponding to a pre-constructed resource. This allows
callers to use allocate_resource() to allocate a suitable chunk of
address space for the new memory (as well as cosmetic details, like a
descriptive name).

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Yasunori Goto <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
---
include/linux/memory_hotplug.h | 3 +++
mm/memory_hotplug.c | 20 ++++++++++++++++----
2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -171,7 +171,10 @@

#endif /* ! CONFIG_MEMORY_HOTPLUG */

+struct resource;
+
extern int add_memory(int nid, u64 start, u64 size);
+extern int add_memory_resource(int nid, struct resource *res);
extern int arch_add_memory(int nid, u64 start, u64 size);
extern int remove_memory(u64 start, u64 size);
extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,14 +278,28 @@

int add_memory(int nid, u64 start, u64 size)
{
- pg_data_t *pgdat = NULL;
- int new_pgdat = 0;
struct resource *res;
int ret;

res = register_memory_resource(start, size);
if (!res)
return -EEXIST;
+
+ ret = add_memory_resource(nid, res);
+
+ if (ret)
+ release_memory_resource(res);
+
+ return ret;
+}
+
+int add_memory_resource(int nid, struct resource *res)
+{
+ pg_data_t *pgdat = NULL;
+ int new_pgdat = 0;
+ int ret;
+ u64 start = res->start;
+ u64 size = res->end - res->start + 1;

if (!node_online(nid)) {
pgdat = hotadd_new_pgdat(nid, start);
@@ -320,8 +334,6 @@
/* rollback pgdat allocation and others */
if (new_pgdat)
rollback_node_hotadd(nid, pgdat);
- if (res)
- release_memory_resource(res);

return ret;
}

2008-03-28 01:10:28

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

Adding non-section-aligned memory will cause bad results, because page
structures are only built on section-aligned boundaries. Also, memory
whose size isn't a multiple of the section size is silently ignored.

This patch adds a couple of WARN_ONs to help confused programmers work
out what's going wrong when they hotplug memory without being aware of
these constraints.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Yasunori Goto <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -784,6 +784,9 @@
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
#define PAGE_SECTION_MASK (~(PAGES_PER_SECTION-1))

+#define SECTION_SIZE (1UL << SECTION_SIZE_BITS)
+#define SECTION_SIZE_MASK (SECTION_SIZE - 1)
+
#define SECTION_BLOCKFLAGS_BITS \
((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -300,6 +300,11 @@
int ret;
u64 start = res->start;
u64 size = res->end - res->start + 1;
+
+ /* Adding non-section-aligned memory will give unexpected
+ and unintuitive results. */
+ WARN_ON((start & SECTION_SIZE_MASK) != 0);
+ WARN_ON((size & SECTION_SIZE_MASK) != 0);

if (!node_online(nid)) {
pgdat = hotadd_new_pgdat(nid, start);

2008-03-28 01:10:46

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 3 of 4] sparsemem: reduce i386 PAE section size

A 1G section size makes memory hotplug too coarse in a virtual
environment. Retuce it by a factor of 2 to 512M. I would have liked
to make it smaller, but it runs out of reserved flags in the page flags.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Yasunori Goto <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
---
include/asm-x86/sparsemem.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/asm-x86/sparsemem.h b/include/asm-x86/sparsemem.h
--- a/include/asm-x86/sparsemem.h
+++ b/include/asm-x86/sparsemem.h
@@ -16,7 +16,7 @@

#ifdef CONFIG_X86_32
# ifdef CONFIG_X86_PAE
-# define SECTION_SIZE_BITS 30
+# define SECTION_SIZE_BITS 29
# define MAX_PHYSADDR_BITS 36
# define MAX_PHYSMEM_BITS 36
# else

2008-03-28 01:26:33

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

On Thu, 27 Mar 2008 17:28:38 -0700
Jeremy Fitzhardinge <[email protected]> wrote:
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -784,6 +784,9 @@
> #define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
> #define PAGE_SECTION_MASK (~(PAGES_PER_SECTION-1))
>
> +#define SECTION_SIZE (1UL << SECTION_SIZE_BITS)
> +#define SECTION_SIZE_MASK (SECTION_SIZE - 1)
> +
> #define SECTION_BLOCKFLAGS_BITS \
> ((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -300,6 +300,11 @@
> int ret;
> u64 start = res->start;
> u64 size = res->end - res->start + 1;
> +
> + /* Adding non-section-aligned memory will give unexpected
> + and unintuitive results. */
> + WARN_ON((start & SECTION_SIZE_MASK) != 0);
> + WARN_ON((size & SECTION_SIZE_MASK) != 0);
>
Why just WARNING ? not BUG_ON?

Thanks,
-Kame

2008-03-28 01:36:56

by Yasunori Goto

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

> On Thu, 27 Mar 2008 17:28:38 -0700
> Jeremy Fitzhardinge <[email protected]> wrote:
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -784,6 +784,9 @@
> > #define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
> > #define PAGE_SECTION_MASK (~(PAGES_PER_SECTION-1))
> >
> > +#define SECTION_SIZE (1UL << SECTION_SIZE_BITS)
> > +#define SECTION_SIZE_MASK (SECTION_SIZE - 1)
> > +
> > #define SECTION_BLOCKFLAGS_BITS \
> > ((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -300,6 +300,11 @@
> > int ret;
> > u64 start = res->start;
> > u64 size = res->end - res->start + 1;
> > +
> > + /* Adding non-section-aligned memory will give unexpected
> > + and unintuitive results. */
> > + WARN_ON((start & SECTION_SIZE_MASK) != 0);
> > + WARN_ON((size & SECTION_SIZE_MASK) != 0);
> >
> Why just WARNING ? not BUG_ON?

Both Nack.

Because, firmware may occupy some area in the section.
Firmware must exclude those area to notify kernel. So, E820, EFI,
or _CRS of ACPI may return not aligned address and size.
register_memory_resource() and walk_memory_resource() are to skip
them silently. This is intended.

Bye.

--
Yasunori Goto

2008-03-28 01:51:08

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

On Fri, 28 Mar 2008 10:34:51 +0900
Yasunori Goto <[email protected]> wrote:
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -300,6 +300,11 @@
> > > int ret;
> > > u64 start = res->start;
> > > u64 size = res->end - res->start + 1;
> > > +
> > > + /* Adding non-section-aligned memory will give unexpected
> > > + and unintuitive results. */
> > > + WARN_ON((start & SECTION_SIZE_MASK) != 0);
> > > + WARN_ON((size & SECTION_SIZE_MASK) != 0);
> > >
> > Why just WARNING ? not BUG_ON?
>
> Both Nack.
>
> Because, firmware may occupy some area in the section.
> Firmware must exclude those area to notify kernel. So, E820, EFI,
> or _CRS of ACPI may return not aligned address and size.
> register_memory_resource() and walk_memory_resource() are to skip
> them silently. This is intended.
>
Ah, ok. sorry.

Jeremy, I think you can check whether you have 'struct page' or not by
pfn_valid().

If pfn_valid() == false, you should call add_memory() and create
a section/mem_map. If pfn_valid() == true, you should just remove
PG_reserved bit in mem_map by online_page().

Thanks,
-Kame

2008-03-28 02:13:58

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

KAMEZAWA Hiroyuki wrote:
> On Fri, 28 Mar 2008 10:34:51 +0900
> Yasunori Goto <[email protected]> wrote:
>
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -300,6 +300,11 @@
>>>> int ret;
>>>> u64 start = res->start;
>>>> u64 size = res->end - res->start + 1;
>>>> +
>>>> + /* Adding non-section-aligned memory will give unexpected
>>>> + and unintuitive results. */
>>>> + WARN_ON((start & SECTION_SIZE_MASK) != 0);
>>>> + WARN_ON((size & SECTION_SIZE_MASK) != 0);
>>>>
>>>>
>>> Why just WARNING ? not BUG_ON?
>>>
>> Both Nack.
>>
>> Because, firmware may occupy some area in the section.
>> Firmware must exclude those area to notify kernel. So, E820, EFI,
>> or _CRS of ACPI may return not aligned address and size.
>> register_memory_resource() and walk_memory_resource() are to skip
>> them silently. This is intended.
>>
>>
> Ah, ok. sorry.
>
> Jeremy, I think you can check whether you have 'struct page' or not by
> pfn_valid().
>
> If pfn_valid() == false, you should call add_memory() and create
> a section/mem_map. If pfn_valid() == true, you should just remove
> PG_reserved bit in mem_map by online_page().

OK. Would that ever be necessary if I explicitly align my start and size?

J

2008-03-28 02:26:48

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 3 of 4] sparsemem: reduce i386 PAE section size

Christoph Lameter wrote:
>
> There are patches in mm which changes that situation. You should have 5 or
> 6 more flags to play with.
>

Oh, good. We can play with it some more when everything gets merged
together.

J

2008-03-28 02:30:06

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 3 of 4] sparsemem: reduce i386 PAE section size

On Thu, 27 Mar 2008, Jeremy Fitzhardinge wrote:

> Christoph Lameter wrote:
> >
> > There are patches in mm which changes that situation. You should have 5 or 6
> > more flags to play with.
> >
>
> Oh, good. We can play with it some more when everything gets merged together.

Only works on i386 if you either use sparsemem / vmemmap or not sparsemem
though. Is there any need for the other sparsemem memory models? Or could
we disable them like on x86_64?

2008-03-28 02:31:27

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

On Thu, 27 Mar 2008 19:13:20 -0700
Jeremy Fitzhardinge <[email protected]> wrote:

> >> Because, firmware may occupy some area in the section.
> >> Firmware must exclude those area to notify kernel. So, E820, EFI,
> >> or _CRS of ACPI may return not aligned address and size.
> >> register_memory_resource() and walk_memory_resource() are to skip
> >> them silently. This is intended.
> >>
> >>
> > Ah, ok. sorry.
> >
> > Jeremy, I think you can check whether you have 'struct page' or not by
> > pfn_valid().
> >
> > If pfn_valid() == false, you should call add_memory() and create
> > a section/mem_map. If pfn_valid() == true, you should just remove
> > PG_reserved bit in mem_map by online_page().
>
> OK. Would that ever be necessary if I explicitly align my start and size?
>
Maybe no. but be carefull not to register resource in overlapped manner.
(I wrote online_page() in above, but online_pages() is maybe better.
It does all what you want.)

Start/Size are automatically alined to section in __add_pages().

See below.
==
110 int __add_pages(struct zone *zone, unsigned long phys_start_pfn,
111 unsigned long nr_pages)
112 {
113 unsigned long i;
114 int err = 0;
115 int start_sec, end_sec;
116 /* during initialize mem_map, align hot-added range to section */
117 start_sec = pfn_to_section_nr(phys_start_pfn);
118 end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
==

And online_pages(), which onlines pages in [pfn, pfn + size), will see
registerred resources within [pfn, pfn + size).
==
184 int online_pages(unsigned long pfn, unsigned long nr_pages)
<snip>
227 walk_memory_resource(pfn, nr_pages, &onlined_pages,
228 online_pages_range);
==

One of my concern is how-to-handle sysfs status in this case.

Another concerns is, currently, I think no one tried to online a section twice
to online reserved pages in a section. so, you may see bug.
For example, mem_notify() in online_pages() will be called several times against
a section.

Thanks,
-Kame

2008-03-28 02:42:36

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 4 of 4] paravirt_ops: don't steal memory resources in paravirt_disable_iospace

On Friday 28 March 2008 11:28:40 Jeremy Fitzhardinge wrote:
> The memory resource is also used for main memory, and we need it to
> allocate physical addresses for memory hotplug. Knobbling io space is
> enough to get the job done anyway.

Indeed, tested with lguest and it works for me. I've put this in my
patchqueue, but if it all gets taken by someone else I can easily drop it.

Thanks,
Rusty.

2008-03-28 02:51:53

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

KAMEZAWA Hiroyuki wrote:
> On Thu, 27 Mar 2008 19:13:20 -0700
> Jeremy Fitzhardinge <[email protected]> wrote:
>
>
>>>> Because, firmware may occupy some area in the section.
>>>> Firmware must exclude those area to notify kernel. So, E820, EFI,
>>>> or _CRS of ACPI may return not aligned address and size.
>>>> register_memory_resource() and walk_memory_resource() are to skip
>>>> them silently. This is intended.
>>>>
>>>>
>>>>
>>> Ah, ok. sorry.
>>>
>>> Jeremy, I think you can check whether you have 'struct page' or not by
>>> pfn_valid().
>>>
>>> If pfn_valid() == false, you should call add_memory() and create
>>> a section/mem_map. If pfn_valid() == true, you should just remove
>>> PG_reserved bit in mem_map by online_page().
>>>
>> OK. Would that ever be necessary if I explicitly align my start and size?
>>
>>
> Maybe no. but be carefull not to register resource in overlapped manner.
>

Yes. That's why I added add_memory_resource(), so I could use
allocate_resource() to find a non-overlapping range to put the new memory.

> (I wrote online_page() in above, but online_pages() is maybe better.
> It does all what you want.)
>

No, for my use-case the pages must be onlined one by one as they get
some physical memory assigned to them. At the time I do add_memory(),
I'm just allocating page structures, but there's no memory backing that
range.

That's why I need to disable the sysfs onlining interface, because it
bulk onlines the pages before there's anything behind them.

> Start/Size are automatically alined to section in __add_pages().
>
> See below.
> ==
> 110 int __add_pages(struct zone *zone, unsigned long phys_start_pfn,
> 111 unsigned long nr_pages)
> 112 {
> 113 unsigned long i;
> 114 int err = 0;
> 115 int start_sec, end_sec;
> 116 /* during initialize mem_map, align hot-added range to section */
> 117 start_sec = pfn_to_section_nr(phys_start_pfn);
> 118 end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> ==
>
> And online_pages(), which onlines pages in [pfn, pfn + size), will see
> registerred resources within [pfn, pfn + size).
> ==
> 184 int online_pages(unsigned long pfn, unsigned long nr_pages)
> <snip>
> 227 walk_memory_resource(pfn, nr_pages, &onlined_pages,
> 228 online_pages_range);
> ==
>
> One of my concern is how-to-handle sysfs status in this case.
>
> Another concerns is, currently, I think no one tried to online a section twice
> to online reserved pages in a section. so, you may see bug.
> For example, mem_notify() in online_pages() will be called several times against
> a section.
>

I'd really rather prevent online_pages from happening at all, since it
can only cause havoc.

Thanks,
J

2008-03-28 03:17:17

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

On Thu, 27 Mar 2008 19:51:26 -0700
Jeremy Fitzhardinge <[email protected]> wrote:

> > (I wrote online_page() in above, but online_pages() is maybe better.
> > It does all what you want.)
> >
>
> No, for my use-case the pages must be onlined one by one as they get
> some physical memory assigned to them. At the time I do add_memory(),
> I'm just allocating page structures, but there's no memory backing that
> range.
>
yes, I see.

> That's why I need to disable the sysfs onlining interface, because it
> bulk onlines the pages before there's anything behind them.
>

My point is. online_pages() does following things.

- call notifier
- update information , total_pages etc...
- re-configure zonelist if necessary...

But online_page() not. Hmm...

How about capturing online_page() by balloon ?
ex.)
==
call add_memory() to create mem_map
call online_pages() against the whole section. <=== call this without sysfs.
online_pages() do misc. jobs
call online_page() one by one (arch dependent) called by walk_memory_resource.
online_page() will finally call free_page(page).
<=========== Xen capture here.
Don't free onlined page and swallow them into baloon driver.
==

Thanks,
-Kame

2008-03-28 03:23:24

by Yasunori Goto

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

Jeremy-san.

BTW, have you ever seen this slides?

http://www.linuxsymposium.org/2006/hotplug_slides.pdf

I'm glad if his idea can help you.

Thanks.

--
Yasunori Goto

2008-03-28 04:07:36

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

Yasunori Goto wrote:
> BTW, have you ever seen this slides?
>
> http://www.linuxsymposium.org/2006/hotplug_slides.pdf
>
> I'm glad if his idea can help you.
>

I hadn't seen that before, but its fairly similar to what I've implemented.

Thanks,
J

2008-03-28 04:20:49

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

KAMEZAWA Hiroyuki wrote:
> On Thu, 27 Mar 2008 19:51:26 -0700
> Jeremy Fitzhardinge <[email protected]> wrote:
>
>
>>> (I wrote online_page() in above, but online_pages() is maybe better.
>>> It does all what you want.)
>>>
>>>
>> No, for my use-case the pages must be onlined one by one as they get
>> some physical memory assigned to them. At the time I do add_memory(),
>> I'm just allocating page structures, but there's no memory backing that
>> range.
>>
>>
> yes, I see.
>
>
>> That's why I need to disable the sysfs onlining interface, because it
>> bulk onlines the pages before there's anything behind them.
>>
>>
>
> My point is. online_pages() does following things.
>
> - call notifier
> - update information , total_pages etc...
> - re-configure zonelist if necessary...
>
> But online_page() not. Hmm...
>
> How about capturing online_page() by balloon ?
>

You're saying that using online_page() on each page on its own is not
sufficient?

> ex.)
> ==
> call add_memory() to create mem_map
> call online_pages() against the whole section. <=== call this without sysfs.
> online_pages() do misc. jobs
> call online_page() one by one (arch dependent) called by walk_memory_resource.
> online_page() will finally call free_page(page).
> <=========== Xen capture here.
> Don't free onlined page and swallow them into baloon driver.
> ==
>

I'm not sure what you mean by "capture" here. Do you mean a hook? I'd
rather not have to put some Xen-specific hook in here.

What would happen if I did online_pages(pfn, 1) on each page as I
populate it?

J

2008-03-28 04:40:46

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

On Thu, 27 Mar 2008 21:20:18 -0700
Jeremy Fitzhardinge <[email protected]> wrote:
> > How about capturing online_page() by balloon ?
> >
>
> You're saying that using online_page() on each page on its own is not
> sufficient?
>
online_page() just remove PG_reserved flags and free it.
Not dealing with global data, like total pages, zonelist, etc.

> > ex.)
> > ==
> > call add_memory() to create mem_map
> > call online_pages() against the whole section. <=== call this without sysfs.
> > online_pages() do misc. jobs
> > call online_page() one by one (arch dependent) called by walk_memory_resource.
> > online_page() will finally call free_page(page).
> > <=========== Xen capture here.
> > Don't free onlined page and swallow them into baloon driver.
> > ==
> >
>
> I'm not sure what you mean by "capture" here. Do you mean a hook? I'd
> rather not have to put some Xen-specific hook in here.
>
ok.

> What would happen if I did online_pages(pfn, 1) on each page as I
> populate it?
>
I think (hope) it works well. But it seems no one tries to do that.

Thanks,
-Kame


2008-03-28 06:19:25

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 3 of 4] sparsemem: reduce i386 PAE section size

Christoph Lameter wrote:
> Only works on i386 if you either use sparsemem / vmemmap or not sparsemem
> though. Is there any need for the other sparsemem memory models? Or could
> we disable them like on x86_64?

Probably, but I don't know what the tradeoffs are. All I want is
hotplug memory.

J

2008-03-28 09:13:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0 of 4] [RFC] hotplug memory: minor updates


* Jeremy Fitzhardinge <[email protected]> wrote:

> Hi all,
>
> Here's a small set of patches for memory hotplug which I needed to get
> the Xen balloon driver working.

the x86 bits look small and trivial, so i'd suggest you get this into
-mm to after all the other VM patches, because this seems primarily a VM
issue.

i've applied #3 to x86.git straight away. Does #4 have dependencies on
the other patches? I suspect it does so it cannot be applied standalone.

Ingo

2008-03-28 16:07:45

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 0 of 4] [RFC] hotplug memory: minor updates

Ingo Molnar wrote:
> the x86 bits look small and trivial, so i'd suggest you get this into
> -mm to after all the other VM patches, because this seems primarily a VM
> issue.
>

I would do that, but I'm planning on posting some Xen balloon driver
patches which depend on at least one of these patches.

I'll drop "hotplug-memory: adding non-section-aligned memory is bad"
because it was just warnings and it was NACKed. "hotplug-memory: add
add_memory_resource()" is just a refactor, and should be a noop as far
as the existing add_memory() is concerned.


> i've applied #3 to x86.git straight away. Does #4 have dependencies on
> the other patches? I suspect it does so it cannot be applied standalone.
>

You mean the "paravirt_ops: don't steal memory resources in
paravirt_disable_iospace"? That shouldn't have any dependencies. It's
the only patch in my queue which even references that file. Rusty said
he picked it up too; are you exchanging patches with him?

J

2008-03-28 16:43:41

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 3 of 4] sparsemem: reduce i386 PAE section size

On Thu, 2008-03-27 at 23:18 -0700, Jeremy Fitzhardinge wrote:
> Christoph Lameter wrote:
> > Only works on i386 if you either use sparsemem / vmemmap or not sparsemem
> > though. Is there any need for the other sparsemem memory models? Or could
> > we disable them like on x86_64?
>
> Probably, but I don't know what the tradeoffs are. All I want is
> hotplug memory.

Developers should probably learn the tradeoffs before they go start
poking around in code and sending patches. ;)

Maybe Christoph has a good idea what the tradeoffs are for the vmemmap
variant, but the table-based one is pretty simple. Smaller sections
mean that it costs you larger sparsemem structures and more bits in
page->flags. But, the smaller the section, the more mem_map[] you might
waste.

BTW, the current 1G sections were picked because all the hardware we
knew of at the time had 512MB dimms that had to be added in pairs. So,
we didn't ever have physical hotplug of less than that.

-- Dave

2008-03-28 16:45:27

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 3 of 4] sparsemem: reduce i386 PAE section size

On Thu, 2008-03-27 at 17:28 -0700, Jeremy Fitzhardinge wrote:
> diff --git a/include/asm-x86/sparsemem.h b/include/asm-x86/sparsemem.h
> --- a/include/asm-x86/sparsemem.h
> +++ b/include/asm-x86/sparsemem.h
> @@ -16,7 +16,7 @@
>
> #ifdef CONFIG_X86_32
> # ifdef CONFIG_X86_PAE
> -# define SECTION_SIZE_BITS 30
> +# define SECTION_SIZE_BITS 29
> # define MAX_PHYSADDR_BITS 36
> # define MAX_PHYSMEM_BITS 36
> # else

This is fine with me as long as we have the room in page->flags. I
probably need to look at Christoph's new patches, too.

-- Dave

2008-03-28 18:05:58

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 3 of 4] sparsemem: reduce i386 PAE section size

Dave Hansen wrote:
>
> Developers should probably learn the tradeoffs before they go start
> poking around in code and sending patches. ;)
>

I don't know any better way of learning the tradeoffs than by poking
around and sending patches ;)

> BTW, the current 1G sections were picked because all the hardware we
> knew of at the time had 512MB dimms that had to be added in pairs. So,
> we didn't ever have physical hotplug of less than that.
>

I guessed it was something like that. Even for my case 1G isn't too
bad. It was bad when I was trying to hotplug early and it was having
allocation failures in the bootmem allocator, but I think it should be
OK now. On the other hand, when you're hotplugging new memory is
precisely when you don't want to be allocating lots of new memory...

J

2008-03-28 18:20:06

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

KAMEZAWA Hiroyuki wrote:
>> What would happen if I did online_pages(pfn, 1) on each page as I
>> populate it?
>>
>>
> I think (hope) it works well. But it seems no one tries to do that.

Well I had immediate problems because I try to use it under spinlock and
it calls online_pages -> build_all_zonelists -> stop_machine_run. I can
easily rearrange to fix that, but it seems to me that stop_machine_run()
is probably too expensive to call thousands of times (one for each page,
rather than once per section).

I'm just looking at how I can refactor online_pages() into a function
which does the general zone setup stuff, and one for actually onlining
pages. The function of online_pages() would remain unchanged for
existing users, but I could call the pieces separately at the
appropriate times.

J

2008-03-28 18:27:37

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

On Fri, 2008-03-28 at 11:19 -0700, Jeremy Fitzhardinge wrote:
> KAMEZAWA Hiroyuki wrote:
> >> What would happen if I did online_pages(pfn, 1) on each page as I
> >> populate it?
> >>
> >>
> > I think (hope) it works well. But it seems no one tries to do that.
>
> Well I had immediate problems because I try to use it under spinlock and
> it calls online_pages -> build_all_zonelists -> stop_machine_run. I can
> easily rearrange to fix that, but it seems to me that stop_machine_run()
> is probably too expensive to call thousands of times (one for each page,
> rather than once per section).

Yeah, you certainly don't want to be messing with things like the zone
boundaries for each page online operation.

All that you really want to do is hook into add_one_highpage_hotplug()
and keep the pages from going back into the allocator, right? Then, you
can have the Xen code go and actually free_page() on them, later.

I don't think *any* of this code actually touches the new physical pages
themselves. The issue that you're probably seeing is because they get
stuck into the allocator and some other user grabs them right away and
*does* touch them.

-- Dave

2008-03-28 21:30:53

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 2 of 4] hotplug-memory: adding non-section-aligned memory is bad

KAMEZAWA Hiroyuki wrote:
> For example, mem_notify() in online_pages() will be called several times against
> a section.
>

It seems the only in-tree user of the notifier is mm/slub.c, and it only
cares about MEM_GOING_ONLINE, GOING_OFFLINE, OFFLINE and CANCEL_ONLINE
(not ONLINE or CANCEL_OFFLINE).

Also,

if (onlined_pages)
memory_notify(MEM_ONLINE, &arg);

doesn't seem correct, because arg.nr_pages still has the original
nr_pages, and not the number of pages actually onlined (onlined_pages).
Also arg.start_pfn isn't terribly meaningful if a discontiguous set of
pages can be actually onlined.

J

2008-03-28 22:05:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0 of 4] [RFC] hotplug memory: minor updates


* Jeremy Fitzhardinge <[email protected]> wrote:

>> the x86 bits look small and trivial, so i'd suggest you get this into
>> -mm to after all the other VM patches, because this seems primarily a
>> VM issue.
>
> I would do that, but I'm planning on posting some Xen balloon driver
> patches which depend on at least one of these patches.

ok. Will wait patiently then :)

Ingo