2008-10-29 11:00:12

by Pavel Machek

[permalink] [raw]
Subject: [PATCH] hibernation should work ok with memory hotplug


hibernation + memory hotplug was disabled in kconfig because we could
not handle hibernation + sparse mem at some point. It seems to work
now, so I guess we can enable it.

Signed-off-by: Pavel Machek <[email protected]>


diff -ur linux/mm/Kconfig linux.tmp/mm/Kconfig
--- linux/mm/Kconfig 2008-10-27 10:10:59.000000000 +0100
+++ linux.tmp/mm/Kconfig 2008-10-29 10:02:41.000000000 +0100
@@ -128,12 +128,9 @@
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
depends on SPARSEMEM || X86_64_ACPI_NUMA
- depends on HOTPLUG && !HIBERNATION && ARCH_ENABLE_MEMORY_HOTPLUG
+ depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG
depends on (IA64 || X86 || PPC64 || SUPERH || S390)

-comment "Memory hotplug is currently incompatible with Software Suspend"
- depends on SPARSEMEM && HOTPLUG && HIBERNATION
-
config MEMORY_HOTPLUG_SPARSE
def_bool y
depends on SPARSEMEM && MEMORY_HOTPLUG

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2008-10-29 12:20:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Wednesday, 29 of October 2008, Pavel Machek wrote:
>
> hibernation + memory hotplug was disabled in kconfig because we could
> not handle hibernation + sparse mem at some point. It seems to work
> now, so I guess we can enable it.

OK, if "it seems to work now" means that it has been tested and confirmed to
work, no objection from me.

Thanks,
Rafael


> Signed-off-by: Pavel Machek <[email protected]>
>
>
> diff -ur linux/mm/Kconfig linux.tmp/mm/Kconfig
> --- linux/mm/Kconfig 2008-10-27 10:10:59.000000000 +0100
> +++ linux.tmp/mm/Kconfig 2008-10-29 10:02:41.000000000 +0100
> @@ -128,12 +128,9 @@
> config MEMORY_HOTPLUG
> bool "Allow for memory hot-add"
> depends on SPARSEMEM || X86_64_ACPI_NUMA
> - depends on HOTPLUG && !HIBERNATION && ARCH_ENABLE_MEMORY_HOTPLUG
> + depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG
> depends on (IA64 || X86 || PPC64 || SUPERH || S390)
>
> -comment "Memory hotplug is currently incompatible with Software Suspend"
> - depends on SPARSEMEM && HOTPLUG && HIBERNATION
> -
> config MEMORY_HOTPLUG_SPARSE
> def_bool y
> depends on SPARSEMEM && MEMORY_HOTPLUG
>

2008-11-03 20:52:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Wed, 29 Oct 2008 13:25:00 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Wednesday, 29 of October 2008, Pavel Machek wrote:
> >
> > hibernation + memory hotplug was disabled in kconfig because we could
> > not handle hibernation + sparse mem at some point. It seems to work
> > now, so I guess we can enable it.
>
> OK, if "it seems to work now" means that it has been tested and confirmed to
> work, no objection from me.

yes, that was not a terribly confidence-inspiring commit message.

3947be1969a9ce455ec30f60ef51efb10e4323d1 said "For now, disable memory
hotplug when swsusp is enabled. There's a lot of churn there right
now. We'll fix it up properly once it calms down." which is also
rather rubbery.

Cough up, guys: what was the issue with memory hotplug and swsusp, and
is it indeed now fixed?

Thanks.


>
> > Signed-off-by: Pavel Machek <[email protected]>
> >
> >
> > diff -ur linux/mm/Kconfig linux.tmp/mm/Kconfig
> > --- linux/mm/Kconfig 2008-10-27 10:10:59.000000000 +0100
> > +++ linux.tmp/mm/Kconfig 2008-10-29 10:02:41.000000000 +0100
> > @@ -128,12 +128,9 @@
> > config MEMORY_HOTPLUG
> > bool "Allow for memory hot-add"
> > depends on SPARSEMEM || X86_64_ACPI_NUMA
> > - depends on HOTPLUG && !HIBERNATION && ARCH_ENABLE_MEMORY_HOTPLUG
> > + depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG
> > depends on (IA64 || X86 || PPC64 || SUPERH || S390)
> >
> > -comment "Memory hotplug is currently incompatible with Software Suspend"
> > - depends on SPARSEMEM && HOTPLUG && HIBERNATION
> > -
> > config MEMORY_HOTPLUG_SPARSE
> > def_bool y
> > depends on SPARSEMEM && MEMORY_HOTPLUG
> >
>

2008-11-03 21:18:18

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi.

On Mon, 2008-11-03 at 12:51 -0800, Andrew Morton wrote:
> On Wed, 29 Oct 2008 13:25:00 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > On Wednesday, 29 of October 2008, Pavel Machek wrote:
> > >
> > > hibernation + memory hotplug was disabled in kconfig because we could
> > > not handle hibernation + sparse mem at some point. It seems to work
> > > now, so I guess we can enable it.
> >
> > OK, if "it seems to work now" means that it has been tested and confirmed to
> > work, no objection from me.
>
> yes, that was not a terribly confidence-inspiring commit message.
>
> 3947be1969a9ce455ec30f60ef51efb10e4323d1 said "For now, disable memory
> hotplug when swsusp is enabled. There's a lot of churn there right
> now. We'll fix it up properly once it calms down." which is also
> rather rubbery.
>
> Cough up, guys: what was the issue with memory hotplug and swsusp, and
> is it indeed now fixed?

IIRC, at least part of the question was "What if memory is hot unplugged
in the middle of hibernating, or between hibernating and
resuming." (Would apply to cold unplugging too).

Regards,

Nigel

2008-11-03 21:22:22

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Mon, 2008-11-03 at 12:51 -0800, Andrew Morton wrote:
> On Wed, 29 Oct 2008 13:25:00 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:
> > On Wednesday, 29 of October 2008, Pavel Machek wrote:
> > >
> > > hibernation + memory hotplug was disabled in kconfig because we could
> > > not handle hibernation + sparse mem at some point. It seems to work
> > > now, so I guess we can enable it.
> >
> > OK, if "it seems to work now" means that it has been tested and confirmed to
> > work, no objection from me.
>
> yes, that was not a terribly confidence-inspiring commit message.
>
> 3947be1969a9ce455ec30f60ef51efb10e4323d1 said "For now, disable memory
> hotplug when swsusp is enabled. There's a lot of churn there right
> now. We'll fix it up properly once it calms down." which is also
> rather rubbery.
>
> Cough up, guys: what was the issue with memory hotplug and swsusp, and
> is it indeed now fixed?

I suck. That commit message was horrid and I'm racking my brain now to
remember what I meant. Don't end up like me, kids.

I've attached the message that I sent to the swsusp folks. I never got
a reply from that as far as I can tell.

http://sourceforge.net/mailarchive/forum.php?thread_name=1118682535.22631.22.camel%40localhost&forum_name=lhms-devel

As I look at it now, it hasn't improved much since 2005. Take a look at
kernel/power/snapshot.c::copy_data_pages(). It still assumes that the
list of zones that a system has is static. Memory hotplug needs to be
excluded while that operation is going on.

page_is_saveable() checks for pfn_valid(). But, with memory hotplug,
things can become invalid at any time since no references are held or
taken on the page. Or, a page that *was* invalid may become valid and
get missed.

The "missing a page" thing is probably correctable via the
zone_span_seqbegin() locks. The "page becoming invalid" thing is
probably mostly fixable by acquiring a reference to the page itself.
I'd need to look how the locking on the hot remove side is working these
days to be much more constructive than that.

-- Dave


Attachments:
(No filename) (1.88 kB)
Attached message - memory hotplug and software suspend

2008-11-03 22:19:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Monday, 3 of November 2008, Dave Hansen wrote:
> On Mon, 2008-11-03 at 12:51 -0800, Andrew Morton wrote:
> > On Wed, 29 Oct 2008 13:25:00 +0100
> > "Rafael J. Wysocki" <[email protected]> wrote:
> > > On Wednesday, 29 of October 2008, Pavel Machek wrote:
> > > >
> > > > hibernation + memory hotplug was disabled in kconfig because we could
> > > > not handle hibernation + sparse mem at some point. It seems to work
> > > > now, so I guess we can enable it.
> > >
> > > OK, if "it seems to work now" means that it has been tested and confirmed to
> > > work, no objection from me.
> >
> > yes, that was not a terribly confidence-inspiring commit message.
> >
> > 3947be1969a9ce455ec30f60ef51efb10e4323d1 said "For now, disable memory
> > hotplug when swsusp is enabled. There's a lot of churn there right
> > now. We'll fix it up properly once it calms down." which is also
> > rather rubbery.
> >
> > Cough up, guys: what was the issue with memory hotplug and swsusp, and
> > is it indeed now fixed?
>
> I suck. That commit message was horrid and I'm racking my brain now to
> remember what I meant. Don't end up like me, kids.
>
> I've attached the message that I sent to the swsusp folks. I never got
> a reply from that as far as I can tell.
>
> http://sourceforge.net/mailarchive/forum.php?thread_name=1118682535.22631.22.camel%40localhost&forum_name=lhms-devel
>
> As I look at it now, it hasn't improved much since 2005. Take a look at
> kernel/power/snapshot.c::copy_data_pages(). It still assumes that the
> list of zones that a system has is static. Memory hotplug needs to be
> excluded while that operation is going on.

This operation is carried out on one CPU with interrupts disabled. Is that
not enough?

> page_is_saveable() checks for pfn_valid(). But, with memory hotplug,
> things can become invalid at any time since no references are held or
> taken on the page. Or, a page that *was* invalid may become valid and
> get missed.

Can that really happen given the conditions above?

> The "missing a page" thing is probably correctable via the
> zone_span_seqbegin() locks. The "page becoming invalid" thing is
> probably mostly fixable by acquiring a reference to the page itself.
> I'd need to look how the locking on the hot remove side is working these
> days to be much more constructive than that.

Well, I don't think any locking is necessary for the image creation.

Thanks,
Rafael

2008-11-03 22:35:58

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Mon, 2008-11-03 at 23:24 +0100, Rafael J. Wysocki wrote:
> On Monday, 3 of November 2008, Dave Hansen wrote:
> > On Mon, 2008-11-03 at 12:51 -0800, Andrew Morton wrote:
> > > On Wed, 29 Oct 2008 13:25:00 +0100
> > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > On Wednesday, 29 of October 2008, Pavel Machek wrote:
> > > > >
> > > > > hibernation + memory hotplug was disabled in kconfig because we could
> > > > > not handle hibernation + sparse mem at some point. It seems to work
> > > > > now, so I guess we can enable it.
> > > >
> > > > OK, if "it seems to work now" means that it has been tested and confirmed to
> > > > work, no objection from me.
> > >
> > > yes, that was not a terribly confidence-inspiring commit message.
> > >
> > > 3947be1969a9ce455ec30f60ef51efb10e4323d1 said "For now, disable memory
> > > hotplug when swsusp is enabled. There's a lot of churn there right
> > > now. We'll fix it up properly once it calms down." which is also
> > > rather rubbery.
> > >
> > > Cough up, guys: what was the issue with memory hotplug and swsusp, and
> > > is it indeed now fixed?
> >
> > I suck. That commit message was horrid and I'm racking my brain now to
> > remember what I meant. Don't end up like me, kids.
> >
> > I've attached the message that I sent to the swsusp folks. I never got
> > a reply from that as far as I can tell.
> >
> > http://sourceforge.net/mailarchive/forum.php?thread_name=1118682535.22631.22.camel%40localhost&forum_name=lhms-devel
> >
> > As I look at it now, it hasn't improved much since 2005. Take a look at
> > kernel/power/snapshot.c::copy_data_pages(). It still assumes that the
> > list of zones that a system has is static. Memory hotplug needs to be
> > excluded while that operation is going on.
>
> This operation is carried out on one CPU with interrupts disabled. Is that
> not enough?

If that's true then you don't need any locking for anything at all,
right?

All of the changes I was talking about occur inside the kernel and code
has to run for it to happen. So, if you are saying that absolutely no
other code on the system can possibly run, then it should be OK.

> > page_is_saveable() checks for pfn_valid(). But, with memory hotplug,
> > things can become invalid at any time since no references are held or
> > taken on the page. Or, a page that *was* invalid may become valid and
> > get missed.
>
> Can that really happen given the conditions above?

Nope.

But, as I think about it, there is another issue that we need to
address, CONFIG_NODES_SPAN_OTHER_NODES.

A node might have a node_start_pfn=0 and a node_end_pfn=100 (and it may
have only one zone). But, there may be another node with
node_start_pfn=10 and a node_end_pfn=20. This loop:

for_each_zone(zone) {
...
for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
if (page_is_saveable(zone, pfn))
memory_bm_set_bit(orig_bm, pfn);
}

will walk over the smaller node's pfn range multiple times. Is this OK?

I think all you have to do to fix it is check page_zone(page) == zone
and skip out if they don't match.

Andy, does anything else stick out to you?

-- Dave

2008-11-03 23:00:47

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Monday, 3 of November 2008, Dave Hansen wrote:
> On Mon, 2008-11-03 at 23:24 +0100, Rafael J. Wysocki wrote:
> > On Monday, 3 of November 2008, Dave Hansen wrote:
> > > On Mon, 2008-11-03 at 12:51 -0800, Andrew Morton wrote:
> > > > On Wed, 29 Oct 2008 13:25:00 +0100
> > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > On Wednesday, 29 of October 2008, Pavel Machek wrote:
> > > > > >
> > > > > > hibernation + memory hotplug was disabled in kconfig because we could
> > > > > > not handle hibernation + sparse mem at some point. It seems to work
> > > > > > now, so I guess we can enable it.
> > > > >
> > > > > OK, if "it seems to work now" means that it has been tested and confirmed to
> > > > > work, no objection from me.
> > > >
> > > > yes, that was not a terribly confidence-inspiring commit message.
> > > >
> > > > 3947be1969a9ce455ec30f60ef51efb10e4323d1 said "For now, disable memory
> > > > hotplug when swsusp is enabled. There's a lot of churn there right
> > > > now. We'll fix it up properly once it calms down." which is also
> > > > rather rubbery.
> > > >
> > > > Cough up, guys: what was the issue with memory hotplug and swsusp, and
> > > > is it indeed now fixed?
> > >
> > > I suck. That commit message was horrid and I'm racking my brain now to
> > > remember what I meant. Don't end up like me, kids.
> > >
> > > I've attached the message that I sent to the swsusp folks. I never got
> > > a reply from that as far as I can tell.
> > >
> > > http://sourceforge.net/mailarchive/forum.php?thread_name=1118682535.22631.22.camel%40localhost&forum_name=lhms-devel
> > >
> > > As I look at it now, it hasn't improved much since 2005. Take a look at
> > > kernel/power/snapshot.c::copy_data_pages(). It still assumes that the
> > > list of zones that a system has is static. Memory hotplug needs to be
> > > excluded while that operation is going on.
> >
> > This operation is carried out on one CPU with interrupts disabled. Is that
> > not enough?
>
> If that's true then you don't need any locking for anything at all,
> right?

Yes.

> All of the changes I was talking about occur inside the kernel and code
> has to run for it to happen. So, if you are saying that absolutely no
> other code on the system can possibly run, then it should be OK.
>
> > > page_is_saveable() checks for pfn_valid(). But, with memory hotplug,
> > > things can become invalid at any time since no references are held or
> > > taken on the page. Or, a page that *was* invalid may become valid and
> > > get missed.
> >
> > Can that really happen given the conditions above?
>
> Nope.
>
> But, as I think about it, there is another issue that we need to
> address, CONFIG_NODES_SPAN_OTHER_NODES.
>
> A node might have a node_start_pfn=0 and a node_end_pfn=100 (and it may
> have only one zone). But, there may be another node with
> node_start_pfn=10 and a node_end_pfn=20. This loop:
>
> for_each_zone(zone) {
> ...
> for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
> if (page_is_saveable(zone, pfn))
> memory_bm_set_bit(orig_bm, pfn);
> }
>
> will walk over the smaller node's pfn range multiple times. Is this OK?

Hm, well, I'm not really sure at the moment.

Does it mean that, in your example, the pfns 10 to 20 from the first node
refer to the same page frames that are referred to by the pfns from the
second node?

> I think all you have to do to fix it is check page_zone(page) == zone
> and skip out if they don't match.

Well, probably. I need to know exactly what's the relationship between pfns,
pages and physical page frames in that case.

Thanks,
Rafael

2008-11-03 23:10:53

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Tue, 2008-11-04 at 00:05 +0100, Rafael J. Wysocki wrote:
> On Monday, 3 of November 2008, Dave Hansen wrote:
> > But, as I think about it, there is another issue that we need to
> > address, CONFIG_NODES_SPAN_OTHER_NODES.
> >
> > A node might have a node_start_pfn=0 and a node_end_pfn=100 (and it may
> > have only one zone). But, there may be another node with
> > node_start_pfn=10 and a node_end_pfn=20. This loop:
> >
> > for_each_zone(zone) {
> > ...
> > for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
> > if (page_is_saveable(zone, pfn))
> > memory_bm_set_bit(orig_bm, pfn);
> > }
> >
> > will walk over the smaller node's pfn range multiple times. Is this OK?
>
> Hm, well, I'm not really sure at the moment.
>
> Does it mean that, in your example, the pfns 10 to 20 from the first node
> refer to the same page frames that are referred to by the pfns from the
> second node?

Maybe using pfns didn't make for a good example. I could have used
physical addresses as well.

All that I'm saying is that nodes (and zones) can span other nodes (and
zones). This means that the address ranges making up that node can
overlap with the address ranges of another node. This doesn't mean that
*each* node has those address ranges. Each individual address can only
be in one node.

Since zone *ranges* overlap, you can't tell to which zone a page belongs
simply from its address. You need to ask the 'struct page'.

> > I think all you have to do to fix it is check page_zone(page) == zone
> > and skip out if they don't match.
>
> Well, probably. I need to know exactly what's the relationship between pfns,
> pages and physical page frames in that case.

1 pfn == 1 'struct page' == 1 physical page

The only exception to that is that we may have more 'struct pages' than
we have actual physical memory due to rounding and so forth.

-- Dave

2008-11-03 23:39:43

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Mon, Nov 03, 2008 at 02:34:25PM -0800, Dave Hansen wrote:
> On Mon, 2008-11-03 at 23:24 +0100, Rafael J. Wysocki wrote:
> > On Monday, 3 of November 2008, Dave Hansen wrote:
> > > On Mon, 2008-11-03 at 12:51 -0800, Andrew Morton wrote:
> > > > On Wed, 29 Oct 2008 13:25:00 +0100
> > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > On Wednesday, 29 of October 2008, Pavel Machek wrote:
> > > > > >
> > > > > > hibernation + memory hotplug was disabled in kconfig because we could
> > > > > > not handle hibernation + sparse mem at some point. It seems to work
> > > > > > now, so I guess we can enable it.
> > > > >
> > > > > OK, if "it seems to work now" means that it has been tested and confirmed to
> > > > > work, no objection from me.
> > > >
> > > > yes, that was not a terribly confidence-inspiring commit message.
> > > >
> > > > 3947be1969a9ce455ec30f60ef51efb10e4323d1 said "For now, disable memory
> > > > hotplug when swsusp is enabled. There's a lot of churn there right
> > > > now. We'll fix it up properly once it calms down." which is also
> > > > rather rubbery.
> > > >
> > > > Cough up, guys: what was the issue with memory hotplug and swsusp, and
> > > > is it indeed now fixed?
> > >
> > > I suck. That commit message was horrid and I'm racking my brain now to
> > > remember what I meant. Don't end up like me, kids.
> > >
> > > I've attached the message that I sent to the swsusp folks. I never got
> > > a reply from that as far as I can tell.
> > >
> > > http://sourceforge.net/mailarchive/forum.php?thread_name=1118682535.22631.22.camel%40localhost&forum_name=lhms-devel
> > >
> > > As I look at it now, it hasn't improved much since 2005. Take a look at
> > > kernel/power/snapshot.c::copy_data_pages(). It still assumes that the
> > > list of zones that a system has is static. Memory hotplug needs to be
> > > excluded while that operation is going on.
> >
> > This operation is carried out on one CPU with interrupts disabled. Is that
> > not enough?
>
> If that's true then you don't need any locking for anything at all,
> right?
>
> All of the changes I was talking about occur inside the kernel and code
> has to run for it to happen. So, if you are saying that absolutely no
> other code on the system can possibly run, then it should be OK.
>
> > > page_is_saveable() checks for pfn_valid(). But, with memory hotplug,
> > > things can become invalid at any time since no references are held or
> > > taken on the page. Or, a page that *was* invalid may become valid and
> > > get missed.
> >
> > Can that really happen given the conditions above?
>
> Nope.
>
> But, as I think about it, there is another issue that we need to
> address, CONFIG_NODES_SPAN_OTHER_NODES.
>
> A node might have a node_start_pfn=0 and a node_end_pfn=100 (and it may
> have only one zone). But, there may be another node with
> node_start_pfn=10 and a node_end_pfn=20. This loop:
>
> for_each_zone(zone) {
> ...
> for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
> if (page_is_saveable(zone, pfn))
> memory_bm_set_bit(orig_bm, pfn);
> }
>
> will walk over the smaller node's pfn range multiple times. Is this OK?
>
> I think all you have to do to fix it is check page_zone(page) == zone
> and skip out if they don't match.
>
> Andy, does anything else stick out to you?

I agree that there needs to be a check for being in the zone there to
avoid the overlapping nodes issue. Also need to make sure when
constructing that check we check for pfn_valid before looking at the
page to avoid holes in the memmap.

-apw

2008-11-04 00:25:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Tuesday, 4 of November 2008, Dave Hansen wrote:
> On Tue, 2008-11-04 at 00:05 +0100, Rafael J. Wysocki wrote:
> > On Monday, 3 of November 2008, Dave Hansen wrote:
> > > But, as I think about it, there is another issue that we need to
> > > address, CONFIG_NODES_SPAN_OTHER_NODES.
> > >
> > > A node might have a node_start_pfn=0 and a node_end_pfn=100 (and it may
> > > have only one zone). But, there may be another node with
> > > node_start_pfn=10 and a node_end_pfn=20. This loop:
> > >
> > > for_each_zone(zone) {
> > > ...
> > > for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
> > > if (page_is_saveable(zone, pfn))
> > > memory_bm_set_bit(orig_bm, pfn);
> > > }
> > >
> > > will walk over the smaller node's pfn range multiple times. Is this OK?
> >
> > Hm, well, I'm not really sure at the moment.
> >
> > Does it mean that, in your example, the pfns 10 to 20 from the first node
> > refer to the same page frames that are referred to by the pfns from the
> > second node?
>
> Maybe using pfns didn't make for a good example. I could have used
> physical addresses as well.
>
> All that I'm saying is that nodes (and zones) can span other nodes (and
> zones). This means that the address ranges making up that node can
> overlap with the address ranges of another node. This doesn't mean that
> *each* node has those address ranges. Each individual address can only
> be in one node.
>
> Since zone *ranges* overlap, you can't tell to which zone a page belongs
> simply from its address. You need to ask the 'struct page'.

Understood.

This means that some zones may contain some ranges of pfns that correspond
to struct pages in another zone, correct?

> > > I think all you have to do to fix it is check page_zone(page) == zone
> > > and skip out if they don't match.
> >
> > Well, probably. I need to know exactly what's the relationship between pfns,
> > pages and physical page frames in that case.
>
> 1 pfn == 1 'struct page' == 1 physical page
>
> The only exception to that is that we may have more 'struct pages' than
> we have actual physical memory due to rounding and so forth.

OK, I think that the appended patch will do the trick (compiled, untested).

Thanks,
Rafael


Not-yet-signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/power/snapshot.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -817,8 +817,7 @@ static unsigned int count_free_highmem_p
* We should save the page if it isn't Nosave or NosaveFree, or Reserved,
* and it isn't a part of a free chunk of pages.
*/
-
-static struct page *saveable_highmem_page(unsigned long pfn)
+static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
{
struct page *page;

@@ -826,6 +825,8 @@ static struct page *saveable_highmem_pag
return NULL;

page = pfn_to_page(pfn);
+ if (page_zone(page) != zone)
+ return NULL;

BUG_ON(!PageHighMem(page));

@@ -855,13 +856,16 @@ unsigned int count_highmem_pages(void)
mark_free_pages(zone);
max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
- if (saveable_highmem_page(pfn))
+ if (saveable_highmem_page(zone, pfn))
n++;
}
return n;
}
#else
-static inline void *saveable_highmem_page(unsigned long pfn) { return NULL; }
+static inline void *saveable_highmem_page(struct zone *z, unsigned long p)
+{
+ return NULL;
+}
#endif /* CONFIG_HIGHMEM */

/**
@@ -872,8 +876,7 @@ static inline void *saveable_highmem_pag
* of pages statically defined as 'unsaveable', and it isn't a part of
* a free chunk of pages.
*/
-
-static struct page *saveable_page(unsigned long pfn)
+static struct page *saveable_page(struct zone *zone, unsigned long pfn)
{
struct page *page;

@@ -881,6 +884,8 @@ static struct page *saveable_page(unsign
return NULL;

page = pfn_to_page(pfn);
+ if (page_zone(page) != zone)
+ return NULL;

BUG_ON(PageHighMem(page));

@@ -912,7 +917,7 @@ unsigned int count_data_pages(void)
mark_free_pages(zone);
max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
- if(saveable_page(pfn))
+ if(saveable_page(zone, pfn))
n++;
}
return n;
@@ -953,7 +958,7 @@ static inline struct page *
page_is_saveable(struct zone *zone, unsigned long pfn)
{
return is_highmem(zone) ?
- saveable_highmem_page(pfn) : saveable_page(pfn);
+ saveable_highmem_page(zone, pfn) : saveable_page(zone, pfn);
}

static void copy_data_page(unsigned long dst_pfn, unsigned long src_pfn)
@@ -984,7 +989,7 @@ static void copy_data_page(unsigned long
}
}
#else
-#define page_is_saveable(zone, pfn) saveable_page(pfn)
+#define page_is_saveable(zone, pfn) saveable_page(zone, pfn)

static inline void copy_data_page(unsigned long dst_pfn, unsigned long src_pfn)
{

2008-11-04 00:53:15

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Tue, 2008-11-04 at 01:29 +0100, Rafael J. Wysocki wrote:
> > Since zone *ranges* overlap, you can't tell to which zone a page belongs
> > simply from its address. You need to ask the 'struct page'.
>
> Understood.
>
> This means that some zones may contain some ranges of pfns that correspond
> to struct pages in another zone, correct?

Yup, that's correct.

The patch looks good to me.

-- Dave

2008-11-04 04:16:48

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi.

On Mon, 2008-11-03 at 14:34 -0800, Dave Hansen wrote:
> A node might have a node_start_pfn=0 and a node_end_pfn=100 (and it may
> have only one zone). But, there may be another node with
> node_start_pfn=10 and a node_end_pfn=20. This loop:
>
> for_each_zone(zone) {
> ...
> for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
> if (page_is_saveable(zone, pfn))
> memory_bm_set_bit(orig_bm, pfn);
> }
>
> will walk over the smaller node's pfn range multiple times. Is this OK?
>
> I think all you have to do to fix it is check page_zone(page) == zone
> and skip out if they don't match.

So pfn 10 in the first node refers to the same memory as pfn 10 in the
second node?

Regards,

Nigel

2008-11-04 07:04:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tuesday, 4 of November 2008, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2008-11-03 at 14:34 -0800, Dave Hansen wrote:
> > A node might have a node_start_pfn=0 and a node_end_pfn=100 (and it may
> > have only one zone). But, there may be another node with
> > node_start_pfn=10 and a node_end_pfn=20. This loop:
> >
> > for_each_zone(zone) {
> > ...
> > for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
> > if (page_is_saveable(zone, pfn))
> > memory_bm_set_bit(orig_bm, pfn);
> > }
> >
> > will walk over the smaller node's pfn range multiple times. Is this OK?
> >
> > I think all you have to do to fix it is check page_zone(page) == zone
> > and skip out if they don't match.
>
> So pfn 10 in the first node refers to the same memory as pfn 10 in the
> second node?

A pfn always refers to specific page frame and/or struct page, so yes.
However, in one of the nodes these pfns are sort of "invalid" (they point
to struct pages belonging to other zones). AFAICS.

Thanks,
Rafael

2008-11-04 07:09:46

by Dave Hansen

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tue, 2008-11-04 at 15:02 +1100, Nigel Cunningham wrote:
> On Mon, 2008-11-03 at 14:34 -0800, Dave Hansen wrote:
> > A node might have a node_start_pfn=0 and a node_end_pfn=100 (and it may
> > have only one zone). But, there may be another node with
> > node_start_pfn=10 and a node_end_pfn=20. This loop:
> >
> > for_each_zone(zone) {
> > ...
> > for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
> > if (page_is_saveable(zone, pfn))
> > memory_bm_set_bit(orig_bm, pfn);
> > }
> >
> > will walk over the smaller node's pfn range multiple times. Is this OK?
> >
> > I think all you have to do to fix it is check page_zone(page) == zone
> > and skip out if they don't match.
>
> So pfn 10 in the first node refers to the same memory as pfn 10 in the
> second node?

Sure. But, remember that the pfns (and the entire physical address
space) is consistent across the entire system. It's not like both nodes
have an address and the kernel only "gives" it to one of them.

There's real confusion about zone->zone_start/end_pfn, I think. *All*
that they mean is this:

- zone_start_pfn is the lowest physical address present in the zone.
- zone_end_pfn is the highest physical address present in the zone

That's *it*. Those numbers imply *nothing* about the pages between
them, except that there might be 0 or more pages in there belonging to
the same zone.

"All pages in this zone lie between these two physical addresses." is
all they say.

-- Dave

2008-11-04 07:30:52

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi.

On Mon, 2008-11-03 at 23:09 -0800, Dave Hansen wrote:
> > So pfn 10 in the first node refers to the same memory as pfn 10 in the
> > second node?
>
> Sure. But, remember that the pfns (and the entire physical address
> space) is consistent across the entire system. It's not like both nodes
> have an address and the kernel only "gives" it to one of them.
>
> There's real confusion about zone->zone_start/end_pfn, I think. *All*
> that they mean is this:
>
> - zone_start_pfn is the lowest physical address present in the zone.
> - zone_end_pfn is the highest physical address present in the zone
>
> That's *it*. Those numbers imply *nothing* about the pages between
> them, except that there might be 0 or more pages in there belonging to
> the same zone.
>
> "All pages in this zone lie between these two physical addresses." is
> all they say.

Okay. Thanks (and to Rafael).

One other question, if I may. Would you please explain (or point me to
an explanation) of PHYS_PFN_OFFSET/ARCH_PFN_OFFSET? I've been dealing
occasionally with people wanting to have hibernation on arm, and I don't
really get the concept or the implementation (particularly when it comes
to trying to do the sort of iterating over zones and pfns that was being
discussed in previous messages in this thread.

Regards,

Nigel

2008-11-04 07:36:28

by Dave Hansen

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tue, 2008-11-04 at 08:08 +0100, Rafael J. Wysocki wrote:
> A pfn always refers to specific page frame and/or struct page, so yes.
> However, in one of the nodes these pfns are sort of "invalid" (they point
> to struct pages belonging to other zones). AFAICS.

Part of this problem is getting out of the old zone mindset. It used to
be that there were one, two, or three zones, set up at boot, with static
ranges. These never had holes, never changed, and were always stacked
up nice and tightly on top of one another. It ain't that way no more.

Now, the zones are much more truly "allocation pools". They're bunches
of memory with similar attributes and hypervisors or firmware can hand
them to the OS in very interesting ways. This means that the attributes
that help us pool the memory together have less and less to do with
physical addresses. A given physical address a decreasing chance of
being related to its neighbor.

-- Dave

2008-11-04 07:53:59

by Dave Hansen

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tue, 2008-11-04 at 18:30 +1100, Nigel Cunningham wrote:
> One other question, if I may. Would you please explain (or point me to
> an explanation) of PHYS_PFN_OFFSET/ARCH_PFN_OFFSET? I've been dealing
> occasionally with people wanting to have hibernation on arm, and I don't
> really get the concept or the implementation (particularly when it comes
> to trying to do the sort of iterating over zones and pfns that was being
> discussed in previous messages in this thread.

First of all, I think PHYS_PFN_OFFSET is truly an arch-dependent
construct. It only appears in arm an avr32. I'll tell you only how
ARCH_PFN_OFFSET looks to me. My guess is that those two arches need to
reconcile themselves and start using ARCH_PFN_OFFSET instead.

In the old days, we only had memory that started at physical address 0x0
and went up to some larger address. We allocated a mem_map[] of 'struct
pages' in one big chunk, one for each address. mem_map[0] was for
physical address 0x0 and mem_map[1] was for 0x1000, mem_map[2] was for
0x2000 and so on...

If a machine didn't have a physical address 0x0, we allocated mem_map[]
for it anyway and just wasted that entry. What ARCH_PFN_OFFSET does is
let us bias the mem_map[] structure so that mem_map[0] does not
represent 0x0.

If ARCH_PFN_OFFSET is 1, then mem_map[0] actually represents the
physical address 0x1000. If it is 2, then mem_map[0] represents
physical addr 0x2000. ARCH_PFN_OFFSET means that the first physical
address on the machine is at ARCH_PFN_OFFSET*PAGE_SIZE. We bias all
lookups into the mem_map[] so that we don't waste space in it. There
will never be a zone_start_pfn lower than ARCH_PFN_OFFSET, for instance.

What does that mean for walking zones? Nothing. It only has meaning
for how we allocate and do lookups into the mem_map[]. But, since
everyone uses pfn_to_page() and friends, you don't ever see this.

I'm curious why you think you need to be concerned with it.

-- Dave

2008-11-04 08:50:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tuesday, 4 of November 2008, Dave Hansen wrote:
> On Tue, 2008-11-04 at 08:08 +0100, Rafael J. Wysocki wrote:
> > A pfn always refers to specific page frame and/or struct page, so yes.
> > However, in one of the nodes these pfns are sort of "invalid" (they point
> > to struct pages belonging to other zones). AFAICS.
>
> Part of this problem is getting out of the old zone mindset. It used to
> be that there were one, two, or three zones, set up at boot, with static
> ranges. These never had holes, never changed, and were always stacked
> up nice and tightly on top of one another. It ain't that way no more.

In fact there were two assumptions about zones in the hibernation code,
that they are static and that they don't overlap. The second one may be
removed by the patch I sent before, but there still is a problem related to
the first one, which is that the memory management structures (bitmaps) used
by us depend on the zones being static.

Of course, we create the image atomically, but the bitmaps are created earlier
with the assumption that the zones won't change aferwards. Also, if memory
hotplugging is used _after_ the image has been created, the zones may change
in a way that's not compatible with the structure of the bitmaps.

To handle this, I need to know two things:
1) what changes of the zones are possible due to memory hotplugging (i.e.
can they grow, shring, change boundaries etc.)
2) what kind of locking is needed to prevent zones from changing.

Thanks,
Rafael

2008-11-04 15:22:19

by Dave Hansen

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tue, 2008-11-04 at 09:54 +0100, Rafael J. Wysocki wrote:
> To handle this, I need to know two things:
> 1) what changes of the zones are possible due to memory hotplugging
> (i.e. can they grow, shring, change boundaries etc.)

All of the above.

> 2) what kind of locking is needed to prevent zones from changing.

The amount of locking is pretty minimal. We depend on some locking in
sysfs to keep two attempts to online from stepping on the other.

There is the zone_span_seq*() set of functions. These are used pretty
sparsely, but we do use them in page_outside_zone_boundaries() to notice
when a zone is resized.

There are also the pgdat_resize*() locks. Those are more for internal
use guarding the sparsemem structures and so forth.

Could you describe a little more why you need to lock down zone
resizing? Do you *really* mean zones, or do you mean "the set of memory
on the system"? Why walk zones instead of pgdats?

-- Dave

2008-11-04 15:31:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tuesday, 4 of November 2008, Dave Hansen wrote:
> On Tue, 2008-11-04 at 09:54 +0100, Rafael J. Wysocki wrote:
> > To handle this, I need to know two things:
> > 1) what changes of the zones are possible due to memory hotplugging
> > (i.e. can they grow, shring, change boundaries etc.)
>
> All of the above.

OK

If I allocate a page frame corresponding to specific pfn, is it guaranteed to
be associated with the same pfn in future?

> > 2) what kind of locking is needed to prevent zones from changing.
>
> The amount of locking is pretty minimal. We depend on some locking in
> sysfs to keep two attempts to online from stepping on the other.
>
> There is the zone_span_seq*() set of functions. These are used pretty
> sparsely, but we do use them in page_outside_zone_boundaries() to notice
> when a zone is resized.
>
> There are also the pgdat_resize*() locks. Those are more for internal
> use guarding the sparsemem structures and so forth.
>
> Could you describe a little more why you need to lock down zone
> resizing? Do you *really* mean zones, or do you mean "the set of memory
> on the system"?

The latter, but our internal data structures are designed with zones in mind.

> Why walk zones instead of pgdats?

This is a historical thing rather than anything else. I think we could switch
to pgdats, but that would require a code rewrite that's likely to introduce
bugs, while our image-creating code is really well tested and doesn't change
very often.

Thanks,
Rafael

2008-11-04 15:40:14

by Dave Hansen

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tue, 2008-11-04 at 16:35 +0100, Rafael J. Wysocki wrote:
> On Tuesday, 4 of November 2008, Dave Hansen wrote:
> > On Tue, 2008-11-04 at 09:54 +0100, Rafael J. Wysocki wrote:
> > > To handle this, I need to know two things:
> > > 1) what changes of the zones are possible due to memory hotplugging
> > > (i.e. can they grow, shring, change boundaries etc.)
> >
> > All of the above.
>
> OK
>
> If I allocate a page frame corresponding to specific pfn, is it guaranteed to
> be associated with the same pfn in future?

Page allocation is different. Since you hold a reference to a page, it
can not be removed until you release that reference. That's why every
normal alloc_pages() user in the kernel doesn't have to worry about
memory hotplug.

> > Why walk zones instead of pgdats?
>
> This is a historical thing rather than anything else. I think we could switch
> to pgdats, but that would require a code rewrite that's likely to introduce
> bugs, while our image-creating code is really well tested and doesn't change
> very often.

OK, fair enough. I just wanted you to know that there are options other
than zones.

-- Dave

2008-11-04 16:29:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tuesday, 4 of November 2008, Dave Hansen wrote:
> On Tue, 2008-11-04 at 16:35 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, 4 of November 2008, Dave Hansen wrote:
> > > On Tue, 2008-11-04 at 09:54 +0100, Rafael J. Wysocki wrote:
> > > > To handle this, I need to know two things:
> > > > 1) what changes of the zones are possible due to memory hotplugging
> > > > (i.e. can they grow, shring, change boundaries etc.)
> > >
> > > All of the above.
> >
> > OK
> >
> > If I allocate a page frame corresponding to specific pfn, is it guaranteed to
> > be associated with the same pfn in future?
>
> Page allocation is different. Since you hold a reference to a page, it
> can not be removed until you release that reference. That's why every
> normal alloc_pages() user in the kernel doesn't have to worry about
> memory hotplug.

Good. :-)

So, if I allocate the image pages right prior to creating the image, they
won't be touched by memory hotplug.

Now, I need to do one more thing, which is to check how much memory has to be
freed before creating the image. For this purpose I need to lock memory
hotplug temporarily, count pages to free and unlock it. What interface should
I use for this purpose?

[I'll also need to lock memory hotplug temporarily during resume.]

Rafael

2008-11-04 17:01:58

by Dave Hansen

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tue, 2008-11-04 at 17:34 +0100, Rafael J. Wysocki wrote:
> Now, I need to do one more thing, which is to check how much memory has to be
> freed before creating the image. For this purpose I need to lock memory
> hotplug temporarily, count pages to free and unlock it. What interface should
> I use for this purpose?
>
> [I'll also need to lock memory hotplug temporarily during resume.]

We currently don't have any big switch to disable memory hotplug, like
lock_memory_hotplug() or something. :)

If you are simply scanning and counting pages, I think the best thing to
use would be the zone_span_seq*() seqlock stuff. Do your count inside
the seqlock's while loop. That covers detecting a zone changing while
it is being scanned.

The other case to detect is when a new zone gets added. These are
really rare. Rare enough that we actually use a stop_machine() call in
build_all_zonelists() to do it. All you would have to do is detect when
one of these calls gets made. I think that's a good application for a
new seq_lock.

I've attached an utterly untested patch that should do the trick.
Yasunori and KAME should probably take a look at it since the node
addition code is theirs.

-- Dave


Attachments:
zone-list-seqlock.patch (1.33 kB)

2008-11-05 00:39:25

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Tue, 04 Nov 2008 08:59:05 -0800
Dave Hansen <[email protected]> wrote:

> On Tue, 2008-11-04 at 17:34 +0100, Rafael J. Wysocki wrote:
> > Now, I need to do one more thing, which is to check how much memory has to be
> > freed before creating the image. For this purpose I need to lock memory
> > hotplug temporarily, count pages to free and unlock it. What interface should
> > I use for this purpose?
> >
> > [I'll also need to lock memory hotplug temporarily during resume.]
>
> We currently don't have any big switch to disable memory hotplug, like
> lock_memory_hotplug() or something. :)
>
> If you are simply scanning and counting pages, I think the best thing to
> use would be the zone_span_seq*() seqlock stuff. Do your count inside
> the seqlock's while loop. That covers detecting a zone changing while
> it is being scanned.
>
> The other case to detect is when a new zone gets added. These are
> really rare. Rare enough that we actually use a stop_machine() call in
> build_all_zonelists() to do it. All you would have to do is detect when
> one of these calls gets made. I think that's a good application for a
> new seq_lock.
>
> I've attached an utterly untested patch that should do the trick.
> Yasunori and KAME should probably take a look at it since the node
> addition code is theirs.
>

Hmm ? I think there is no real requirement for doing hibernation while
memory is under hotplug.

Assume following.
- memory hotplug can be triggerred by
1. interrupt from system.
2. "probe" interface in sysfs.
- ONLINE/OFFLINE is only trigerred by sysfs interface.

I believe we can't block "1", but "1" cannot be raised while hibernation.
(If it happens, it's mistake of the firmware.)

"probe" interface can be triggered from userland. Then it may be worth to be
blocked. How about to add device_pm_lock() to following place ?

- /sys/device/system/memory/probe
- ONLINE
- OFFLINE



off-topic:
BTW, I hear hibernation can be done by kexec + kexec-tools.
If so, boot option for disabling memory hotplug is enough for us, isn't it ?
Or there is long way to make use of it in real world ?

Thanks,
-Kame
















> -- Dave
>

2008-11-05 09:10:33

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi.

On Mon, 2008-11-03 at 23:53 -0800, Dave Hansen wrote:
> On Tue, 2008-11-04 at 18:30 +1100, Nigel Cunningham wrote:
> > One other question, if I may. Would you please explain (or point me to
> > an explanation) of PHYS_PFN_OFFSET/ARCH_PFN_OFFSET? I've been dealing
> > occasionally with people wanting to have hibernation on arm, and I don't
> > really get the concept or the implementation (particularly when it comes
> > to trying to do the sort of iterating over zones and pfns that was being
> > discussed in previous messages in this thread.
>
> First of all, I think PHYS_PFN_OFFSET is truly an arch-dependent
> construct. It only appears in arm an avr32. I'll tell you only how
> ARCH_PFN_OFFSET looks to me. My guess is that those two arches need to
> reconcile themselves and start using ARCH_PFN_OFFSET instead.
>
> In the old days, we only had memory that started at physical address 0x0
> and went up to some larger address. We allocated a mem_map[] of 'struct
> pages' in one big chunk, one for each address. mem_map[0] was for
> physical address 0x0 and mem_map[1] was for 0x1000, mem_map[2] was for
> 0x2000 and so on...
>
> If a machine didn't have a physical address 0x0, we allocated mem_map[]
> for it anyway and just wasted that entry. What ARCH_PFN_OFFSET does is
> let us bias the mem_map[] structure so that mem_map[0] does not
> represent 0x0.
>
> If ARCH_PFN_OFFSET is 1, then mem_map[0] actually represents the
> physical address 0x1000. If it is 2, then mem_map[0] represents
> physical addr 0x2000. ARCH_PFN_OFFSET means that the first physical
> address on the machine is at ARCH_PFN_OFFSET*PAGE_SIZE. We bias all
> lookups into the mem_map[] so that we don't waste space in it. There
> will never be a zone_start_pfn lower than ARCH_PFN_OFFSET, for instance.
>
> What does that mean for walking zones? Nothing. It only has meaning
> for how we allocate and do lookups into the mem_map[]. But, since
> everyone uses pfn_to_page() and friends, you don't ever see this.
>
> I'm curious why you think you need to be concerned with it.

Sorry for the delay in replying.

It's because I'm looking at old patches for arm support for TuxOnIce and
because of the way TuxOnIce records what pages need attention:

My method of recording what needs doing is different to Rafael's. I use
per zone bitmaps (constructed out of order 0 allocations) and therefore
look at zone_start_pfn in calculating what bit within the zone needs to
be set/cleared/tested.

In your example above, zone_start_pfn will be 1, won't it? If that's the
case, I shouldn't need to subtract ARCH_PFN_OFFSET to get the right
index within the zone and avoid the same wastage that ARCH_PFN_OFFSET
avoids with mem_map.

Nigel

2008-11-05 10:54:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Wednesday, 5 of November 2008, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2008-11-03 at 23:53 -0800, Dave Hansen wrote:
> > On Tue, 2008-11-04 at 18:30 +1100, Nigel Cunningham wrote:
> > > One other question, if I may. Would you please explain (or point me to
> > > an explanation) of PHYS_PFN_OFFSET/ARCH_PFN_OFFSET? I've been dealing
> > > occasionally with people wanting to have hibernation on arm, and I don't
> > > really get the concept or the implementation (particularly when it comes
> > > to trying to do the sort of iterating over zones and pfns that was being
> > > discussed in previous messages in this thread.
> >
> > First of all, I think PHYS_PFN_OFFSET is truly an arch-dependent
> > construct. It only appears in arm an avr32. I'll tell you only how
> > ARCH_PFN_OFFSET looks to me. My guess is that those two arches need to
> > reconcile themselves and start using ARCH_PFN_OFFSET instead.
> >
> > In the old days, we only had memory that started at physical address 0x0
> > and went up to some larger address. We allocated a mem_map[] of 'struct
> > pages' in one big chunk, one for each address. mem_map[0] was for
> > physical address 0x0 and mem_map[1] was for 0x1000, mem_map[2] was for
> > 0x2000 and so on...
> >
> > If a machine didn't have a physical address 0x0, we allocated mem_map[]
> > for it anyway and just wasted that entry. What ARCH_PFN_OFFSET does is
> > let us bias the mem_map[] structure so that mem_map[0] does not
> > represent 0x0.
> >
> > If ARCH_PFN_OFFSET is 1, then mem_map[0] actually represents the
> > physical address 0x1000. If it is 2, then mem_map[0] represents
> > physical addr 0x2000. ARCH_PFN_OFFSET means that the first physical
> > address on the machine is at ARCH_PFN_OFFSET*PAGE_SIZE. We bias all
> > lookups into the mem_map[] so that we don't waste space in it. There
> > will never be a zone_start_pfn lower than ARCH_PFN_OFFSET, for instance.
> >
> > What does that mean for walking zones? Nothing. It only has meaning
> > for how we allocate and do lookups into the mem_map[]. But, since
> > everyone uses pfn_to_page() and friends, you don't ever see this.
> >
> > I'm curious why you think you need to be concerned with it.
>
> Sorry for the delay in replying.
>
> It's because I'm looking at old patches for arm support for TuxOnIce and
> because of the way TuxOnIce records what pages need attention:
>
> My method of recording what needs doing is different to Rafael's. I use
> per zone bitmaps (constructed out of order 0 allocations) and therefore
> look at zone_start_pfn in calculating what bit within the zone needs to
> be set/cleared/tested.

Well, the mainline does pretty much the same at the moment, but the bitmaps
are probably different.

Thanks,
Rafael

2008-11-05 11:03:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Wednesday, 5 of November 2008, KAMEZAWA Hiroyuki wrote:
> On Tue, 04 Nov 2008 08:59:05 -0800
> Dave Hansen <[email protected]> wrote:
>
> > On Tue, 2008-11-04 at 17:34 +0100, Rafael J. Wysocki wrote:
> > > Now, I need to do one more thing, which is to check how much memory has to be
> > > freed before creating the image. For this purpose I need to lock memory
> > > hotplug temporarily, count pages to free and unlock it. What interface should
> > > I use for this purpose?
> > >
> > > [I'll also need to lock memory hotplug temporarily during resume.]
> >
> > We currently don't have any big switch to disable memory hotplug, like
> > lock_memory_hotplug() or something. :)
> >
> > If you are simply scanning and counting pages, I think the best thing to
> > use would be the zone_span_seq*() seqlock stuff. Do your count inside
> > the seqlock's while loop. That covers detecting a zone changing while
> > it is being scanned.
> >
> > The other case to detect is when a new zone gets added. These are
> > really rare. Rare enough that we actually use a stop_machine() call in
> > build_all_zonelists() to do it. All you would have to do is detect when
> > one of these calls gets made. I think that's a good application for a
> > new seq_lock.
> >
> > I've attached an utterly untested patch that should do the trick.
> > Yasunori and KAME should probably take a look at it since the node
> > addition code is theirs.
> >
>
> Hmm ? I think there is no real requirement for doing hibernation while
> memory is under hotplug.
>
> Assume following.
> - memory hotplug can be triggerred by
> 1. interrupt from system.
> 2. "probe" interface in sysfs.
> - ONLINE/OFFLINE is only trigerred by sysfs interface.
>
> I believe we can't block "1", but "1" cannot be raised while hibernation.
> (If it happens, it's mistake of the firmware.)
>
> "probe" interface can be triggered from userland. Then it may be worth to be
> blocked. How about to add device_pm_lock() to following place ?

This is not necessary as long as we freeze the userland before hibernation.
Still, this is one thing to remeber that the freezing is needed for. :-)

1. seems to be problematic, though, since we rely on zones remaining
unchanged while we're counting memory pages to free before hibernation
and this happens before the calling ->suspend() methods of device drivers.
Of course we can count free pages in a different way, but that will be a
substantial modification (I think).

How's the firmware supposed to be notified that hibernation is going to happen?

Rafael

2008-11-05 16:23:57

by Dave Hansen

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Wed, 2008-11-05 at 20:10 +1100, Nigel Cunningham wrote:
> In your example above, zone_start_pfn will be 1, won't it? If that's the
> case, I shouldn't need to subtract ARCH_PFN_OFFSET to get the right
> index within the zone and avoid the same wastage that ARCH_PFN_OFFSET
> avoids with mem_map.

Yeah, I don't think the first zone will ever start before
ARCH_PFN_OFFSET.

If the code just deals with starting at any random zone_start_pfn and
going to any other random zone_end_pfn without any waste, then it should
be fine in the presence of ARCH_PFN_OFFSET. The only trouble is if it
assumes memory to start at 0x0.

-- Dave

2008-11-06 00:15:29

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Wed, 5 Nov 2008 12:08:25 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Wednesday, 5 of November 2008, KAMEZAWA Hiroyuki wrote:
> > On Tue, 04 Nov 2008 08:59:05 -0800
> > Dave Hansen <[email protected]> wrote:
> >
> > > On Tue, 2008-11-04 at 17:34 +0100, Rafael J. Wysocki wrote:
> > > > Now, I need to do one more thing, which is to check how much memory has to be
> > > > freed before creating the image. For this purpose I need to lock memory
> > > > hotplug temporarily, count pages to free and unlock it. What interface should
> > > > I use for this purpose?
> > > >
> > > > [I'll also need to lock memory hotplug temporarily during resume.]
> > >
> > > We currently don't have any big switch to disable memory hotplug, like
> > > lock_memory_hotplug() or something. :)
> > >
> > > If you are simply scanning and counting pages, I think the best thing to
> > > use would be the zone_span_seq*() seqlock stuff. Do your count inside
> > > the seqlock's while loop. That covers detecting a zone changing while
> > > it is being scanned.
> > >
> > > The other case to detect is when a new zone gets added. These are
> > > really rare. Rare enough that we actually use a stop_machine() call in
> > > build_all_zonelists() to do it. All you would have to do is detect when
> > > one of these calls gets made. I think that's a good application for a
> > > new seq_lock.
> > >
> > > I've attached an utterly untested patch that should do the trick.
> > > Yasunori and KAME should probably take a look at it since the node
> > > addition code is theirs.
> > >
> >
> > Hmm ? I think there is no real requirement for doing hibernation while
> > memory is under hotplug.
> >
> > Assume following.
> > - memory hotplug can be triggerred by
> > 1. interrupt from system.
> > 2. "probe" interface in sysfs.
> > - ONLINE/OFFLINE is only trigerred by sysfs interface.
> >
> > I believe we can't block "1", but "1" cannot be raised while hibernation.
> > (If it happens, it's mistake of the firmware.)
> >
> > "probe" interface can be triggered from userland. Then it may be worth to be
> > blocked. How about to add device_pm_lock() to following place ?
>
> This is not necessary as long as we freeze the userland before hibernation.
> Still, this is one thing to remeber that the freezing is needed for. :-)
>
> 1. seems to be problematic, though, since we rely on zones remaining
> unchanged while we're counting memory pages to free before hibernation
> and this happens before the calling ->suspend() methods of device drivers.
> Of course we can count free pages in a different way, but that will be a
> substantial modification (I think).
>
> How's the firmware supposed to be notified that hibernation is going to happen?
>

Ok, please consider "when memory hotplug happens."

In general, it happens when
1. memory is inserted to slot.
2. the firmware notifes the system to enable already inserted memory.

To trigger "1", you have to open cover of server/pc. Do you open pc while the system
starts hibernation ? for usual people, no.

To trigger "2", the user have special console to tell firmware "enable this memory".
Such firmware console or users have to know "the system works well." And, more important,
when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
So, such firmware console or operator have to know the system status.

Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?


Thanks,
-Kame






2008-11-06 00:28:37

by Dave Hansen

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 2008-11-06 at 09:14 +0900, KAMEZAWA Hiroyuki wrote:
> Ok, please consider "when memory hotplug happens."
>
> In general, it happens when
> 1. memory is inserted to slot.
> 2. the firmware notifes the system to enable already inserted memory.
>
> To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> starts hibernation ? for usual people, no.

You're right, this won't happen very often. We're trying to close a
theoretical hole that hasn't ever been observed in practice. But, we
don't exactly leave races in code just because we haven't observed them.
I think this is a classic race.

If we don't close it now, then someone doing some really weirdo hotplug
is going to run into it at some point. Who knows what tomorrow's
hardware/firmware will do?

> To trigger "2", the user have special console to tell firmware "enable this memory".
> Such firmware console or users have to know "the system works well." And, more important,
> when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> So, such firmware console or operator have to know the system status.
>
> Am I missing some ? Current linux can know PCI/USB hotplug while the
> system is suspended ?

* echo 'disk' > /sys/power/state
* count number of pages to write to disk
* turn all interrupts off
* copy pages to disk
* power down

I think the race we're trying to close is the one between when we count
pages and when we turn interrupts off. I assume that there is a reason
that we don't do the *entire* hibernation process with interrupts off,
probably because it would "lock" the system up for too long, and can
even possibly fail.

-- Dave

2008-11-06 00:54:05

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Wed, 05 Nov 2008 16:28:01 -0800
Dave Hansen <[email protected]> wrote:

> On Thu, 2008-11-06 at 09:14 +0900, KAMEZAWA Hiroyuki wrote:
> > Ok, please consider "when memory hotplug happens."
> >
> > In general, it happens when
> > 1. memory is inserted to slot.
> > 2. the firmware notifes the system to enable already inserted memory.
> >
> > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > starts hibernation ? for usual people, no.
>
> You're right, this won't happen very often. We're trying to close a
> theoretical hole that hasn't ever been observed in practice. But, we
> don't exactly leave races in code just because we haven't observed them.
> I think this is a classic race.
>
> If we don't close it now, then someone doing some really weirdo hotplug
> is going to run into it at some point. Who knows what tomorrow's
> hardware/firmware will do?
>
Hmm, people tend to make crazy hardware, oh yes. the pc may fly in the sky with rocket engine.

My answer to this was "add mutex used in hibernation to memory hotplug interface".
When the mutex blocks ONLINE/OFFLINE memory, the memory range which hibernation should save
will not change.

Possible solution for this interrupt handling is to do __add_memory() operation
in some kernel thread. Then, we can wait the mutex.

> > To trigger "2", the user have special console to tell firmware "enable this memory".
> > Such firmware console or users have to know "the system works well." And, more important,
> > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > So, such firmware console or operator have to know the system status.
> >
> > Am I missing some ? Current linux can know PCI/USB hotplug while the
> > system is suspended ?
>
> * echo 'disk' > /sys/power/state
> * count number of pages to write to disk
> * turn all interrupts off
> * copy pages to disk
> * power down
>
> I think the race we're trying to close is the one between when we count
> pages and when we turn interrupts off. I assume that there is a reason
> that we don't do the *entire* hibernation process with interrupts off,
> probably because it would "lock" the system up for too long, and can
> even possibly fail.
>
Hmm, while interrupts off, lru_add_drain_all() or some smp_call_function() will be blocked.
Can't we do
while(..){
go to next mem_section
if (!section_is_available)
continue;
freeze this mem_section.
count pages should be saved.
write it to disk
}
per mem_section ? (maybe addling lock bit in mem_section->section_mem_map is enough.)

Hmm?

-Kame

2008-11-06 01:18:41

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 6 Nov 2008 09:14:41 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> Ok, please consider "when memory hotplug happens."
>
> In general, it happens when
> 1. memory is inserted to slot.
> 2. the firmware notifes the system to enable already inserted memory.
>
> To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> starts hibernation ? for usual people, no.
>
> To trigger "2", the user have special console to tell firmware "enable this memory".
> Such firmware console or users have to know "the system works well." And, more important,
> when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> So, such firmware console or operator have to know the system status.
>
> Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?
>
*OFFTOPIC*

I hear following answer from my friend.

- hibernate the system
=> plug USB memory
=> wake up the system
=> panic.
- hibernate the system
=> unplug USB memory
=> wake up the sytem
=> panic.

Thanks,
-Kame

2008-11-06 01:43:24

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi.

On Thu, 2008-11-06 at 10:17 +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 6 Nov 2008 09:14:41 +0900
> KAMEZAWA Hiroyuki <[email protected]> wrote:
>
> > Ok, please consider "when memory hotplug happens."
> >
> > In general, it happens when
> > 1. memory is inserted to slot.
> > 2. the firmware notifes the system to enable already inserted memory.
> >
> > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > starts hibernation ? for usual people, no.
> >
> > To trigger "2", the user have special console to tell firmware "enable this memory".
> > Such firmware console or users have to know "the system works well." And, more important,
> > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > So, such firmware console or operator have to know the system status.
> >
> > Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?
> >
> *OFFTOPIC*
>
> I hear following answer from my friend.
>
> - hibernate the system
> => plug USB memory
> => wake up the system
> => panic.
> - hibernate the system
> => unplug USB memory
> => wake up the sytem
> => panic.

We currently check that the number of physical pages is the same when
starting to load the image, so neither of these issues cause real
problems.

Regards,

Nigel

2008-11-06 01:55:40

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 06 Nov 2008 12:43:07 +1100
Nigel Cunningham <[email protected]> wrote:

> Hi.
>
> On Thu, 2008-11-06 at 10:17 +0900, KAMEZAWA Hiroyuki wrote:
> > On Thu, 6 Nov 2008 09:14:41 +0900
> > KAMEZAWA Hiroyuki <[email protected]> wrote:
> >
> > > Ok, please consider "when memory hotplug happens."
> > >
> > > In general, it happens when
> > > 1. memory is inserted to slot.
> > > 2. the firmware notifes the system to enable already inserted memory.
> > >
> > > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > > starts hibernation ? for usual people, no.
> > >
> > > To trigger "2", the user have special console to tell firmware "enable this memory".
> > > Such firmware console or users have to know "the system works well." And, more important,
> > > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > > So, such firmware console or operator have to know the system status.
> > >
> > > Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?
> > >
> > *OFFTOPIC*
> >
> > I hear following answer from my friend.
> >
> > - hibernate the system
> > => plug USB memory
> > => wake up the system
> > => panic.
> > - hibernate the system
> > => unplug USB memory
> > => wake up the sytem
> > => panic.
>
> We currently check that the number of physical pages is the same when
> starting to load the image, so neither of these issues cause real
> problems.
>
Hmm? this doesn't come from lost of hotplug interrupt ?
the memory plugged while the system is sleeping can be recognized when the system wakes up ?

My point is the firmware/operator has to know "the system is sleeping or not" to do *any* hotplug.
(I'm not sure but removing a cpu while the system is under hibernation may cause panic, too.)
In my point of view, this is operator's problem, not hibernation's.

If you want to fix the small race really, please add(or export) some mutex or notifier. like

NOTIFY_HIBERNATION_START
NOTIFY_HIBERNATION_END
NOTIFY_HIBERNATION_RESUME

other pepole will make use of this.
I think __add_memory called by interrupt can be executed in some kernel thread.

Thanks,
-Kame


2008-11-06 02:00:10

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 6 Nov 2008 10:54:53 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> On Thu, 06 Nov 2008 12:43:07 +1100
> Nigel Cunningham <[email protected]> wrote:
>
> > Hi.
> >
> > On Thu, 2008-11-06 at 10:17 +0900, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 6 Nov 2008 09:14:41 +0900
> > > KAMEZAWA Hiroyuki <[email protected]> wrote:
> > >
> > > > Ok, please consider "when memory hotplug happens."
> > > >
> > > > In general, it happens when
> > > > 1. memory is inserted to slot.
> > > > 2. the firmware notifes the system to enable already inserted memory.
> > > >
> > > > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > > > starts hibernation ? for usual people, no.
> > > >
> > > > To trigger "2", the user have special console to tell firmware "enable this memory".
> > > > Such firmware console or users have to know "the system works well." And, more important,
> > > > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > > > So, such firmware console or operator have to know the system status.
> > > >
> > > > Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?
> > > >
> > > *OFFTOPIC*
> > >
> > > I hear following answer from my friend.
> > >
> > > - hibernate the system
> > > => plug USB memory
> > > => wake up the system
> > > => panic.
> > > - hibernate the system
> > > => unplug USB memory
> > > => wake up the sytem
> > > => panic.
> >
> > We currently check that the number of physical pages is the same when
> > starting to load the image, so neither of these issues cause real
> > problems.
> >
> Hmm? this doesn't come from lost of hotplug interrupt ?
> the memory plugged while the system is sleeping can be recognized when the system wakes up ?
>
> My point is the firmware/operator has to know "the system is sleeping or not" to do *any* hotplug.
> (I'm not sure but removing a cpu while the system is under hibernation may cause panic, too.)
> In my point of view, this is operator's problem, not hibernation's.

IOW, please just remove !HIBERNATION in Kconfig and add "WARNING" to Documentaion.
for *now*.

I hope some notifier_call_chain from hibernation will be added.


Thanks,
-Kame

2008-11-06 02:00:37

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi.

On Thu, 2008-11-06 at 10:54 +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 06 Nov 2008 12:43:07 +1100
> Nigel Cunningham <[email protected]> wrote:
>
> > Hi.
> >
> > On Thu, 2008-11-06 at 10:17 +0900, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 6 Nov 2008 09:14:41 +0900
> > > KAMEZAWA Hiroyuki <[email protected]> wrote:
> > >
> > > > Ok, please consider "when memory hotplug happens."
> > > >
> > > > In general, it happens when
> > > > 1. memory is inserted to slot.
> > > > 2. the firmware notifes the system to enable already inserted memory.
> > > >
> > > > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > > > starts hibernation ? for usual people, no.
> > > >
> > > > To trigger "2", the user have special console to tell firmware "enable this memory".
> > > > Such firmware console or users have to know "the system works well." And, more important,
> > > > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > > > So, such firmware console or operator have to know the system status.
> > > >
> > > > Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?
> > > >
> > > *OFFTOPIC*
> > >
> > > I hear following answer from my friend.
> > >
> > > - hibernate the system
> > > => plug USB memory
> > > => wake up the system
> > > => panic.
> > > - hibernate the system
> > > => unplug USB memory
> > > => wake up the sytem
> > > => panic.
> >
> > We currently check that the number of physical pages is the same when
> > starting to load the image, so neither of these issues cause real
> > problems.
> >
> Hmm? this doesn't come from lost of hotplug interrupt ?
> the memory plugged while the system is sleeping can be recognized when the system wakes up ?

Remember that when we hibernate (assuming we don't then suspend to ram),
the power is fully off. Resuming starts off like a fresh boot.

> My point is the firmware/operator has to know "the system is sleeping or not" to do *any* hotplug.
> (I'm not sure but removing a cpu while the system is under hibernation may cause panic, too.)
> In my point of view, this is operator's problem, not hibernation's.

If a cpu is removed while we're hibernated, that's okay. We use
hotplugging and can therefore cope quite happily with cpus going away
while the system is powered down.

> If you want to fix the small race really, please add(or export) some mutex or notifier. like
>
> NOTIFY_HIBERNATION_START
> NOTIFY_HIBERNATION_END
> NOTIFY_HIBERNATION_RESUME
>
> other pepole will make use of this.
> I think __add_memory called by interrupt can be executed in some kernel thread.
>
> Thanks,
> -Kame

There are notifier chains for hibernation already
(PM_HIBERNATION_PREPARE, PM_RESTORE_PREPARE, PM_POST_RESTORE and
PM_POST_HIBERNATION).

Regards,

Nigel

2008-11-06 02:03:23

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi.

On Thu, 2008-11-06 at 09:53 +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 05 Nov 2008 16:28:01 -0800
> Dave Hansen <[email protected]> wrote:
>
> > On Thu, 2008-11-06 at 09:14 +0900, KAMEZAWA Hiroyuki wrote:
> > > Ok, please consider "when memory hotplug happens."
> > >
> > > In general, it happens when
> > > 1. memory is inserted to slot.
> > > 2. the firmware notifes the system to enable already inserted memory.
> > >
> > > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > > starts hibernation ? for usual people, no.
> >
> > You're right, this won't happen very often. We're trying to close a
> > theoretical hole that hasn't ever been observed in practice. But, we
> > don't exactly leave races in code just because we haven't observed them.
> > I think this is a classic race.
> >
> > If we don't close it now, then someone doing some really weirdo hotplug
> > is going to run into it at some point. Who knows what tomorrow's
> > hardware/firmware will do?
> >
> Hmm, people tend to make crazy hardware, oh yes. the pc may fly in the sky with rocket engine.

It doesn't even have to be crazy. Just imagine someone bumping a button
on the case while plugging in the memory and that button being
configured to make the machine hibernate.

Regards,

Nigel

2008-11-06 02:07:57

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 06 Nov 2008 13:00:18 +1100
Nigel Cunningham <[email protected]> wrote:

> Hi.
>
> On Thu, 2008-11-06 at 10:54 +0900, KAMEZAWA Hiroyuki wrote:
> > On Thu, 06 Nov 2008 12:43:07 +1100
> > Nigel Cunningham <[email protected]> wrote:
> >
> > > Hi.
> > >
> > > On Thu, 2008-11-06 at 10:17 +0900, KAMEZAWA Hiroyuki wrote:
> > > > On Thu, 6 Nov 2008 09:14:41 +0900
> > > > KAMEZAWA Hiroyuki <[email protected]> wrote:
> > > >
> > > > > Ok, please consider "when memory hotplug happens."
> > > > >
> > > > > In general, it happens when
> > > > > 1. memory is inserted to slot.
> > > > > 2. the firmware notifes the system to enable already inserted memory.
> > > > >
> > > > > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > > > > starts hibernation ? for usual people, no.
> > > > >
> > > > > To trigger "2", the user have special console to tell firmware "enable this memory".
> > > > > Such firmware console or users have to know "the system works well." And, more important,
> > > > > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > > > > So, such firmware console or operator have to know the system status.
> > > > >
> > > > > Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?
> > > > >
> > > > *OFFTOPIC*
> > > >
> > > > I hear following answer from my friend.
> > > >
> > > > - hibernate the system
> > > > => plug USB memory
> > > > => wake up the system
> > > > => panic.
> > > > - hibernate the system
> > > > => unplug USB memory
> > > > => wake up the sytem
> > > > => panic.
> > >
> > > We currently check that the number of physical pages is the same when
> > > starting to load the image, so neither of these issues cause real
> > > problems.
> > >
> > Hmm? this doesn't come from lost of hotplug interrupt ?
> > the memory plugged while the system is sleeping can be recognized when the system wakes up ?
>
> Remember that when we hibernate (assuming we don't then suspend to ram),
> the power is fully off. Resuming starts off like a fresh boot.
>
It seems I don't study enough.

> > My point is the firmware/operator has to know "the system is sleeping or not" to do *any* hotplug.
> > (I'm not sure but removing a cpu while the system is under hibernation may cause panic, too.)
> > In my point of view, this is operator's problem, not hibernation's.
>
> If a cpu is removed while we're hibernated, that's okay. We use
> hotplugging and can therefore cope quite happily with cpus going away
> while the system is powered down.
>
Why cpu hotplugging works while the power is fully off ?
"Resuming starts off like a fresh boot." updates all necessary data strucutures ?

> > If you want to fix the small race really, please add(or export) some mutex or notifier. like
> >
> > NOTIFY_HIBERNATION_START
> > NOTIFY_HIBERNATION_END
> > NOTIFY_HIBERNATION_RESUME
> >
> > other pepole will make use of this.
> > I think __add_memory called by interrupt can be executed in some kernel thread.
> >
> > Thanks,
> > -Kame
>
> There are notifier chains for hibernation already
> (PM_HIBERNATION_PREPARE, PM_RESTORE_PREPARE, PM_POST_RESTORE and
> PM_POST_HIBERNATION).
>

Okay. then we can add "kernel thread for calling add/remove memory" and say
"PLEASE WAIT UNTIL HIBERNATION IS READY".

I can try that by myself but doesn't have suitable machine....
I think I can show you pseudo code in hours. please wait a bit.

Thanks,
-Kame

2008-11-06 02:14:21

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 06 Nov 2008 13:03:06 +1100
Nigel Cunningham <[email protected]> wrote:

> Hi.
>
> On Thu, 2008-11-06 at 09:53 +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 05 Nov 2008 16:28:01 -0800
> > Dave Hansen <[email protected]> wrote:
> >
> > > On Thu, 2008-11-06 at 09:14 +0900, KAMEZAWA Hiroyuki wrote:
> > > > Ok, please consider "when memory hotplug happens."
> > > >
> > > > In general, it happens when
> > > > 1. memory is inserted to slot.
> > > > 2. the firmware notifes the system to enable already inserted memory.
> > > >
> > > > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > > > starts hibernation ? for usual people, no.
> > >
> > > You're right, this won't happen very often. We're trying to close a
> > > theoretical hole that hasn't ever been observed in practice. But, we
> > > don't exactly leave races in code just because we haven't observed them.
> > > I think this is a classic race.
> > >
> > > If we don't close it now, then someone doing some really weirdo hotplug
> > > is going to run into it at some point. Who knows what tomorrow's
> > > hardware/firmware will do?
> > >
> > Hmm, people tend to make crazy hardware, oh yes. the pc may fly in the sky with rocket engine.
>
> It doesn't even have to be crazy. Just imagine someone bumping a button
> on the case while plugging in the memory and that button being
> configured to make the machine hibernate.
>
please don't start hibernation if cover is open....(if you can)

Thanks,
-Kame

2008-11-06 03:13:00

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 6 Nov 2008 11:07:09 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:
> Okay. then we can add "kernel thread for calling add/remove memory" and say
> "PLEASE WAIT UNTIL HIBERNATION IS READY".
>
> I can try that by myself but doesn't have suitable machine....
> I think I can show you pseudo code in hours. please wait a bit.
>

How about this one ? as a start step ?
I have no test environment. So please take this as a sample.
-Kame
=

Now, MEMORY_HOTPLUG and HIBERNATION is mutually exclusive in Kconfig.
That's because
- memory hotplug changes pgdat/zone/memmap range.
- hibernation countes # of memory based on pgdate/zone/memmap.

This patch adds rwsemaphore for making them not to run at the same time.

IIUC, add_memory() is called from following places.
- probe interface
- acpi handler
It seems both can sleep. (acpi handler uses sleepable memory allocator etc...)

online/offline handlers also sleeps.

Anyway, the most difficult thing is how to test this.


Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>

mm/Kconfig | 2 -
mm/memory_hotplug.c | 61 +++++++++++++++++++++++++++++++++++++++++++++-------
2 files changed, 54 insertions(+), 9 deletions(-)

Index: linux-2.6.27.4/mm/memory_hotplug.c
===================================================================
--- linux-2.6.27.4.orig/mm/memory_hotplug.c
+++ linux-2.6.27.4/mm/memory_hotplug.c
@@ -31,6 +31,13 @@

#include "internal.h"

+struct rw_semaphore memhp_mutex;
+
+static int init_memhp_mutex(void)
+{
+ init_rwsem(&memhp_mutex);
+}
+
/* add this memory to iomem resource */
static struct resource *register_memory_resource(u64 start, u64 size)
{
@@ -381,6 +388,7 @@ int online_pages(unsigned long pfn, unsi
int ret;
struct memory_notify arg;

+ down_read(&memhp_mutex);
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = -1;
@@ -392,6 +400,7 @@ int online_pages(unsigned long pfn, unsi
ret = memory_notify(MEM_GOING_ONLINE, &arg);
ret = notifier_to_errno(ret);
if (ret) {
+ up_read(&memhp_mutex);
memory_notify(MEM_CANCEL_ONLINE, &arg);
return ret;
}
@@ -415,6 +424,7 @@ int online_pages(unsigned long pfn, unsi
printk(KERN_DEBUG "online_pages %lx at %lx failed\n",
nr_pages, pfn);
memory_notify(MEM_CANCEL_ONLINE, &arg);
+ up_read(&memhp_mutex);
return ret;
}

@@ -436,7 +446,7 @@ int online_pages(unsigned long pfn, unsi

if (onlined_pages)
memory_notify(MEM_ONLINE, &arg);
-
+ up_read(&memhp_mutex);
return 0;
}
#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
@@ -477,14 +487,18 @@ int add_memory(int nid, u64 start, u64 s
struct resource *res;
int ret;

+ down_read(&memhp_mutex);
res = register_memory_resource(start, size);
- if (!res)
+ if (!res) {
+ up_read(&memhp_mutex);
return -EEXIST;
-
+ }
if (!node_online(nid)) {
pgdat = hotadd_new_pgdat(nid, start);
- if (!pgdat)
+ if (!pgdat) {
+ up_read(&memhp_mutex);
return -ENOMEM;
+ }
new_pgdat = 1;
}

@@ -508,7 +522,7 @@ int add_memory(int nid, u64 start, u64 s
*/
BUG_ON(ret);
}
-
+ up_read(&memhp_mutex);
return ret;
error:
/* rollback pgdat allocation and others */
@@ -517,10 +531,12 @@ error:
if (res)
release_memory_resource(res);

+ up_read(&memhp_mutex);
return ret;
}
EXPORT_SYMBOL_GPL(add_memory);

+
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
* A free page on the buddy free lists (not the per-cpu lists) has PageBuddy
@@ -750,20 +766,23 @@ int offline_pages(unsigned long start_pf
return -EINVAL;
if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
return -EINVAL;
+
/* This makes hotplug much easier...and readable.
we assume this for now. .*/
if (!test_pages_in_a_zone(start_pfn, end_pfn))
return -EINVAL;

+ down_read(&memhp_mutex);
zone = page_zone(pfn_to_page(start_pfn));
node = zone_to_nid(zone);
nr_pages = end_pfn - start_pfn;

/* set above range as isolated */
ret = start_isolate_page_range(start_pfn, end_pfn);
- if (ret)
+ if (ret) {
+ up_read(&memhp_mutex);
return ret;
-
+ }
arg.start_pfn = start_pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = -1;
@@ -838,6 +857,7 @@ repeat:
writeback_set_ratelimit();

memory_notify(MEM_OFFLINE, &arg);
+ up_read(&memhp_mutex);
return 0;

failed_removal:
@@ -846,7 +866,7 @@ failed_removal:
memory_notify(MEM_CANCEL_OFFLINE, &arg);
/* pushback to free area */
undo_isolate_page_range(start_pfn, end_pfn);
-
+ up_read(&memhp_mutex);
return ret;
}
#else
@@ -856,3 +876,34 @@ int remove_memory(u64 start, u64 size)
}
EXPORT_SYMBOL_GPL(remove_memory);
#endif /* CONFIG_MEMORY_HOTREMOVE */
+
+
+#ifdef CONFIG_HIBERNATION
+
+int memhp_hibenation_callback(struct notifier_block *self,
+ unsigned long action,
+ void *arg)
+{
+ int ret;
+
+ switch(action) {
+ case PM_HIBERNATION_PREPARE:
+ down_write(&memhp_mutex);
+ break;
+ case PM_POST_HIBERNATION:
+ up_write(&memhp_mutex);
+ break;
+ case PM_RESTORE_PREPARE:
+ down_write(&memhp_mutex);
+ break;
+ case PM_POST_RESTORE:
+ up_write(&memhp_mutex);
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+#endif
+
+
Index: linux-2.6.27.4/mm/Kconfig
===================================================================
--- linux-2.6.27.4.orig/mm/Kconfig
+++ linux-2.6.27.4/mm/Kconfig
@@ -128,12 +128,9 @@ config SPARSEMEM_VMEMMAP
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
depends on SPARSEMEM || X86_64_ACPI_NUMA
- depends on HOTPLUG && !HIBERNATION && ARCH_ENABLE_MEMORY_HOTPLUG
+ depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG
depends on (IA64 || X86 || PPC64 || SUPERH || S390)

-comment "Memory hotplug is currently incompatible with Software Suspend"
- depends on SPARSEMEM && HOTPLUG && HIBERNATION
-
config MEMORY_HOTPLUG_SPARSE
def_bool y
depends on SPARSEMEM && MEMORY_HOTPLUG

2008-11-06 03:28:43

by Yasunori Goto

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug


Hmm.

I think simple mutex is enough. read/write lock is not necessary.
Memory hotplug event is very rare, and there is no requirement of
parallel execution.


> On Thu, 6 Nov 2008 11:07:09 +0900
> KAMEZAWA Hiroyuki <[email protected]> wrote:
> > Okay. then we can add "kernel thread for calling add/remove memory" and say
> > "PLEASE WAIT UNTIL HIBERNATION IS READY".
> >
> > I can try that by myself but doesn't have suitable machine....
> > I think I can show you pseudo code in hours. please wait a bit.
> >
>
> How about this one ? as a start step ?
> I have no test environment. So please take this as a sample.
> -Kame
> =
>
> Now, MEMORY_HOTPLUG and HIBERNATION is mutually exclusive in Kconfig.
> That's because
> - memory hotplug changes pgdat/zone/memmap range.
> - hibernation countes # of memory based on pgdate/zone/memmap.
>
> This patch adds rwsemaphore for making them not to run at the same time.
>
> IIUC, add_memory() is called from following places.
> - probe interface
> - acpi handler
> It seems both can sleep. (acpi handler uses sleepable memory allocator etc...)
>
> online/offline handlers also sleeps.
>
> Anyway, the most difficult thing is how to test this.
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
>
> mm/Kconfig | 2 -
> mm/memory_hotplug.c | 61 +++++++++++++++++++++++++++++++++++++++++++++-------
> 2 files changed, 54 insertions(+), 9 deletions(-)
>
> Index: linux-2.6.27.4/mm/memory_hotplug.c
> ===================================================================
> --- linux-2.6.27.4.orig/mm/memory_hotplug.c
> +++ linux-2.6.27.4/mm/memory_hotplug.c
> @@ -31,6 +31,13 @@
>
> #include "internal.h"
>
> +struct rw_semaphore memhp_mutex;
> +
> +static int init_memhp_mutex(void)
> +{
> + init_rwsem(&memhp_mutex);
> +}
> +
> /* add this memory to iomem resource */
> static struct resource *register_memory_resource(u64 start, u64 size)
> {
> @@ -381,6 +388,7 @@ int online_pages(unsigned long pfn, unsi
> int ret;
> struct memory_notify arg;
>
> + down_read(&memhp_mutex);
> arg.start_pfn = pfn;
> arg.nr_pages = nr_pages;
> arg.status_change_nid = -1;
> @@ -392,6 +400,7 @@ int online_pages(unsigned long pfn, unsi
> ret = memory_notify(MEM_GOING_ONLINE, &arg);
> ret = notifier_to_errno(ret);
> if (ret) {
> + up_read(&memhp_mutex);
> memory_notify(MEM_CANCEL_ONLINE, &arg);
> return ret;
> }
> @@ -415,6 +424,7 @@ int online_pages(unsigned long pfn, unsi
> printk(KERN_DEBUG "online_pages %lx at %lx failed\n",
> nr_pages, pfn);
> memory_notify(MEM_CANCEL_ONLINE, &arg);
> + up_read(&memhp_mutex);
> return ret;
> }
>
> @@ -436,7 +446,7 @@ int online_pages(unsigned long pfn, unsi
>
> if (onlined_pages)
> memory_notify(MEM_ONLINE, &arg);
> -
> + up_read(&memhp_mutex);
> return 0;
> }
> #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
> @@ -477,14 +487,18 @@ int add_memory(int nid, u64 start, u64 s
> struct resource *res;
> int ret;
>
> + down_read(&memhp_mutex);
> res = register_memory_resource(start, size);
> - if (!res)
> + if (!res) {
> + up_read(&memhp_mutex);
> return -EEXIST;
> -
> + }
> if (!node_online(nid)) {
> pgdat = hotadd_new_pgdat(nid, start);
> - if (!pgdat)
> + if (!pgdat) {
> + up_read(&memhp_mutex);
> return -ENOMEM;
> + }
> new_pgdat = 1;
> }
>
> @@ -508,7 +522,7 @@ int add_memory(int nid, u64 start, u64 s
> */
> BUG_ON(ret);
> }
> -
> + up_read(&memhp_mutex);
> return ret;
> error:
> /* rollback pgdat allocation and others */
> @@ -517,10 +531,12 @@ error:
> if (res)
> release_memory_resource(res);
>
> + up_read(&memhp_mutex);
> return ret;
> }
> EXPORT_SYMBOL_GPL(add_memory);
>
> +
> #ifdef CONFIG_MEMORY_HOTREMOVE
> /*
> * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy
> @@ -750,20 +766,23 @@ int offline_pages(unsigned long start_pf
> return -EINVAL;
> if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
> return -EINVAL;
> +
> /* This makes hotplug much easier...and readable.
> we assume this for now. .*/
> if (!test_pages_in_a_zone(start_pfn, end_pfn))
> return -EINVAL;
>
> + down_read(&memhp_mutex);
> zone = page_zone(pfn_to_page(start_pfn));
> node = zone_to_nid(zone);
> nr_pages = end_pfn - start_pfn;
>
> /* set above range as isolated */
> ret = start_isolate_page_range(start_pfn, end_pfn);
> - if (ret)
> + if (ret) {
> + up_read(&memhp_mutex);
> return ret;
> -
> + }
> arg.start_pfn = start_pfn;
> arg.nr_pages = nr_pages;
> arg.status_change_nid = -1;
> @@ -838,6 +857,7 @@ repeat:
> writeback_set_ratelimit();
>
> memory_notify(MEM_OFFLINE, &arg);
> + up_read(&memhp_mutex);
> return 0;
>
> failed_removal:
> @@ -846,7 +866,7 @@ failed_removal:
> memory_notify(MEM_CANCEL_OFFLINE, &arg);
> /* pushback to free area */
> undo_isolate_page_range(start_pfn, end_pfn);
> -
> + up_read(&memhp_mutex);
> return ret;
> }
> #else
> @@ -856,3 +876,34 @@ int remove_memory(u64 start, u64 size)
> }
> EXPORT_SYMBOL_GPL(remove_memory);
> #endif /* CONFIG_MEMORY_HOTREMOVE */
> +
> +
> +#ifdef CONFIG_HIBERNATION
> +
> +int memhp_hibenation_callback(struct notifier_block *self,
> + unsigned long action,
> + void *arg)
> +{
> + int ret;
> +
> + switch(action) {
> + case PM_HIBERNATION_PREPARE:
> + down_write(&memhp_mutex);
> + break;
> + case PM_POST_HIBERNATION:
> + up_write(&memhp_mutex);
> + break;
> + case PM_RESTORE_PREPARE:
> + down_write(&memhp_mutex);
> + break;
> + case PM_POST_RESTORE:
> + up_write(&memhp_mutex);
> + break;
> + default:
> + break;
> + }
> + return NOTIFY_OK;
> +}
> +#endif
> +
> +
> Index: linux-2.6.27.4/mm/Kconfig
> ===================================================================
> --- linux-2.6.27.4.orig/mm/Kconfig
> +++ linux-2.6.27.4/mm/Kconfig
> @@ -128,12 +128,9 @@ config SPARSEMEM_VMEMMAP
> config MEMORY_HOTPLUG
> bool "Allow for memory hot-add"
> depends on SPARSEMEM || X86_64_ACPI_NUMA
> - depends on HOTPLUG && !HIBERNATION && ARCH_ENABLE_MEMORY_HOTPLUG
> + depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG
> depends on (IA64 || X86 || PPC64 || SUPERH || S390)
>
> -comment "Memory hotplug is currently incompatible with Software Suspend"
> - depends on SPARSEMEM && HOTPLUG && HIBERNATION
> -
> config MEMORY_HOTPLUG_SPARSE
> def_bool y
> depends on SPARSEMEM && MEMORY_HOTPLUG
>

--
Yasunori Goto

2008-11-06 06:04:55

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 06 Nov 2008 12:28:30 +0900
Yasunori Goto <[email protected]> wrote:

>
> Hmm.
>
> I think simple mutex is enough. read/write lock is not necessary.
> Memory hotplug event is very rare, and there is no requirement of
> parallel execution.
>

OK, change it to mutex and start new thread.

Thanks,
-Kame

>
> > On Thu, 6 Nov 2008 11:07:09 +0900
> > KAMEZAWA Hiroyuki <[email protected]> wrote:
> > > Okay. then we can add "kernel thread for calling add/remove memory" and say
> > > "PLEASE WAIT UNTIL HIBERNATION IS READY".
> > >
> > > I can try that by myself but doesn't have suitable machine....
> > > I think I can show you pseudo code in hours. please wait a bit.
> > >
> >
> > How about this one ? as a start step ?
> > I have no test environment. So please take this as a sample.
> > -Kame
> > =
> >
> > Now, MEMORY_HOTPLUG and HIBERNATION is mutually exclusive in Kconfig.
> > That's because
> > - memory hotplug changes pgdat/zone/memmap range.
> > - hibernation countes # of memory based on pgdate/zone/memmap.
> >
> > This patch adds rwsemaphore for making them not to run at the same time.
> >
> > IIUC, add_memory() is called from following places.
> > - probe interface
> > - acpi handler
> > It seems both can sleep. (acpi handler uses sleepable memory allocator etc...)
> >
> > online/offline handlers also sleeps.
> >
> > Anyway, the most difficult thing is how to test this.
> >
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
> >
> > mm/Kconfig | 2 -
> > mm/memory_hotplug.c | 61 +++++++++++++++++++++++++++++++++++++++++++++-------
> > 2 files changed, 54 insertions(+), 9 deletions(-)
> >
> > Index: linux-2.6.27.4/mm/memory_hotplug.c
> > ===================================================================
> > --- linux-2.6.27.4.orig/mm/memory_hotplug.c
> > +++ linux-2.6.27.4/mm/memory_hotplug.c
> > @@ -31,6 +31,13 @@
> >
> > #include "internal.h"
> >
> > +struct rw_semaphore memhp_mutex;
> > +
> > +static int init_memhp_mutex(void)
> > +{
> > + init_rwsem(&memhp_mutex);
> > +}
> > +
> > /* add this memory to iomem resource */
> > static struct resource *register_memory_resource(u64 start, u64 size)
> > {
> > @@ -381,6 +388,7 @@ int online_pages(unsigned long pfn, unsi
> > int ret;
> > struct memory_notify arg;
> >
> > + down_read(&memhp_mutex);
> > arg.start_pfn = pfn;
> > arg.nr_pages = nr_pages;
> > arg.status_change_nid = -1;
> > @@ -392,6 +400,7 @@ int online_pages(unsigned long pfn, unsi
> > ret = memory_notify(MEM_GOING_ONLINE, &arg);
> > ret = notifier_to_errno(ret);
> > if (ret) {
> > + up_read(&memhp_mutex);
> > memory_notify(MEM_CANCEL_ONLINE, &arg);
> > return ret;
> > }
> > @@ -415,6 +424,7 @@ int online_pages(unsigned long pfn, unsi
> > printk(KERN_DEBUG "online_pages %lx at %lx failed\n",
> > nr_pages, pfn);
> > memory_notify(MEM_CANCEL_ONLINE, &arg);
> > + up_read(&memhp_mutex);
> > return ret;
> > }
> >
> > @@ -436,7 +446,7 @@ int online_pages(unsigned long pfn, unsi
> >
> > if (onlined_pages)
> > memory_notify(MEM_ONLINE, &arg);
> > -
> > + up_read(&memhp_mutex);
> > return 0;
> > }
> > #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
> > @@ -477,14 +487,18 @@ int add_memory(int nid, u64 start, u64 s
> > struct resource *res;
> > int ret;
> >
> > + down_read(&memhp_mutex);
> > res = register_memory_resource(start, size);
> > - if (!res)
> > + if (!res) {
> > + up_read(&memhp_mutex);
> > return -EEXIST;
> > -
> > + }
> > if (!node_online(nid)) {
> > pgdat = hotadd_new_pgdat(nid, start);
> > - if (!pgdat)
> > + if (!pgdat) {
> > + up_read(&memhp_mutex);
> > return -ENOMEM;
> > + }
> > new_pgdat = 1;
> > }
> >
> > @@ -508,7 +522,7 @@ int add_memory(int nid, u64 start, u64 s
> > */
> > BUG_ON(ret);
> > }
> > -
> > + up_read(&memhp_mutex);
> > return ret;
> > error:
> > /* rollback pgdat allocation and others */
> > @@ -517,10 +531,12 @@ error:
> > if (res)
> > release_memory_resource(res);
> >
> > + up_read(&memhp_mutex);
> > return ret;
> > }
> > EXPORT_SYMBOL_GPL(add_memory);
> >
> > +
> > #ifdef CONFIG_MEMORY_HOTREMOVE
> > /*
> > * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy
> > @@ -750,20 +766,23 @@ int offline_pages(unsigned long start_pf
> > return -EINVAL;
> > if (!IS_ALIGNED(end_pfn, pageblock_nr_pages))
> > return -EINVAL;
> > +
> > /* This makes hotplug much easier...and readable.
> > we assume this for now. .*/
> > if (!test_pages_in_a_zone(start_pfn, end_pfn))
> > return -EINVAL;
> >
> > + down_read(&memhp_mutex);
> > zone = page_zone(pfn_to_page(start_pfn));
> > node = zone_to_nid(zone);
> > nr_pages = end_pfn - start_pfn;
> >
> > /* set above range as isolated */
> > ret = start_isolate_page_range(start_pfn, end_pfn);
> > - if (ret)
> > + if (ret) {
> > + up_read(&memhp_mutex);
> > return ret;
> > -
> > + }
> > arg.start_pfn = start_pfn;
> > arg.nr_pages = nr_pages;
> > arg.status_change_nid = -1;
> > @@ -838,6 +857,7 @@ repeat:
> > writeback_set_ratelimit();
> >
> > memory_notify(MEM_OFFLINE, &arg);
> > + up_read(&memhp_mutex);
> > return 0;
> >
> > failed_removal:
> > @@ -846,7 +866,7 @@ failed_removal:
> > memory_notify(MEM_CANCEL_OFFLINE, &arg);
> > /* pushback to free area */
> > undo_isolate_page_range(start_pfn, end_pfn);
> > -
> > + up_read(&memhp_mutex);
> > return ret;
> > }
> > #else
> > @@ -856,3 +876,34 @@ int remove_memory(u64 start, u64 size)
> > }
> > EXPORT_SYMBOL_GPL(remove_memory);
> > #endif /* CONFIG_MEMORY_HOTREMOVE */
> > +
> > +
> > +#ifdef CONFIG_HIBERNATION
> > +
> > +int memhp_hibenation_callback(struct notifier_block *self,
> > + unsigned long action,
> > + void *arg)
> > +{
> > + int ret;
> > +
> > + switch(action) {
> > + case PM_HIBERNATION_PREPARE:
> > + down_write(&memhp_mutex);
> > + break;
> > + case PM_POST_HIBERNATION:
> > + up_write(&memhp_mutex);
> > + break;
> > + case PM_RESTORE_PREPARE:
> > + down_write(&memhp_mutex);
> > + break;
> > + case PM_POST_RESTORE:
> > + up_write(&memhp_mutex);
> > + break;
> > + default:
> > + break;
> > + }
> > + return NOTIFY_OK;
> > +}
> > +#endif
> > +
> > +
> > Index: linux-2.6.27.4/mm/Kconfig
> > ===================================================================
> > --- linux-2.6.27.4.orig/mm/Kconfig
> > +++ linux-2.6.27.4/mm/Kconfig
> > @@ -128,12 +128,9 @@ config SPARSEMEM_VMEMMAP
> > config MEMORY_HOTPLUG
> > bool "Allow for memory hot-add"
> > depends on SPARSEMEM || X86_64_ACPI_NUMA
> > - depends on HOTPLUG && !HIBERNATION && ARCH_ENABLE_MEMORY_HOTPLUG
> > + depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG
> > depends on (IA64 || X86 || PPC64 || SUPERH || S390)
> >
> > -comment "Memory hotplug is currently incompatible with Software Suspend"
> > - depends on SPARSEMEM && HOTPLUG && HIBERNATION
> > -
> > config MEMORY_HOTPLUG_SPARSE
> > def_bool y
> > depends on SPARSEMEM && MEMORY_HOTPLUG
> >
>
> --
> Yasunori Goto
>
>
>

2008-11-06 08:47:37

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi!

> > To trigger "2", the user have special console to tell firmware "enable this memory".
> > Such firmware console or users have to know "the system works well." And, more important,
> > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > So, such firmware console or operator have to know the system status.
> >
> > Am I missing some ? Current linux can know PCI/USB hotplug while the
> > system is suspended ?
>
> * echo 'disk' > /sys/power/state
> * count number of pages to write to disk
> * turn all interrupts off
> * copy pages to disk
> * power down
>
> I think the race we're trying to close is the one between when we count
> pages and when we turn interrupts off. I assume that there is a reason
> that we don't do the *entire* hibernation process with interrupts off,
> probably because it would "lock" the system up for too long, and can
> even possibly fail.

We need interrupts to write pages to the disk.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-11-06 09:12:36

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

> On Thu, 6 Nov 2008 09:14:41 +0900
> KAMEZAWA Hiroyuki <[email protected]> wrote:
>
> > Ok, please consider "when memory hotplug happens."
> >
> > In general, it happens when
> > 1. memory is inserted to slot.
> > 2. the firmware notifes the system to enable already inserted memory.
> >
> > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > starts hibernation ? for usual people, no.
> >
> > To trigger "2", the user have special console to tell firmware "enable this memory".
> > Such firmware console or users have to know "the system works well." And, more important,
> > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > So, such firmware console or operator have to know the system status.
> >
> > Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?
> >
> *OFFTOPIC*
>
> I hear following answer from my friend.
>
> - hibernate the system
> => plug USB memory
> => wake up the system
> => panic.

That should work. File a bugzilla if it does not.

> - hibernate the system
> => unplug USB memory
> => wake up the sytem
> => panic.

That should work ok as long as you unmount the drive first.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-11-06 09:13:53

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 6 Nov 2008 10:12:18 +0100
Pavel Machek <[email protected]> wrote:

> > On Thu, 6 Nov 2008 09:14:41 +0900
> > KAMEZAWA Hiroyuki <[email protected]> wrote:
> >
> > > Ok, please consider "when memory hotplug happens."
> > >
> > > In general, it happens when
> > > 1. memory is inserted to slot.
> > > 2. the firmware notifes the system to enable already inserted memory.
> > >
> > > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > > starts hibernation ? for usual people, no.
> > >
> > > To trigger "2", the user have special console to tell firmware "enable this memory".
> > > Such firmware console or users have to know "the system works well." And, more important,
> > > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > > So, such firmware console or operator have to know the system status.
> > >
> > > Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?
> > >
> > *OFFTOPIC*
> >
> > I hear following answer from my friend.
> >
> > - hibernate the system
> > => plug USB memory
> > => wake up the system
> > => panic.
>
> That should work. File a bugzilla if it does not.
>
> > - hibernate the system
> > => unplug USB memory
> > => wake up the sytem
> > => panic.
>
> That should work ok as long as you unmount the drive first.
> Pavel
Ok. I'll confirm if I have a chance.
Sorry for rumor.

Thanks,
-Kame

2008-11-06 09:27:18

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi.

On Thu, 2008-11-06 at 18:12 +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 6 Nov 2008 10:12:18 +0100
> Pavel Machek <[email protected]> wrote:
>
> > > On Thu, 6 Nov 2008 09:14:41 +0900
> > > KAMEZAWA Hiroyuki <[email protected]> wrote:
> > >
> > > > Ok, please consider "when memory hotplug happens."
> > > >
> > > > In general, it happens when
> > > > 1. memory is inserted to slot.
> > > > 2. the firmware notifes the system to enable already inserted memory.
> > > >
> > > > To trigger "1", you have to open cover of server/pc. Do you open pc while the system
> > > > starts hibernation ? for usual people, no.
> > > >
> > > > To trigger "2", the user have special console to tell firmware "enable this memory".
> > > > Such firmware console or users have to know "the system works well." And, more important,
> > > > when the system is suspended, the firmware can't do hotplug because the kernel is sleeping.
> > > > So, such firmware console or operator have to know the system status.
> > > >
> > > > Am I missing some ? Current linux can know PCI/USB hotplug while the system is suspended ?
> > > >
> > > *OFFTOPIC*
> > >
> > > I hear following answer from my friend.
> > >
> > > - hibernate the system
> > > => plug USB memory
> > > => wake up the system
> > > => panic.
> >
> > That should work. File a bugzilla if it does not.
> >
> > > - hibernate the system
> > > => unplug USB memory
> > > => wake up the sytem
> > > => panic.
> >
> > That should work ok as long as you unmount the drive first.
> > Pavel
> Ok. I'll confirm if I have a chance.
> Sorry for rumor.

Oh, sorry. I read what I expected instead of what was there - thought
you were still walking about plugging and unplugging memory.

Nigel

2008-11-06 12:29:15

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] hibernation should work ok with memory hotplug

On Mon 2008-11-03 12:51:08, Andrew Morton wrote:
> On Wed, 29 Oct 2008 13:25:00 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > On Wednesday, 29 of October 2008, Pavel Machek wrote:
> > >
> > > hibernation + memory hotplug was disabled in kconfig because we could
> > > not handle hibernation + sparse mem at some point. It seems to work
> > > now, so I guess we can enable it.
> >
> > OK, if "it seems to work now" means that it has been tested and confirmed to
> > work, no objection from me.
>
> yes, that was not a terribly confidence-inspiring commit message.
>
> 3947be1969a9ce455ec30f60ef51efb10e4323d1 said "For now, disable memory
> hotplug when swsusp is enabled. There's a lot of churn there right
> now. We'll fix it up properly once it calms down." which is also
> rather rubbery.

I was not terribly confident. Sorry about that.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-11-06 14:43:51

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 6 Nov 2008, KAMEZAWA Hiroyuki wrote:

> Am I missing some ? Current linux can know PCI/USB hotplug while the
> system is suspended ?

With some machines, yes, it can. Some computers provide minimal
"suspend current" to devices while the system is hibernating. This
allows the hardware to detect and remember hotplug events, which can
then be handled when the system wakes up.

Alan Stern

2008-11-06 14:47:33

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 6 Nov 2008, KAMEZAWA Hiroyuki wrote:

> Hmm, people tend to make crazy hardware, oh yes. the pc may fly in
> the sky with rocket engine.

This isn't crazy at all. I am currently working on an experiment
called RAISE (Rapid Acquisition Imaging Spectrograph Experiment) that
involves flying a rocket above the Earth's atmosphere to make
measurements of the Sun. The experiments in this rocket will be
controlled by a computer running Linux. See

http://www.swri.org/3pubs/ttoday/Spring06/Solar.htm

for a slightly out-of-date description.

Alan Stern

2008-11-06 14:48:32

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 6 Nov 2008, Nigel Cunningham wrote:

> Remember that when we hibernate (assuming we don't then suspend to ram),
> the power is fully off. Resuming starts off like a fresh boot.

That simply is not true. On ACPI systems, hibernation goes into the S4
state. Power fully off is S5.

Alan Stern

2008-11-06 20:47:01

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

Hi.

On Thu, 2008-11-06 at 09:48 -0500, Alan Stern wrote:
> On Thu, 6 Nov 2008, Nigel Cunningham wrote:
>
> > Remember that when we hibernate (assuming we don't then suspend to ram),
> > the power is fully off. Resuming starts off like a fresh boot.
>
> That simply is not true. On ACPI systems, hibernation goes into the S4
> state. Power fully off is S5.

For the purposes of our discussion, it was a good enough description.
Nevertheless, you're right - if we do everything in a fully ACPI spec
compliant way, there will still be some power around. Of course we don't
always do that (can use S4 or S5).

Regards,

Nigel

2008-11-07 01:10:05

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] hibernation should work ok with memory hotplug

On Thu, 6 Nov 2008 09:47:12 -0500 (EST)
Alan Stern <[email protected]> wrote:

> On Thu, 6 Nov 2008, KAMEZAWA Hiroyuki wrote:
>
> > Hmm, people tend to make crazy hardware, oh yes. the pc may fly in
> > the sky with rocket engine.
>
> This isn't crazy at all. I am currently working on an experiment
> called RAISE (Rapid Acquisition Imaging Spectrograph Experiment) that
> involves flying a rocket above the Earth's atmosphere to make
> measurements of the Sun. The experiments in this rocket will be
> controlled by a computer running Linux. See
>
> http://www.swri.org/3pubs/ttoday/Spring06/Solar.htm
>
> for a slightly out-of-date description.
>

interesitng :)

-Kame

> Alan Stern
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>