2004-01-23 05:00:52

by Benjamin Herrenschmidt

Subject: swsusp vs pgdir

Hi !

I've been bored enough today to hack on getting the current
pmdisk/swsusp up on ppc. The required arch code should be almost
identical.

However, when looking at it, I didn't fully understand how you
actually ensure your page mappings aren't being blown away
behind your back during the copy operation on resume, but since
my knowledge of x86 is almost nonexistent, I couldn't decipher this
from the source code. Could you explain a bit ?

The thing is that you seem to point to the swapper pgdir during
the copy, that is the kernel page tables, but those are potentially
being wiped out during the copy, no ?

For PPC, I'm using a simple approach at first by disabling the
data translation on the MMU and using a BAT to keep the .text
mapped, though ultimately, if I want to support POWER4, I'll
have to allocate a temporary hash table in some place that
doesn't get overwritten... That means a hook at a higher level in
the resume code path.

Thanks for the details,
Ben.





2004-01-23 07:43:30

by Benjamin Herrenschmidt

Subject: Re: swsusp vs pgdir


> We test that CPU has PSE feature. That means kernel is mapped using
> 4MB page tables, and I do not have to care about page tables at
> all.

Just enlighten me please: How do these 4MB page tables work ? Do the pgdir
entries contain special bits ? Then you at least must make sure the
swapper_pg_dir is left intact. Is this the case ? (I also suppose you
mean the entire linear mapping, not just the kernel, is mapped with
4MB pages)

Ben.


2004-01-23 07:34:38

by Pavel Machek

Subject: Re: swsusp vs pgdir

Hi!

> I've been bored enough today to hack on getting the current
> pmdisk/swsusp up on ppc. The required arch code should be almost
> identical.
>
> However, when looking at it, I didn't fully understand how you
> actually ensure your page mappings aren't being blown away
> behind your back during the copy operation on resume, but since
> my knowledge of x86 is almost nonexistent, I didn't decipher this
> from the source code. Could you explain a bit ?
>
> The thing is that you seem to point to the swapper pgdir during
> the copy, that is the kernel page tables, but those are being
> wiped out during the copy potentially, no ?

We test that CPU has PSE feature. That means kernel is mapped using
4MB page tables, and I do not have to care about page tables at
all.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-01-23 07:55:05

by Pavel Machek

Subject: Re: swsusp vs pgdir

Hi!

> > We test that CPU has PSE feature. That means kernel is mapped using
> > 4MB page tables, and I do not have to care about page tables at
> > all.
>
> Just enlighten me please: How do these 4MB page tables work ? The pgdir
> entries contain special bits ? Then you at least must make sure the

The pgdir contains special bits, and there are no other levels of page
tables.
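
To make the "special bits" concrete, here is a minimal sketch of how a
32-bit x86 CPU with PSE enabled resolves an address through such an entry.
The bit positions (bit 0 = present, bit 7 = page size) are the standard
non-PAE ones; the function name and the flat array standing in for the
page directory are made up for the illustration, and PSE-36 is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_PRESENT     0x001u      /* bit 0: entry is valid */
#define PDE_PS          0x080u      /* bit 7: entry maps a 4MB page directly */
#define LARGE_PAGE_MASK 0xFFC00000u /* top 10 bits: 4MB-aligned frame base */

/* Hypothetical helper: translate a linear address using a 1024-entry page
 * directory in which every mapping is a 4MB page. Only the directory is
 * read; no second-level page table is ever touched. */
uint32_t translate_4mb(const uint32_t pgdir[1024], uint32_t linear)
{
    uint32_t pde = pgdir[linear >> 22];   /* top 10 bits index the pgdir */

    assert(pde & PDE_PRESENT);
    assert(pde & PDE_PS);

    /* Frame base from the entry, low 22 bits from the address itself. */
    return (pde & LARGE_PAGE_MASK) | (linear & ~LARGE_PAGE_MASK);
}
```

With entry 768 set up to map physical 0, linear address 0xC0001234 comes
out as physical 0x00001234, which is why nothing below the pgdir has to
survive the copy.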

Now, I'm apparently rewriting swapper_pg_dir with itself (same
data). That's not too clean, but CPUs do not notice it...

> swapper_pgdir is left intact. This is the case ? (I also suppose you
> mean the entire linear mapping, not just the kernel, is mapped with
> 4M pages)

Yes.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-01-23 08:21:32

by Benjamin Herrenschmidt

Subject: Re: swsusp vs pgdir


> Now, I'm apparently rewriting swapper_pg_dir with itself (same
> data). That's not too clean, but CPUs do not notice it...

Well... ok, you _hope_ it's the same data :) I suppose with the same kernel,
same text and same amount of RAM, you indeed have the same data, though
this is a bit hairy.

For PPC, I have to go a different way though. I'll probably end up
allocating a small hash table for G5-like CPUs on resume outside of
the space that gets overwritten, though that is definitely a bit nasty,
since the minimum size of a hash table is 256K, so I'll need that
contiguous at least...

For now, I'm just disabling the data translation on the MMU and
assuming the BAT is covering me for resume, but that's a bit hairy
too and definitely slow on some CPUs.

Ben.


2004-01-23 16:10:58

by Benjamin Herrenschmidt

Subject: Re: swsusp vs pgdir

On Sat, 2004-01-24 at 03:03, Patrick Mochel wrote:
> > > swapper_pgdir is left intact. This is the case ? (I also suppose you
> > > mean the entire linear mapping, not just the kernel, is mapped with
> > > 4M pages)
> >
> > Yes.
>
> Not necessarily. Just kernel text and data. I don't have the code in front
> of me ATM, but there are simple checks you can do to determine the
> type/size of page.
>
> We don't have to care about userspace, though, once all processes are
> frozen, so we don't have to deal with the 4k pages.
>
> And the thing is, the only reason we require PSE and 4 MB pages is because
> it provides a 2-level page table instead of a 3-level, which by
> definition is easier to manage. :)

Wait... wait... If the whole linear mapping isn't mapped by this flat
pgdir, then we have a problem, since the MMU will have to go down the
kernel pagetables to actually access the page data when copying it
around... but at this point, we are overwriting the boot kernel page
tables with the ones from the saved image, so ...

Ben.


2004-01-23 16:03:38

by Patrick Mochel

Subject: Re: swsusp vs pgdir


> > swapper_pgdir is left intact. This is the case ? (I also suppose you
> > mean the entire linear mapping, not just the kernel, is mapped with
> > 4M pages)
>
> Yes.

Not necessarily. Just kernel text and data. I don't have the code in front
of me ATM, but there are simple checks you can do to determine the
type/size of page.

We don't have to care about userspace, though, once all processes are
frozen, so we don't have to deal with the 4k pages.

And the thing is, the only reason we require PSE and 4 MB pages is because
it provides a 2-level page table instead of a 3-level, which by
definition is easier to manage. :)


Pat

2004-01-23 16:46:00

by Patrick Mochel

Subject: Re: swsusp vs pgdir


> Wait... wait... If the whole linear mapping isn't mapped by this flat
> pgdir, then we have a problem, since the MMU will have to go down the
> kernel pagetables to actually access the pages data when copying them
> around... but at this point, we are overriding the boot kernel page
> tables with the loader ones, so ...

A new pgdir is allocated on resume that does not overlap with any pages
being restored. See relocate_pagedir() in the code.

We assume that the kernel version is the same, and therefore that the code
and static data are in the same locations in memory. So, even if the kernel
page tables get overwritten, we can still access the pointer to the pgdir.


Pat

2004-01-23 16:53:16

by Pavel Machek

Subject: Re: swsusp vs pgdir

Hi!

> > Wait... wait... If the whole linear mapping isn't mapped by this flat
> > pgdir, then we have a problem, since the MMU will have to go down the
> > kernel pagetables to actually access the pages data when copying them
> > around... but at this point, we are overriding the boot kernel page
> > tables with the loader ones, so ...
>
> A new pgdir is allocated on resume that does not overlap with any pages
> being restored. See relocate_pagedir() in the code..

Look again. relocate_pagedir() does not work with hardware page
directories. Instead, it deals with swsusp data structure telling it
what to copy where.
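
As a rough illustration of that data structure (a sketch only, not the
kernel's actual code), a "pagedir" in this sense can be thought of as a
plain array of copy instructions. The struct, the function names and the
fixed 4096-byte page size below are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* Hypothetical stand-in for one entry of the swsusp "pagedir". */
struct copy_entry {
    void *saved;   /* where the page sits now, as loaded from the image */
    void *orig;    /* where the page has to end up */
};

/* Replay the list: copy every saved page back to its original place. */
void restore_pages(const struct copy_entry *list, size_t count)
{
    assert(list != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        memcpy(list[i].orig, list[i].saved, SKETCH_PAGE_SIZE);
}

/* Return 1 if the list itself lies inside a page that the copy will
 * overwrite. In that case it must be moved before copying starts. */
int list_collides(const struct copy_entry *list, size_t count)
{
    uintptr_t start = (uintptr_t)list;
    uintptr_t end = start + count * sizeof(*list);

    for (size_t i = 0; i < count; i++) {
        uintptr_t dst = (uintptr_t)list[i].orig;
        if (dst < end && dst + SKETCH_PAGE_SIZE > start)
            return 1;
    }
    return 0;
}
```

Moving the list until list_collides() returns 0 is the kind of job the
"relocate" in the name refers to. None of this touches the page
directories the MMU walks.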

Okay, it could use some better name...
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-01-23 17:05:07

by Pavel Machek

Subject: Re: swsusp vs pgdir

Hi!

> > Wait... wait... If the whole linear mapping isn't mapped by this flat
> > pgdir, then we have a problem, since the MMU will have to go down the
> > kernel pagetables to actually access the pages data when copying them
> > around... but at this point, we are overriding the boot kernel page
> > tables with the loader ones, so ...
>
> A new pgdir is allocated on resume that does not overlap with any pages
> being restored. See relocate_pagedir() in the code..

Perhaps this should serve as a warning to people trying to understand
swsusp.c?

Pavel

--- tmp/linux/kernel/power/swsusp.c 2004-01-23 17:59:36.000000000 +0100
+++ linux/kernel/power/swsusp.c 2004-01-23 17:58:58.000000000 +0100
@@ -107,6 +107,10 @@
time of suspend, that must be freed. Second is "pagedir_nosave",
allocated at time of resume, that travels through memory not to
collide with anything.
+
+ Warning: this is even more evil than it seems. Pagedirs this file
+ talks about are completely different from page directories used by
+ MMU hardware.
*/
suspend_pagedir_t *pagedir_nosave __nosavedata = NULL;
static suspend_pagedir_t *pagedir_save;

--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-01-24 01:03:45

by Benjamin Herrenschmidt

Subject: Re: swsusp vs pgdir

On Sat, 2004-01-24 at 03:45, Patrick Mochel wrote:

> A new pgdir is allocated on resume that does not overlap with any pages
> being restored. See relocate_pagedir() in the code..

Looking at the code, this is not a real HW pgdir but rather the
page copy list specific to swsusp. AFAIK, the HW pgdir is copied over.

> We assume that the kernel version is the same, and therefore that the code
> and static data are in same locations in memory. So, even if the kernel
> page tables get overwritten, we can still access the pointer to the pgdir.

Yes, the pgdir is there, but 1) it's getting overwritten, so if it
doesn't contain the same large page mapping on old and new, we are
screwed and 2) if accessing the linear mapping (when copying pages)
requires going one level deeper into the page tables, then we are
possibly screwed too since those will be partly overwritten and won't
ever be in a "sane" state until the full copy is done.
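
Both conditions could in principle be checked before the copy starts. The
following is only a sketch of such a check, not something swsusp does as
far as this thread shows. It assumes 32-bit non-PAE entries, and the
function name and the idea of passing the range of pgdir slots that cover
the linear mapping are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_PRESENT     0x001u      /* bit 0: entry is valid */
#define PDE_PS          0x080u      /* bit 7: 4MB page, no page table below */
#define LARGE_PAGE_MASK 0xFFC00000u /* physical base of the 4MB frame */

/* Hypothetical check: return 1 if, over pgdir slots [first, first+count),
 * both directories map the same 4MB frames and no present entry points
 * to a second-level page table. Accessed/dirty bits are ignored. */
int linear_map_is_flat(const uint32_t *boot, const uint32_t *image,
                       unsigned first, unsigned count)
{
    const uint32_t cmp = PDE_PRESENT | PDE_PS | LARGE_PAGE_MASK;

    assert(first + count <= 1024);

    for (unsigned i = first; i < first + count; i++) {
        if ((boot[i] ^ image[i]) & cmp)
            return 0;            /* condition 1: the mappings differ */
        if ((boot[i] & PDE_PRESENT) && !(boot[i] & PDE_PS))
            return 0;            /* condition 2: needs a deeper level */
    }
    return 1;
}
```

With the usual 3GB/1GB split the linear mapping would start at slot 768
(0xC0000000 >> 22), but that value and the length of the range depend on
the kernel configuration and are left to the caller here.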

Note that I don't have that problem with my current PPC hacks, as I
disable the MMU for the copy :)

Ben.