2004-06-19 00:45:46

by Ashwin Rao

Subject: Atomic operation for physically moving a page

I want to copy a page from one physical location to
another (taking the appropriate locks). To keep the
operation of copying the page and updating all PTEs and
caches atomic, one approach proposed by my team members
was to put the processes accessing the page to sleep.
ptep_to_mm gives us the mm_struct, but container_of
cannot get us to the task_struct, because the task_struct
holds only a pointer to the mm_struct. Is there any way of
identifying the processes from the pte entry?
More generally, is there another way to solve my original
problem of keeping the whole copy-and-update operation
atomic? Sleeping the processes is a bad solution for
real-time processes.
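To illustrate why container_of does not help here, the sketch below is a minimal userspace model. The struct layouts, the `next` list link, and the helpers `wrapper_of()` and `tasks_using_mm()` are hypothetical simplifications, not the kernel's definitions. container_of can subtract a fixed offset only when the member is embedded by value; task_struct merely points at its mm, so finding the tasks behind an mm comes down to scanning all tasks.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the kernel's container_of(): step back from a
 * pointer to an embedded member to the start of the enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical stand-ins; the real kernel structs differ. */
struct mm_struct {
    int id;
};

struct task_struct {
    int pid;
    struct mm_struct *mm;       /* a pointer: the mm is not embedded */
    struct task_struct *next;   /* hypothetical singly linked task list */
};

/* container_of does work when the member is embedded by value. */
struct wrapper {
    int tag;
    struct mm_struct embedded;
};

static struct wrapper *wrapper_of(struct mm_struct *m)
{
    return container_of(m, struct wrapper, embedded);
}

/* For task_struct there is no fixed offset to subtract, so the only
 * generic route is to scan every task and compare its mm pointer.
 * Several tasks (threads) can share one mm, so matches are collected
 * into out (up to max) and the total count is returned. */
static int tasks_using_mm(struct task_struct *head,
                          const struct mm_struct *mm,
                          struct task_struct **out, int max)
{
    int n = 0;

    for (struct task_struct *t = head; t != NULL; t = t->next) {
        if (t->mm == mm) {
            if (n < max)
                out[n] = t;
            n++;
        }
    }
    return n;
}
```

In the kernel such a scan would presumably walk the real task list (for example with for_each_process() under tasklist_lock), which is an assumption here. Note that it finds every task sharing the mm, not only those actually touching the page.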

Ashwin






2004-06-19 01:05:21

by Valdis Klētnieks

Subject: Re: Atomic operation for physically moving a page

On Fri, 18 Jun 2004 17:37:12 PDT, Ashwin Rao said:
> I want to copy a page from one physical location to
> another (taking the appropriate locks).

At the risk of sounding stupid, what problem are you trying to solve by copying
a page? Not only (as you note) could the page be referenced by multiple
processes, it could (conceivably) belong to a kernel slab or something, or be a
buffer for an in-flight I/O request, or any number of other possibly-racy
situations.

It would help to know whether it's only a specific *type* of page, why you're
trying to do it, and what timing/etc constraints you have (if it's a sufficiently
rare(*) case, it might make sense to just grab the BKL and copy the page with a
memcpy().)

(*) Yes, I know the BKL isn't something you want to grab if you can help it.
However, if we're on an unlikely error path or similar and other options aren't suitable...



2004-06-19 02:44:00

by Dave Hansen

Subject: Re: Atomic operation for physically moving a page

On Fri, 2004-06-18 at 17:37, Ashwin Rao wrote:
> I want to copy a page from one physical location to
> another (taking the appropriate locks). To keep the
> operation of copying the page and updating all PTEs and
> caches atomic, one approach proposed by my team members
> was to put the processes accessing the page to sleep.

How do you make sure that no more processes begin to access the page
while you're doing your work?

BTW, look at the swap code :)

-- Dave

2004-06-19 02:54:43

by Dave Hansen

Subject: Re: Atomic operation for physically moving a page

On Fri, 2004-06-18 at 18:03, Valdis Klētnieks wrote:
> On Fri, 18 Jun 2004 17:37:12 PDT, Ashwin Rao said:
> > I want to copy a page from one physical location to
> > another (taking the appropriate locks).
>
> At the risk of sounding stupid, what problem are you trying to solve by copying
> a page? Not only (as you note) could the page be referenced by multiple
> processes, it could (conceivably) belong to a kernel slab or something, or be a
> buffer for an in-flight I/O request, or any number of other possibly-racy
> situations.

You also have to make sure that the page is something whose physical
address is allowed to change. Some things, like DMA buffers or part of
a hugetlb page, might not even be valid to move.
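A sketch of the kind of conservative movability check implied by the cases raised so far (slab pages, in-flight I/O, DMA buffers, hugetlb pages). The `toy_page` layout and `page_may_be_moved()` are hypothetical; the kernel records these properties in page flags, reference counts and subsystem state rather than in fields like these.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page descriptor; the kernel tracks these properties
 * through page flags, reference counts and the owning subsystem. */
enum page_kind { KIND_FREE, KIND_LRU, KIND_SLAB, KIND_HUGETLB_PART };

struct toy_page {
    enum page_kind kind;
    bool dma_target;     /* a device may be using the physical address */
    bool io_in_flight;   /* an I/O request currently targets the page */
};

/* Conservative check mirroring the cases raised in this thread: only free
 * pages and ordinary LRU pages qualify, and never while a device or an
 * in-flight I/O may depend on the page's physical address. */
static bool page_may_be_moved(const struct toy_page *p)
{
    if (p->dma_target || p->io_in_flight)
        return false;
    return p->kind == KIND_FREE || p->kind == KIND_LRU;
}
```

Free pages qualify trivially, since nothing has to be copied out of them.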

-- Dave

2004-06-19 03:15:40

by Ashwin Rao

Subject: Atomic operation for physically moving a page (for memory defragmentation)

--- Valdis Klētnieks wrote:
> On Fri, 18 Jun 2004 17:37:12 PDT, Ashwin Rao said:
> > I want to copy a page from one physical location to
> > another (taking the appropriate locks).
>
> At the risk of sounding stupid, what problem are you trying to solve by copying
> a page? Not only (as you note) could the page be referenced by multiple
> processes, it could (conceivably) belong to a kernel slab or something, or be a
> buffer for an in-flight I/O request, or any number of other possibly-racy
> situations.
>
>

The problem is memory fragmentation. The code I am
writing implements the memory defragmentation proposed
by Daniel Phillips; my project partner Alok Mooley
mailed a simple prototype in mid-February.

> It would help to know whether it's only a specific *type* of page, why you're
> trying to do it, and what timing/etc constraints you have (if it's a sufficiently
> rare(*) case, it might make sense to just grab the BKL and copy the page with a
> memcpy().)
>

We select pages from the LRU lists. Since these pages
can be swapped out, they can also be moved to another
location in memory.

> (*) Yes, I know the BKL isn't something you want to
> grab if you can help it.

Isn't it a bad idea to take the BKL? The performance of
SMP systems will be drastically hampered.

> However, if we're on an unlikely error path or
> similar and other options aren't suitable...

The way we work is as follows.
Initially a block that can be moved (i.e. whose
pages are all on the LRU or free) is selected, and its pages are moved to
suitable free pages. The main problem arises during
the copying and updating process. All the PTEs have to be
updated. A method similar to try_to_unmap_one is used
to identify the PTEs, and the physical address is
updated.
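A single-threaded toy model of the procedure described above: check that every frame in a 2^order block is free or on the LRU, then copy each LRU frame to a free frame outside the block and repoint the PTEs that referenced it. All names (`block_is_movable()`, `evacuate_block()`, the frame and PTE arrays) are hypothetical, and the model deliberately ignores concurrency, which is the actual problem under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NFRAMES 16
#define FRAME_BYTES 8            /* tiny "pages" keep the model small */

/* FRAME_PINNED stands for anything that must not be moved. */
enum frame_state { FRAME_FREE, FRAME_LRU, FRAME_PINNED };

struct frame {
    enum frame_state st;
    char data[FRAME_BYTES];
};

/* Toy page table: pte[v] holds the frame number for virtual page v. */
struct toy_mm {
    int pte[NFRAMES];
    int nptes;
};

/* A block of 2^order frames qualifies only if every frame in it is free
 * or on the LRU. The caller keeps start + 2^order within NFRAMES. */
static bool block_is_movable(const struct frame *mem, int start, int order)
{
    for (int i = start; i < start + (1 << order); i++)
        if (mem[i].st == FRAME_PINNED)
            return false;
    return true;
}

static int find_free_outside(const struct frame *mem, int start, int order)
{
    for (int i = 0; i < NFRAMES; i++)
        if (mem[i].st == FRAME_FREE &&
            (i < start || i >= start + (1 << order)))
            return i;
    return -1;
}

/* Copy each LRU frame to a free frame outside the block and repoint every
 * PTE that referenced it (the job of a try_to_unmap_one-like walk in the
 * real code). Models the bookkeeping only, not the SMP race. On failure
 * the block may be left partially evacuated. */
static bool evacuate_block(struct frame *mem, struct toy_mm *mm,
                           int start, int order)
{
    if (!block_is_movable(mem, start, order))
        return false;
    for (int old = start; old < start + (1 << order); old++) {
        if (mem[old].st != FRAME_LRU)
            continue;
        int dst = find_free_outside(mem, start, order);
        if (dst < 0)
            return false;
        memcpy(mem[dst].data, mem[old].data, FRAME_BYTES);
        mem[dst].st = FRAME_LRU;
        for (int v = 0; v < mm->nptes; v++)
            if (mm->pte[v] == old)
                mm->pte[v] = dst;
        mem[old].st = FRAME_FREE;
    }
    return true;
}
```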

Maintaining atomicity on uniprocessor systems is easy:
disable preemption with preempt_disable() for the
duration of the operation and re-enable it with
preempt_enable() afterwards. This approach cannot be
used on SMP systems.
If the page is accessed while it is being copied/updated,
the copied contents become invalid, because the PTE update
has not been done yet. A similar situation can arise
during the update itself.
The problem we are facing is to maintain the atomicity
of this operation on SMP boxes.
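The SMP problem can be replayed deterministically in one thread. In this hypothetical model, a store made through the old PTE after the copy but before the PTE switch lands in the old frame and is silently lost. The types and functions are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy types: a frame's contents and a PTE pointing at it. */
struct toy_frame { char data[8]; };
struct toy_pte { struct toy_frame *target; };

/* A user store goes to whichever frame the PTE points at right now. */
static void user_write(struct toy_pte *pte, const char *s)
{
    snprintf(pte->target->data, sizeof pte->target->data, "%s", s);
}

/* Replays the unsafe interleaving: 1. copy, 2. another CPU writes through
 * the still-valid old PTE, 3. the PTE is switched to the copy. The store
 * from step 2 lands in the old frame and is lost. */
static void move_with_racing_write(struct toy_pte *pte, struct toy_frame *dst,
                                   const char *racing_write)
{
    memcpy(dst->data, pte->target->data, sizeof dst->data);  /* 1 */
    if (racing_write != NULL)
        user_write(pte, racing_write);                       /* 2 */
    pte->target = dst;                                       /* 3 */
}
```

After the call, the PTE points at a copy holding the pre-write contents, which is exactly the invalid copy described above.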

Ashwin





2004-06-19 03:34:27

by Valdis Klētnieks

Subject: Re: Atomic operation for physically moving a page (for memory defragmentation)

On Fri, 18 Jun 2004 20:15:36 PDT, Ashwin Rao said:

> The problem is memory fragmentation. The code I am
> writing implements the memory defragmentation proposed
> by Daniel Phillips; my project partner Alok Mooley
> mailed a simple prototype in mid-February.

OK.. Now we're getting somewhere. ;) (Feel free to ignore
the rest - I'm *not* a memory management expert, but
a few thoughts come to mind - things that might help the
real experts answer the question..)

> > (*) Yes, I know the BKL isn't something you want to
> > grab if you can help it.
>
> Isn't it a bad idea to take the BKL? The performance of
> SMP systems will be drastically hampered.

As I noted - not something you *want* to grab. But sometimes,
especially when it's in error recovery, code may want to be able
to tell *everything* else to stay put for a moment while it figures
out what it needs to do next...

> The way we work is as follows.
> Initially a block that can be moved (i.e. whose
> pages are all on the LRU or free) is selected, and its pages are moved to

Out of curiosity, have you done any modeling to see how often
you need to move a page to coalesce holes and keep fragmentation
down? The "best" solution will quite likely be vastly different if it's
something that needs to be done only as a "last resort" (i.e. order-N
allocations are failing for non-large N), or if it's something that
works best if it's being done several times a second during normal
system operation, etc....

> suitable free pages. The main problem arises during
> the copying and updating process. All the PTEs have to be
> updated. A method similar to try_to_unmap_one is used
> to identify the PTEs, and the physical address is
> updated.

> The problem we are facing is to maintain the atomicity
> of this operation on SMP boxes.

Ahh.. Is there one thing in particular that causes the issues?
It may make sense to grab whatever lock usually controls that,
at least as a first cut (what lock(s) does try_to_unmap_one take,
for instance?). There's probably already a suitable lock, taken
by whatever code is interfering with what your code is doing..
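A minimal model of this suggestion, assuming the interfering code already serializes on some lock. In 2.6-era kernels the per-mm page_table_lock is one plausible candidate for PTE changes, but that should be checked against the actual code. The `toy_mm` type and the functions are hypothetical, and a C11 `atomic_flag` stands in for a kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of "take the lock the interfering code already
 * takes": one lock guards a single PTE, and both the fault path and the
 * move path must hold it before changing the PTE. */
struct toy_mm {
    atomic_flag lock;
    int pte;             /* frame number currently mapped */
};

static bool mm_trylock(struct toy_mm *mm)
{
    return !atomic_flag_test_and_set(&mm->lock);
}

static void mm_unlock(struct toy_mm *mm)
{
    atomic_flag_clear(&mm->lock);
}

/* Fault path: installs a PTE only under the lock. A real lock would spin
 * or sleep; trylock is used so a single-threaded demo can observe the
 * exclusion. Returns false if the caller has to retry. */
static bool fault_install(struct toy_mm *mm, int frame)
{
    if (!mm_trylock(mm))
        return false;
    mm->pte = frame;
    mm_unlock(mm);
    return true;
}

/* Move path, called with the lock held: the copy and the PTE switch
 * happen inside one critical section. */
static void move_locked(struct toy_mm *mm, int new_frame)
{
    /* ... copy the old frame's contents to new_frame here ... */
    mm->pte = new_frame;
}
```

This only helps against code that takes the same lock; paths that reach the page without it are not excluded.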



2004-06-19 04:25:54

by Dave Hansen

Subject: Re: Atomic operation for physically moving a page (for memory defragmentation)

On Fri, 2004-06-18 at 20:15, Ashwin Rao wrote:
> The problem is memory fragmentation. The code I am
> writing implements the memory defragmentation proposed
> by Daniel Phillips; my project partner Alok Mooley
> mailed a simple prototype in mid-February.

Ahhh.... *That* code :) Do you have an updated version you'd like to
share? I'm curious how you integrated the suggestions from February.

> > (*) Yes, I know the BKL isn't something you want to
> > grab if you can help it.
>
> Isn't it a bad idea to take the BKL? The performance of
> SMP systems will be drastically hampered.

Only during a defragment operation. Are you planning to run the system
under constant defragmentation?

> > However, if we're on an unlikely error path or
> > similar and other options aren't suitable...
>
> Maintaining atomicity on uniprocessor systems is easy:
> disable preemption with preempt_disable() for the
> duration of the operation and re-enable it with
> preempt_enable() afterwards. This approach cannot be
> used on SMP systems.
> If the page is accessed while it is being copied/updated,
> the copied contents become invalid, because the PTE update
> has not been done yet. A similar situation can arise
> during the update itself.
> The problem we are facing is to maintain the atomicity
> of this operation on SMP boxes.

I think what you really want to do is keep anybody else from making a
new pte to the page, once you've invalidated all of the existing ones,
right?

Holding a lock_page() should do the trick. Anybody who goes and pulls
the page out of the page cache has to do a lock_page() and check
page->mapping before they can establish a pte to it, so you can stop
that. Since you're invalidating page->mapping before you move the page
(you *are* doing this, right?), it will end up working itself out.
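A toy model of the fault-side protocol described here: the faulting path takes the page lock and rechecks `->mapping` before creating a PTE, while the mover holds the lock and clears `->mapping`. `toy_page`, `toy_trylock_page()` and the result codes are hypothetical, and the real lock_page() sleeps instead of failing. The model covers only this fault path, not kernel code that reaches the page without taking the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types, not the kernel's struct page. */
struct toy_mapping { int id; };

struct toy_page {
    bool locked;                   /* stand-in for PG_locked */
    struct toy_mapping *mapping;   /* NULL once the page is detached */
};

/* Non-blocking stand-in for lock_page(); the real one sleeps. */
static bool toy_trylock_page(struct toy_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

static void toy_unlock_page(struct toy_page *p)
{
    p->locked = false;
}

enum fault_result { FAULT_MAPPED, FAULT_WAIT, FAULT_RETRY_LOOKUP };

/* Fault side: lock the page, then confirm it still belongs to the mapping
 * it was found in before creating a PTE. */
static enum fault_result toy_fault(struct toy_page *p,
                                   struct toy_mapping *expected)
{
    if (!toy_trylock_page(p))
        return FAULT_WAIT;               /* the mover holds the lock */
    if (p->mapping != expected) {
        toy_unlock_page(p);
        return FAULT_RETRY_LOOKUP;       /* detached: look it up again */
    }
    /* ... establish the PTE here ... */
    toy_unlock_page(p);
    return FAULT_MAPPED;
}

/* Mover side: take the page lock and clear ->mapping before copying. */
static bool toy_begin_move(struct toy_page *p)
{
    if (!toy_trylock_page(p))
        return false;
    p->mapping = NULL;
    return true;
}
```

Even after the mover drops the lock, the cleared `->mapping` sends late faulters back to redo the lookup instead of mapping the old page.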

-- Dave

2004-06-23 09:04:04

by IWAMOTO Toshihiro

Subject: Re: Atomic operation for physically moving a page (for memory defragmentation)

At Fri, 18 Jun 2004 21:25:38 -0700,
Dave Hansen wrote:
> I think what you really want to do is keep anybody else from making a
> new pte to the page, once you've invalidated all of the existing ones,
> right?
>
> Holding a lock_page() should do the trick. Anybody who goes and pulls
> the page out of the page cache has to do a lock_page() and check
> page->mapping before they can establish a pte to it, so you can stop
> that. Since you're invalidating page->mapping before you move the page
> (you *are* doing this, right?), it will end up working itself out.

This isn't true unless the PG_uptodate bit of the page is cleared,
and clearing it properly isn't so simple.
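A toy model of this caveat, under the simplifying assumption that the page-cache lookup path only takes the page lock when the page is not yet up to date. The struct and `toy_lookup()` are hypothetical, not a description of a specific kernel function. While PG_uptodate stays set, the mover's page lock is never consulted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy page with the two flags relevant here. */
struct toy_page {
    bool locked;     /* stand-in for PG_locked, held by the mover */
    bool uptodate;   /* stand-in for PG_uptodate */
};

enum lookup_result { GOT_PAGE, MUST_WAIT };

/* Simplified page-cache lookup: the page lock is only taken (and so only
 * waited on) when the page is not yet up to date and must be read in.
 * An up-to-date page is handed out without consulting the lock. */
static enum lookup_result toy_lookup(const struct toy_page *p)
{
    if (p->uptodate)
        return GOT_PAGE;                  /* fast path: lock ignored */
    return p->locked ? MUST_WAIT : GOT_PAGE;
}
```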

I'm planning to post a new version of my memory hotplug patch,
but the page migration code currently doesn't work well with
linux-2.6.7.

--
IWAMOTO Toshihiro

2004-06-23 10:36:28

by Hirokazu Takahashi

Subject: Re: Atomic operation for physically moving a page (for memory defragmentation)

Hi,

> > > I want to copy a page from one physical location to
> > > another (taking the appropriate locks).
> >
> > At the risk of sounding stupid, what problem are you trying to solve by copying
> > a page? Not only (as you note) could the page be referenced by multiple
> > processes, it could (conceivably) belong to a kernel slab or something, or be a
> > buffer for an in-flight I/O request, or any number of other possibly-racy
> > situations.
>
> The problem is memory fragmentation. The code I am
> writing implements the memory defragmentation proposed
> by Daniel Phillips; my project partner Alok Mooley
> mailed a simple prototype in mid-February.

If you only care about anonymous memory, how about
extending the COW mechanism?

1. Make all pages in the process's address space COW.
2. Force a COW fault on each page.
3. Copy from each page to a newly allocated page, and discard the old page.

You may preallocate the new pages.
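A single-page toy model of the three steps above, with the destination frame reserved in advance as suggested. `toy_mem`, `reserve_frame()`, `cow_fault()` and `migrate_via_cow()` are hypothetical. One caveat the model ignores: in the real kernel a write fault on an anonymous page with a single user may simply reuse the page instead of copying it, so forcing the copy would need extra handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NFRAMES 8

/* Hypothetical toy memory: frame contents plus a reference count
 * (0 means the frame is free). */
struct toy_mem {
    char data[NFRAMES][8];
    int refcount[NFRAMES];
};

struct toy_pte {
    int frame;
    bool writable;
};

/* Preallocate a specific destination frame (e.g. one outside the block
 * being defragmented). */
static bool reserve_frame(struct toy_mem *m, int frame)
{
    if (m->refcount[frame] != 0)
        return false;
    m->refcount[frame] = 1;
    return true;
}

/* Write fault on a COW (read-only) PTE: copy into the reserved frame,
 * repoint the PTE, make it writable, and drop the old frame's reference. */
static void cow_fault(struct toy_mem *m, struct toy_pte *pte, int dst)
{
    memcpy(m->data[dst], m->data[pte->frame], sizeof m->data[dst]);
    m->refcount[pte->frame]--;
    pte->frame = dst;
    pte->writable = true;
}

/* Steps 1-3 for a single page. */
static bool migrate_via_cow(struct toy_mem *m, struct toy_pte *pte, int dst)
{
    if (!reserve_frame(m, dst))
        return false;
    pte->writable = false;   /* 1. mark the page COW */
    cow_fault(m, pte, dst);  /* 2 + 3. the forced fault copies and repoints */
    return true;
}
```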


Thank you,
Hirokazu Takahashi.

2004-06-23 12:02:50

by Hirokazu Takahashi

Subject: Re: Atomic operation for physically moving a page (for memory defragmentation)

Hi,

> > > However, if we're on an unlikely error path or
> > > similar and other options aren't suitable...
> >
> > Maintaining atomicity on uniprocessor systems is easy:
> > disable preemption with preempt_disable() for the
> > duration of the operation and re-enable it with
> > preempt_enable() afterwards. This approach cannot be
> > used on SMP systems.
> > If the page is accessed while it is being copied/updated,
> > the copied contents become invalid, because the PTE update
> > has not been done yet. A similar situation can arise
> > during the update itself.
> > The problem we are facing is to maintain the atomicity
> > of this operation on SMP boxes.
>
> I think what you really want to do is keep anybody else from making a
> new pte to the page, once you've invalidated all of the existing ones,
> right?
>
> Holding a lock_page() should do the trick. Anybody who goes and pulls
> the page out of the page cache has to do a lock_page() and check
> page->mapping before they can establish a pte to it, so you can stop
> that. Since you're invalidating page->mapping before you move the page
> (you *are* doing this, right?), it will end up working itself out.

We should keep in mind that many parts of the kernel access the page
without holding lock_page(). lock_page() can't block them.

Thank you,
Hirokazu Takahashi.

2004-06-23 20:57:18

by Dave Hansen

Subject: Re: Atomic operation for physically moving a page (for memory defragmentation)

On Wed, 2004-06-23 at 04:59, Hirokazu Takahashi wrote:
> We should keep in mind that many parts of the kernel access the page
> without holding lock_page(). lock_page() can't block them.

No, but it will block them from establishing a new PTE to the page. You
need to:

1. make sure no new PTEs can be established to the page
2. make sure there are no valid PTEs to the page.
3. do the move

My suggestion relates to 1, only.
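The three steps can be modelled with a page that carries its own list of PTEs, standing in for reverse mapping. Everything here (`toy_page`, `toy_map()`, `toy_move()`, `NOT_PRESENT`) is hypothetical and single-threaded; it shows the ordering, not the locking that makes each step safe on SMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NOT_PRESENT (-1)   /* PTE value meaning "next access faults" */
#define MAX_RMAP 4

/* Hypothetical toy page with its own reverse map of PTEs. */
struct toy_page {
    int frame;
    bool locked;              /* step 1: no new PTEs while set */
    int *rmap[MAX_RMAP];      /* step 2: every PTE that maps this page */
    int nrmap;
    char data[8];
};

/* Fault path: refuses to create a PTE while the page is locked. */
static bool toy_map(struct toy_page *p, int *pte)
{
    if (p->locked || p->nrmap == MAX_RMAP)
        return false;         /* caller waits and retries */
    *pte = p->frame;
    p->rmap[p->nrmap++] = pte;
    return true;
}

/* The three steps, in order. src is left locked: it is about to be freed,
 * and later faults are expected to find dst instead. */
static void toy_move(struct toy_page *src, struct toy_page *dst)
{
    src->locked = true;                              /* 1 */
    for (int i = 0; i < src->nrmap; i++)             /* 2 */
        *src->rmap[i] = NOT_PRESENT;
    src->nrmap = 0;
    memcpy(dst->data, src->data, sizeof dst->data);  /* 3 */
}
```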

-- Dave

2004-06-24 07:20:18

by IWAMOTO Toshihiro

Subject: Re: Atomic operation for physically moving a page (for memory defragmentation)

At Wed, 23 Jun 2004 13:56:30 -0700,
Dave Hansen wrote:
>
> On Wed, 2004-06-23 at 04:59, Hirokazu Takahashi wrote:
> > We should keep in mind that many parts of the kernel access the page
> > without holding lock_page(). lock_page() can't block them.
>
> No, but it will block them from establishing a new PTE to the page. You
> need to:
>
> 1. make sure no new PTEs can be established to the page
> 2. make sure there are no valid PTEs to the page.
> 3. do the move
>
> My suggestion relates to 1, only.

I wonder if you are talking exclusively about swap (anonymous) pages,
where lock_page() might work.

(I wonder why lock_page() is needed in do_swap_page(), btw.)

For page caches, lock_page() usually cannot prevent accesses to them,
and there are several kernel functions that don't need PTE mappings
for access. One such function is do_generic_mapping_read().

--
IWAMOTO Toshihiro

2004-06-24 11:32:01

by Dave Hansen

Subject: Re: Atomic operation for physically moving a page (for memory defragmentation)

On Thu, 2004-06-24 at 00:19, IWAMOTO Toshihiro wrote:
> At Wed, 23 Jun 2004 13:56:30 -0700,
> Dave Hansen wrote:
> >
> > On Wed, 2004-06-23 at 04:59, Hirokazu Takahashi wrote:
> > > We should keep in mind that many parts of the kernel access the page
> > > without holding lock_page(). lock_page() can't block them.
> >
> > No, but it will block them from establishing a new PTE to the page. You
> > need to:
> >
> > 1. make sure no new PTEs can be established to the page
> > 2. make sure there are no valid PTEs to the page.
> > 3. do the move
> >
> > My suggestion relates to 1, only.
>
> I wonder if you are talking exclusively about swap (anonymous) pages,
> where lock_page() might work.

I was talking about access to the pages through the user page tables,
only. You can't really fully prevent other access to them, because some
other kernel user could always do something like kmap() and write to the
page. There's probably some handy-dandy way to trap these kinds of
accesses in hardware, but Linux itself certainly can't provide that
guarantee without some restructuring to check for these areas any time
that a set_pte() is done.

Remember, we don't do things like rmap for the *kernel* users of pages.

> (I wonder why lock_page() is needed in do_swap_page(), btw.)
>
> For page caches, lock_page() usually cannot prevent accesses to them,
> and there are several kernel functions that don't need PTE mappings
> for access. One such function is do_generic_mapping_read().

You'll also have a generic problem with anything that does DMA, or that
uses the kernel page tables of any kind (kmap, vmalloc, etc...).

The DMA problem is a lot easier when there's an IOMMU, and even easier
on a partitioned ppc64 system where we have a virtualization layer to
take care of any areas under DMA that might undergo remapping.
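A toy illustration of why an IOMMU helps: the device only ever sees an I/O virtual address, so relocating the page means rewriting one translation entry rather than reprogramming the device. The table layout and function names are hypothetical; quiescing outstanding DMA and flushing any IOTLB are assumed and not modelled.

```c
#include <assert.h>

#define IOMMU_ENTRIES 4

/* Hypothetical toy IOMMU: the device is programmed with an I/O virtual
 * address (an index into this table), never with the physical frame. */
struct toy_iommu {
    int frame_of[IOMMU_ENTRIES];   /* iova -> physical frame */
};

/* What the device's DMA actually hits. */
static int dev_translate(const struct toy_iommu *io, int iova)
{
    assert(iova >= 0 && iova < IOMMU_ENTRIES);
    return io->frame_of[iova];
}

/* Moving a page under DMA then means rewriting one entry; quiescing the
 * device and flushing any IOTLB are assumed and omitted here. */
static void iommu_remap(struct toy_iommu *io, int iova, int new_frame)
{
    assert(iova >= 0 && iova < IOMMU_ENTRIES);
    io->frame_of[iova] = new_frame;
}
```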

-- Dave