2009-04-12 15:29:24

by Ed Tomlinson

Subject: How movable is zone movable?

Hi,

How dependable should zone movable be? After a boot, kvm is able to get enough hugepages to
back the session. After a day or two it becomes a lot less predictable. Sometimes it will swap out for 30 seconds
and then succeed; other times it will fail. Interestingly, it sometimes works if I cancel the kvm session after
it tells me it cannot allocate the hugepages and immediately restart. Is there some way to determine what
is not respecting zone movable? Or is zone movable just a suggestion and not expected to really be
movable?

I have the following set in sysctl.conf

# huge_pages with movablecore set to 3G
kernel.shmmax = 8589934592
vm.nr_hugepages = 128
vm.nr_overcommit_hugepages = 1408
vm.hugepages_treat_as_movable = 1
vm.hugetlb_shm_group = 1005

This is with any recent kernel release (2.6.28 and later).

Thanks,
Ed Tomlinson


2009-04-13 01:06:17

by Kamezawa Hiroyuki

Subject: Re: How movable is zone movable?

On Sun, 12 Apr 2009 11:29:09 -0400
Ed Tomlinson wrote:

> Hi,
>
> How dependable should zone movable be? After a boot, kvm is able to get enough hugepages to
> back the session. After a day or two it becomes a lot less predictable. Sometimes it will swap out for 30 seconds
> and then succeed; other times it will fail. Interestingly, it sometimes works if I cancel the kvm session after
> it tells me it cannot allocate the hugepages and immediately restart. Is there some way to determine what
> is not respecting zone movable? Or is zone movable just a suggestion and not expected to really be
> movable?
>
> I have the following set in sysctl.conf
>
> # huge_pages with movablecore set to 3G
> kernel.shmmax = 8589934592
> vm.nr_hugepages = 128
> vm.nr_overcommit_hugepages = 1408
> vm.hugepages_treat_as_movable = 1
> vm.hugetlb_shm_group = 1005
>
> This is with any recent kernel release (2.6.28 and later).
>

First, "Movable" means that the zone only holds anon and file-cache pages. They can be migrated by page
migration (memory hot-remove) and are considered easy to free.
Unfortunately, moving/migrating memory as part of memory reclaim is not implemented yet. So, when
hugepages are allocated, all the necessary memory has to be freed (swapped out).

Please check /proc/meminfo before trying to allocate hugepages.
% cat /proc/meminfo
Active + Inactive is then the current usage of "Movable" pages. (AnonPages is the number of pages that need swap in order to be freed.)
(Or please see /proc/zoneinfo.)

If Active + Inactive is close to 3G on your system, some memory will be swapped out
when huge pages are allocated.
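
The check described above can be scripted. The following is only a sketch: the function name is made up for illustration, and it assumes the usual "Name: value kB" layout of /proc/meminfo. Note that /proc/meminfo is system-wide, so the sum also counts pages outside the Movable zone; /proc/zoneinfo gives the per-zone numbers.

```python
def movable_usage_kib(meminfo_text):
    """Return Active + Inactive (in KiB) from the text of /proc/meminfo.

    Hypothetical helper for illustration; it only relies on the
    "Name:   value kB" line layout.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    # "Active(anon)" etc. are stored under their own keys, so the plain
    # "Active" and "Inactive" totals are not overwritten.
    return fields["Active"] + fields["Inactive"]


def read_movable_usage_mib(path="/proc/meminfo"):
    """Read the live file and return Active + Inactive in MiB."""
    with open(path) as f:
        return movable_usage_kib(f.read()) / 1024
```

Calling read_movable_usage_mib() just before starting kvm gives a rough idea of how much would have to be reclaimed or swapped out.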

One tricky way to get a big unused chunk of memory is to take memory offline and bring it
back online. This uses page migration. (But I don't think it has been widely tested.)


Thanks,
-Kame

2009-04-13 09:59:38

by Mel Gorman

Subject: Re: How movable is zone movable?

On Sun, Apr 12, 2009 at 11:29:09AM -0400, Ed Tomlinson wrote:
> Hi,
>
> How dependable should zone movable be?

It should be quite dependable for resizing the static hugepage pool,
particularly if you are willing to wait. However, with the latest
libhugetlbfs, the hugeadm utility still has a --hard option in case you
need to wait a long time for the resize to occur.

> After a boot, kvm is able to get enough hugepages to
> back the session. After a day or two it becomes a lot less predictable. Sometimes it will swap out for 30 seconds
> and then succeed; other times it will fail.

Unfortunately, that sounds about right. If there is significant
fragmentation, it takes time to find a contiguous area that can be
reclaimed. mlock(), if it's a factor, will complicate things further.

> Interestingly, it sometimes works if I cancel the kvm session after
> it tells me it cannot allocate the hugepages and immediately restart. Is there some way to determine what
> is not respecting zone movable?

Ok, that is odd. Are base pages being mlocked()? If they are, they could be
the problem as the patches to move those pages around were never merged.

> Or is zone movable just a suggestion and not expected to really be
> movable?
>

They are expected to be movable, but pages that get locked do not get
moved at the moment.
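
One way to see which processes hold locked memory is the VmLck field of /proc/<pid>/status. The sketch below is an illustration only; the function names are hypothetical. VmLck reports locked memory per process and does not say which zone the pages live in, so it narrows down candidates rather than proving anything about the Movable zone.

```python
import glob


def locked_kib(status_text):
    """Return the VmLck value (KiB) from /proc/<pid>/status text, or 0."""
    for line in status_text.splitlines():
        if line.startswith("VmLck:"):
            return int(line.split()[1])
    return 0  # e.g. kernel threads, which have no VmLck line


def processes_with_locked_memory(proc_root="/proc"):
    """Return (pid, name, locked KiB) tuples, largest first."""
    found = []
    for path in glob.glob(f"{proc_root}/[0-9]*/status"):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        kib = locked_kib(text)
        if kib:
            # The first line of the status file is "Name:\t<comm>".
            name = text.splitlines()[0].split(":", 1)[1].strip()
            pid = int(path.split("/")[-2])
            found.append((pid, name, kib))
    return sorted(found, key=lambda entry: -entry[2])
```

Running processes_with_locked_memory() as root lists every process with a non-zero VmLck.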

> I have the following set in sysctl.conf
>
> # huge_pages with movablecore set to 3G
> kernel.shmmax = 8589934592
> vm.nr_hugepages = 128
> vm.nr_overcommit_hugepages = 1408
> vm.hugepages_treat_as_movable = 1
> vm.hugetlb_shm_group = 1005

How big did you make the movable zone and what is the hugepage size? I'm
guessing the hugepage size is 2MB but no harm in being sure. If that's
the case, the movable zone should be at least 2816MB to maximise success
rates when resizing.
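
For reference, the arithmetic behind these numbers can be written out as below. This is a sketch that assumes a 2MB hugepage size, as guessed above; the helper name is made up. The 2816MB figure matches vm.nr_overcommit_hugepages, and the static and overcommit pools together come to the 3G given to movablecore.

```python
HUGEPAGE_MIB = 2  # assumption: 2MB hugepages


def pool_mib(nr_hugepages, hugepage_mib=HUGEPAGE_MIB):
    """Memory needed to back nr_hugepages hugepages, in MiB."""
    return nr_hugepages * hugepage_mib


static_mib = pool_mib(128)       # vm.nr_hugepages
overcommit_mib = pool_mib(1408)  # vm.nr_overcommit_hugepages
total_mib = static_mib + overcommit_mib

print(static_mib, overcommit_mib, total_mib)  # 256 2816 3072
```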

>
> This is with any recent kernel release (2.6.28 and later).
>

That's pretty recent. /proc/vmstat will also tell you how many
allocations are actually failing.
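
The relevant /proc/vmstat counters are htlb_buddy_alloc_success and htlb_buddy_alloc_fail. A minimal sketch for pulling them out follows; the function name is hypothetical and it assumes the "name value" line layout of /proc/vmstat.

```python
def hugepage_alloc_counters(vmstat_text):
    """Return the htlb_buddy_alloc_* counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("htlb_buddy_alloc_"):
            counters[parts[0]] = int(parts[1])
    return counters


def read_hugepage_alloc_counters(path="/proc/vmstat"):
    """Read the live file; compare two readings around a kvm start."""
    with open(path) as f:
        return hugepage_alloc_counters(f.read())
```

Taking one reading before starting kvm and one after shows how many hugepage allocations from the buddy allocator failed during that attempt.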

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-04-13 10:04:53

by Mel Gorman

Subject: Re: How movable is zone movable?

On Mon, Apr 13, 2009 at 10:59:25AM +0100, Mel Gorman wrote:
> > # huge_pages with movablecore set to 3G
>
> How big did you make the movable zone and what is the hugepage size?

Scratch that question, I missed you said it was 3G. Are pages being
mlocked()?

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-04-13 12:11:19

by Ed Tomlinson

Subject: Re: How movable is zone movable?

On Monday 13 April 2009 06:04:40 you wrote:
> On Mon, Apr 13, 2009 at 10:59:25AM +0100, Mel Gorman wrote:
> > > # huge_pages with movablecore set to 3G
> >
> > How big did you make the movable zone and what is the hugepage size?
>
> Scratch that question, I missed you said it was 3G. Are pages being
> mlocked()?

Looks like there are mlocked pages. What stopped the patches to allow these pages
to be migrated?

TIA
Ed

(from /proc/zoneinfo)
Node 0, zone Movable
pages free 35321
min 858
low 1072
high 1287
scanned 0 (aa: 2 ia: 0 af: 0 if: 0)
spanned 786432
present 775680
nr_free_pages 35321
nr_inactive_anon 29383
nr_active_anon 82452
nr_inactive_file 376826
nr_active_file 219586
nr_unevictable 816
nr_mlock 816
nr_anon_pages 110648
nr_mapped 28364
nr_file_pages 598492
nr_dirty 131
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 0
nr_page_table_pages 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 95
nr_writeback_temp 0
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 7
high: 186
batch: 31
vm stats threshold: 24
cpu: 1
count: 7
high: 186
batch: 31
vm stats threshold: 24
cpu: 2
count: 14
high: 186
batch: 31
vm stats threshold: 24
all_unreclaimable: 0
prev_priority: 12
start_pfn: 1441792
inactive_ratio: 4
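
To put the page counts above into more familiar units, here is a small sketch. It assumes 4KiB base pages and 2MiB hugepages (512 base pages each), neither of which is stated in the output, and the helper names are made up. The "pinned blocks" figure is only a worst-case upper bound that assumes every mlocked page sits in a different 2MiB-aligned block; the real spread is unknown.

```python
PAGE_KIB = 4              # assumption: 4KiB base pages
PAGES_PER_HUGEPAGE = 512  # assumption: 2MiB hugepages

# Values copied from the zoneinfo output above.
zone = {
    "spanned": 786432,
    "present": 775680,
    "nr_free_pages": 35321,
    "nr_mlock": 816,
}


def pages_to_mib(pages, page_kib=PAGE_KIB):
    """Convert a base-page count to MiB."""
    return pages * page_kib / 1024


def worst_case_pinned_blocks(nr_mlock, spanned, per_block=PAGES_PER_HUGEPAGE):
    """Upper bound on hugepage-sized blocks that contain an mlocked page."""
    total_blocks = spanned // per_block
    return min(nr_mlock, total_blocks), total_blocks


for name, pages in zone.items():
    print(f"{name}: {pages_to_mib(pages):.2f} MiB")

pinned, total = worst_case_pinned_blocks(zone["nr_mlock"], zone["spanned"])
print(f"worst case: {pinned} of {total} blocks contain a locked page")
```

Under these assumptions the zone spans 3072 MiB, and about 3 MiB of mlocked pages could, if badly scattered, touch up to 816 of the 1536 hugepage-sized blocks.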

2009-04-14 09:18:53

by Mel Gorman

Subject: Re: How movable is zone movable?

On Mon, Apr 13, 2009 at 08:10:58AM -0400, Ed Tomlinson wrote:
> On Monday 13 April 2009 06:04:40 you wrote:
> > On Mon, Apr 13, 2009 at 10:59:25AM +0100, Mel Gorman wrote:
> > > > # huge_pages with movablecore set to 3G
> > >
> > > How big did you make the movable zone and what is the hugepage size?
> >
> > Scratch that question, I missed you said it was 3G. Are pages being
> > mlocked()?
>
> Looks like there are mlocked pages. What stopped the patches to allow these pages
> to be migrated?
>

My recollection is fuzzy but it was mainly down to three points.

1. I could occasionally lock up the system when using page migration
under enough load. I didn't have a reliable reproduction case to pin down
whether it was something in page migration or something wrong with the
way I was using it. There have been fixes in page migration since though
so it's possible that got fixed along the way.

2. At the time, memory partitioning and anti-fragmentation were not long in
mainline and dynamic hugepage pool resizing was very new. It was not clear
there were going to be users of dynamic hugepage pool resizing that would
need memory compaction as well. I was reasonably confident but that's
not users.

3. There were significant problems in hugetlbfs that needed fixing up
such as the reliability of MAP_PRIVATE. It was more important to chase
down the stuff people certainly needed than complete features that they
might need.

The patches weren't downright rejected as such but I didn't push hard as there
were almost zero users of dynamic hugepage pool resizing to be complaining
about mlock(). Improving hugetlbfs was more important and of obvious benefit
and I reckoned I'd wait and see who complained about mlock.

This has changed now. hugetlbfs is in a way better state than it was and it
sounds like KVM is a user that is both interested in using dynamic pool
resizing and has mlocked pages. How much demand is there, do you think?

There is someone currently working on helping the pool resizing from userspace
by temporarily adding a swap device during a pool resize that will take a
while to complete. The plan was that, later in the year, they would revisit
memory compaction based on the prototype patches I did ages ago, but maybe
that can be pushed along a bit harder if there is enough interest.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab