2010-02-03 21:11:18

by Rik van Riel

Subject: [PATCH] emulate accessed bit for EPT

Currently KVM pretends that pages with EPT mappings never got
accessed. This has some side effects in the VM, like swapping
out actively used guest pages and needlessly breaking up actively
used hugepages.

We can avoid those very costly side effects by emulating the
accessed bit for EPT PTEs, which should only be slightly costly
because pages pass through page_referenced infrequently.

TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().

This seems to help prevent KVM guests from being swapped out when
they should not on my system.

Signed-off-by: Rik van Riel <[email protected]>
---
Jeff, does this patch fix the issue you saw a few months ago, with
a 256MB KVM guest in a cgroup limited to 128MB memory?

arch/x86/kvm/mmu.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 89a49fb..6101615 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -856,9 +856,15 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	u64 *spte;
 	int young = 0;
 
-	/* always return old for EPT */
+	/*
+	 * Emulate the accessed bit for EPT, by checking if this page has
+	 * an EPT mapping, and clearing it if it does. On the next access,
+	 * a new EPT mapping will be established.
+	 * This has some overhead, but not as much as the cost of swapping
+	 * out actively used pages or breaking up actively used hugepages.
+	 */
 	if (!shadow_accessed_mask)
-		return 0;
+		return kvm_unmap_rmapp(kvm, rmapp, data);
 
 	spte = rmap_next(kvm, rmapp, NULL);
 	while (spte) {


2010-02-04 04:12:23

by Balbir Singh

Subject: Re: [PATCH] emulate accessed bit for EPT

* Rik van Riel <[email protected]> [2010-02-03 16:11:03]:

> Currently KVM pretends that pages with EPT mappings never got
> accessed. This has some side effects in the VM, like swapping
> out actively used guest pages and needlessly breaking up actively
> used hugepages.
>
> We can avoid those very costly side effects by emulating the
> accessed bit for EPT PTEs, which should only be slightly costly
> because pages pass through page_referenced infrequently.
>
> TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().
>
> This seems to help prevent KVM guests from being swapped out when
> they should not on my system.
>
> Signed-off-by: Rik van Riel <[email protected]>
> ---
> Jeff, does this patch fix the issue you saw a few months ago, with
> a 256MB KVM guest in a cgroup limited to 128MB memory?
>
> arch/x86/kvm/mmu.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 89a49fb..6101615 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -856,9 +856,15 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	u64 *spte;
>  	int young = 0;
>
> -	/* always return old for EPT */
> +	/*
> +	 * Emulate the accessed bit for EPT, by checking if this page has
> +	 * an EPT mapping, and clearing it if it does. On the next access,
> +	 * a new EPT mapping will be established.
> +	 * This has some overhead, but not as much as the cost of swapping
> +	 * out actively used pages or breaking up actively used hugepages.
> +	 */
>  	if (!shadow_accessed_mask)
> -		return 0;
> +		return kvm_unmap_rmapp(kvm, rmapp, data);
>

Quite a clever implementation, one side effect is that one would see a
larger number of minor faults with EPT enabled and an increase in
allocation/frees of rmap entries, but that can be easily explained.

--
Balbir

2010-02-04 13:41:00

by Rik van Riel

Subject: Re: [PATCH] emulate accessed bit for EPT

On 02/03/2010 11:12 PM, Balbir Singh wrote:
> * Rik van Riel<[email protected]> [2010-02-03 16:11:03]:
>
>> Currently KVM pretends that pages with EPT mappings never got
>> accessed. This has some side effects in the VM, like swapping
>> out actively used guest pages and needlessly breaking up actively
>> used hugepages.
>>
>> We can avoid those very costly side effects by emulating the
>> accessed bit for EPT PTEs, which should only be slightly costly
>> because pages pass through page_referenced infrequently.

> Quite a clever implementation, one side effect is that one would see a
> larger number of minor faults with EPT enabled and an increase in
> allocation/frees of rmap entries, but that can be easily explained.

I suspect it won't be very many. I have been monitoring
/proc/meminfo on my system while testing this patch, and
it is quite typical that the size of the inactive anon
list does not change for minutes at a time.

In other words, no pages are moved onto or off of the
inactive anon list for several minutes. That corresponds
to a very small number of minor faults introduced by my
patch.

Of course, when the system is swapping, we will have more
minor faults. However, minor faults should be less of a
performance issue than major faults :)

--
All rights reversed.

2010-02-04 15:30:22

by Balbir Singh

Subject: Re: [PATCH] emulate accessed bit for EPT

* Rik van Riel <[email protected]> [2010-02-04 08:40:43]:

> On 02/03/2010 11:12 PM, Balbir Singh wrote:
> >* Rik van Riel<[email protected]> [2010-02-03 16:11:03]:
> >
> >>Currently KVM pretends that pages with EPT mappings never got
> >>accessed. This has some side effects in the VM, like swapping
> >>out actively used guest pages and needlessly breaking up actively
> >>used hugepages.
> >>
> >>We can avoid those very costly side effects by emulating the
> >>accessed bit for EPT PTEs, which should only be slightly costly
> >>because pages pass through page_referenced infrequently.
>
> >Quite a clever implementation, one side effect is that one would see a
> >larger number of minor faults with EPT enabled and an increase in
> >allocation/frees of rmap entries, but that can be easily explained.
>
> I suspect it won't be very many. I have been monitoring
> /proc/meminfo on my system while testing this patch, and
> it is quite typical that the size of the inactive anon
> list does not change for minutes at a time.
>
> In other words, no pages are moved onto or off of the
> inactive anon list for several minutes. That corresponds
> to a very small number of minor faults introduced by my
> patch.
>
> Of course, when the system is swapping, we will have more
> minor faults. However, minor faults should be less of a
> performance issue than major faults :)
>

I do agree with you.

--
Balbir

2010-02-04 15:41:40

by Rik van Riel

Subject: Re: [PATCH] emulate accessed bit for EPT

Balbir Singh wrote:
> * Rik van Riel <[email protected]> [2010-02-04 08:40:43]:
>
>> On 02/03/2010 11:12 PM, Balbir Singh wrote:
>>> * Rik van Riel<[email protected]> [2010-02-03 16:11:03]:
>>>
>>>> Currently KVM pretends that pages with EPT mappings never got
>>>> accessed. This has some side effects in the VM, like swapping
>>>> out actively used guest pages and needlessly breaking up actively
>>>> used hugepages.
>>>>
>>>> We can avoid those very costly side effects by emulating the
>>>> accessed bit for EPT PTEs, which should only be slightly costly
>>>> because pages pass through page_referenced infrequently.
>>> Quite a clever implementation, one side effect is that one would see a
>>> larger number of minor faults with EPT enabled and an increase in
>>> allocation/frees of rmap entries, but that can be easily explained.
>> I suspect it won't be very many. I have been monitoring
>> /proc/meminfo on my system while testing this patch, and
>> it is quite typical that the size of the inactive anon
>> list does not change for minutes at a time.
>>
>> In other words, no pages are moved onto or off of the
>> inactive anon list for several minutes. That corresponds
>> to a very small number of minor faults introduced by my
>> patch.
>>
>> Of course, when the system is swapping, we will have more
>> minor faults. However, minor faults should be less of a
>> performance issue than major faults :)
>>
>
> I do agree with you.

After 20 hours of uptime, it appears that this patch has
resolved the "KVM guests get swapped while buffer and page
cache stay in memory" problem my home system was experiencing.

2010-02-04 15:52:39

by Balbir Singh

Subject: Re: [PATCH] emulate accessed bit for EPT

* Rik van Riel <[email protected]> [2010-02-04 10:41:14]:

> Balbir Singh wrote:
> >* Rik van Riel <[email protected]> [2010-02-04 08:40:43]:
> >
> >>On 02/03/2010 11:12 PM, Balbir Singh wrote:
> >>>* Rik van Riel<[email protected]> [2010-02-03 16:11:03]:
> >>>
> >>>>Currently KVM pretends that pages with EPT mappings never got
> >>>>accessed. This has some side effects in the VM, like swapping
> >>>>out actively used guest pages and needlessly breaking up actively
> >>>>used hugepages.
> >>>>
> >>>>We can avoid those very costly side effects by emulating the
> >>>>accessed bit for EPT PTEs, which should only be slightly costly
> >>>>because pages pass through page_referenced infrequently.
> >>>Quite a clever implementation, one side effect is that one would see a
> >>>larger number of minor faults with EPT enabled and an increase in
> >>>allocation/frees of rmap entries, but that can be easily explained.
> >>I suspect it won't be very many. I have been monitoring
> >>/proc/meminfo on my system while testing this patch, and
> >>it is quite typical that the size of the inactive anon
> >>list does not change for minutes at a time.
> >>
> >>In other words, no pages are moved onto or off of the
> >>inactive anon list for several minutes. That corresponds
> >>to a very small number of minor faults introduced by my
> >>patch.
> >>
> >>Of course, when the system is swapping, we will have more
> >>minor faults. However, minor faults should be less of a
> >>performance issue than major faults :)
> >>
> >
> >I do agree with you.
>
> After 20 hours of uptime, it appears that this patch has
> resolved the "KVM guests get swapped while buffer and page
> cache stay in memory" problem my home system was experiencing.

Is this with cgroups enabled as defined by the setup Jeff had?

--
Balbir

2010-02-04 16:17:59

by Jeff Dike

Subject: Re: [PATCH] emulate accessed bit for EPT

On Wed, Feb 03, 2010 at 04:11:03PM -0500, Rik van Riel wrote:
> Jeff, does this patch fix the issue you saw a few months ago, with
> a 256MB KVM guest in a cgroup limited to 128MB memory?

Hum, let me dust off that workload and give it a shot...

Jeff

2010-02-04 17:47:36

by Andrea Arcangeli

Subject: Re: [PATCH] emulate accessed bit for EPT

On Thu, Feb 04, 2010 at 08:40:43AM -0500, Rik van Riel wrote:
> I suspect it won't be very many. I have been monitoring
> /proc/meminfo on my system while testing this patch, and
> it is quite typical that the size of the inactive anon
> list does not change for minutes at a time.
>
> In other words, no pages are moved onto or off of the
> inactive anon list for several minutes. That corresponds
> to a very small number of minor faults introduced by my
> patch.

When there's light VM pressure, ideally there should be zero overhead
caused by the patch. When there is VM pressure this will avoid some
unnecessary I/O which should outweigh the minor faults. It should be
a good default behavior.

2010-02-05 17:39:23

by Marcelo Tosatti

Subject: Re: [PATCH] emulate accessed bit for EPT

On Thu, Feb 04, 2010 at 06:47:15PM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 04, 2010 at 08:40:43AM -0500, Rik van Riel wrote:
> > I suspect it won't be very many. I have been monitoring
> > /proc/meminfo on my system while testing this patch, and
> > it is quite typical that the size of the inactive anon
> > list does not change for minutes at a time.
> >
> > In other words, no pages are moved onto or off of the
> > inactive anon list for several minutes. That corresponds
> > to a very small number of minor faults introduced by my
> > patch.
>
> When there's light VM pressure, ideally there should be zero overhead
> caused by the patch. When there is VM pressure this will avoid some
> unnecessary I/O which should outweigh the minor faults. It should be
> a good default behavior.

Agree.

But perhaps a module parameter to turn accessed bit emulation off might
be handy in the future?

2010-02-05 18:14:38

by Andrea Arcangeli

Subject: Re: [PATCH] emulate accessed bit for EPT

On Fri, Feb 05, 2010 at 03:34:23PM -0200, Marcelo Tosatti wrote:
> But perhaps a module parameter to turn accessed bit emulation off might
> be handy in the future?

Maybe, but somebody should show that this can overall become a
downside, which I doubt... I think if it does, the VM is to blame for
calling page_referenced when there is no point to do so just yet.

2010-02-07 19:22:29

by Marcelo Tosatti

Subject: Re: [PATCH] emulate accessed bit for EPT

On Fri, Feb 05, 2010 at 07:14:13PM +0100, Andrea Arcangeli wrote:
> On Fri, Feb 05, 2010 at 03:34:23PM -0200, Marcelo Tosatti wrote:
> > But perhaps a module parameter to turn accessed bit emulation off might
> > be handy in the future?
>
> Maybe, but somebody should show that this can overall become a
> downside, which I doubt... I think if it does, the VM is to blame for
> calling page_referenced when there is no point to do so just yet.

Agreed. ACK.

2010-02-08 10:27:35

by Avi Kivity

Subject: Re: [PATCH] emulate accessed bit for EPT

On 02/03/2010 11:11 PM, Rik van Riel wrote:
> Currently KVM pretends that pages with EPT mappings never got
> accessed. This has some side effects in the VM, like swapping
> out actively used guest pages and needlessly breaking up actively
> used hugepages.
>
> We can avoid those very costly side effects by emulating the
> accessed bit for EPT PTEs, which should only be slightly costly
> because pages pass through page_referenced infrequently.
>
> TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young().
>
> This seems to help prevent KVM guests from being swapped out when
> they should not on my system.
>
>

Applied, thanks.

>
> -	/* always return old for EPT */
> +	/*
> +	 * Emulate the accessed bit for EPT, by checking if this page has
> +	 * an EPT mapping, and clearing it if it does. On the next access,
> +	 * a new EPT mapping will be established.
> +	 * This has some overhead, but not as much as the cost of swapping
> +	 * out actively used pages or breaking up actively used hugepages.
> +	 */
>  	if (!shadow_accessed_mask)
> -		return 0;
> +		return kvm_unmap_rmapp(kvm, rmapp, data);
>

This could be optimized by using a software-available bit for 'present'
and the rwx bits for young, that is:

(present, rwx) -> the page is present and recently accessed, will not
cause EPT violation
(present, !rwx) -> page is present but old, will cause EPT violation
but not rmap games and get_user_pages_fast().

However that's best done later if ever.

--
error compiling committee.c: too many arguments to function