2019-07-17 11:22:08

by Michael S. Tsirkin

[permalink] [raw]
Subject: use of shrinker in virtio balloon free page hinting

Wei, others,

ATM virtio_balloon_shrinker_scan will only get registered
when deflate on oom feature bit is set.

Not sure whether that's intentional. Assuming it is:

virtio_balloon_shrinker_scan will try to locate and free
pages that are processed by host.
The above seems broken in several ways:
- count ignores the free page list completely
- if free pages are being reported, pages freed
by shrinker will just get re-allocated again

I was unable to make this part of code behave in any reasonable
way - was shrinker usage tested? What's a good way to test that?

Thanks!

--
MST


2019-07-17 11:32:54

by David Hildenbrand

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On 17.07.19 13:20, Michael S. Tsirkin wrote:
> Wei, others,
>
> ATM virtio_balloon_shrinker_scan will only get registered
> when deflate on oom feature bit is set.
>
> Not sure whether that's intentional. Assuming it is:
>
> virtio_balloon_shrinker_scan will try to locate and free
> pages that are processed by host.
> The above seems broken in several ways:
> - count ignores the free page list completely
> - if free pages are being reported, pages freed
> by shrinker will just get re-allocated again
>
> I was unable to make this part of code behave in any reasonable
> way - was shrinker usage tested? What's a good way to test that?
>
> Thanks!
>

Some companies are using deflate-on-oom for some kind of "auto
ballooning" approach, although I don't think it's a good idea.

In these scenarios, the total ramsize (cat /proc/meminfo) will not
change on inflation/deflation. So from a system POV, inflated memory is
simply allocated memory without affecting the total memory.

VMs will automatically "reclaim" inflated memory when they need it -
which is usually not what hypervisors want (especially when talking
about using ballooning for memory hotunplug).

So yes, it makes perfect sense that the shrinker is only registered for
deflate-on-oom.

--

Thanks,

David / dhildenb

2019-07-17 14:11:52

by David Hildenbrand

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On 17.07.19 13:20, Michael S. Tsirkin wrote:
> Wei, others,
>
> ATM virtio_balloon_shrinker_scan will only get registered
> when deflate on oom feature bit is set.
>
> Not sure whether that's intentional. Assuming it is:
>
> virtio_balloon_shrinker_scan will try to locate and free
> pages that are processed by host.
> The above seems broken in several ways:
> - count ignores the free page list completely
> - if free pages are being reported, pages freed
> by shrinker will just get re-allocated again

Trying to answer your questions (not sure if I fully understood what you
mean)

virtio_balloon_shrinker_scan() will not be called due to inflation
requests (balloon_page_alloc()). It will be called whenever the system
is OOM, e.g., when starting a new application.

I assume you were expecting the shrinker getting called due to
balloon_page_alloc(). however, that is not the case as we pass
"__GFP_NORETRY".


To test, something like:

1. Start a VM with

-device virtio-balloon-pci,deflate-on-oom=true

2. Inflate the balloon, e.g.,

QMP: balloon 1024
QMP: info balloon
-> 1024

See how "MemTotal" in /proc/meminfo in the guest won't change

3. Run a workload that exhausts memory in the guest (OOM).

See how the balloon was automatically deflated

QMP: info balloon
-> Something bigger than 1024


Not sure if it is broken, last time I played with it, it worked, but
that was ~1-2 years ago.

--

Thanks,

David / dhildenb

2019-07-17 14:36:38

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On Wed, Jul 17, 2019 at 04:10:47PM +0200, David Hildenbrand wrote:
> On 17.07.19 13:20, Michael S. Tsirkin wrote:
> > Wei, others,
> >
> > ATM virtio_balloon_shrinker_scan will only get registered
> > when deflate on oom feature bit is set.
> >
> > Not sure whether that's intentional. Assuming it is:
> >
> > virtio_balloon_shrinker_scan will try to locate and free
> > pages that are processed by host.
> > The above seems broken in several ways:
> > - count ignores the free page list completely
> > - if free pages are being reported, pages freed
> > by shrinker will just get re-allocated again
>
> Trying to answer your questions (not sure if I fully understood what you
> mean)
>
> virtio_balloon_shrinker_scan() will not be called due to inflation
> requests (balloon_page_alloc()). It will be called whenever the system
> is OOM, e.g., when starting a new application.
>
> I assume you were expecting the shrinker getting called due to
> balloon_page_alloc(). however, that is not the case as we pass
> "__GFP_NORETRY".

Right but it's possible we exhaust all memory, then
someone else asks for a single page and that invokes
the shrinker.

>
> To test, something like:
>
> 1. Start a VM with
>
> -device virtio-balloon-pci,deflate-on-oom=true
>
> 2. Inflate the balloon, e.g.,
>
> QMP: balloon 1024
> QMP: info balloon
> -> 1024
>
> See how "MemTotal" in /proc/meminfo in the guest won't change
>
> 3. Run a workload that exhausts memory in the guest (OOM).
>
> See how the balloon was automatically deflated
>
> QMP: info balloon
> -> Something bigger than 1024
>
>
> Not sure if it is broken, last time I played with it, it worked, but
> that was ~1-2 years ago.
>
> --
>
> Thanks,
>
> David / dhildenb

Sorry I was unclear. The question was about
VIRTIO_BALLOON_F_FREE_PAGE_HINT specifically.

--
MST

2019-07-17 14:39:15

by David Hildenbrand

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On 17.07.19 16:34, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 04:10:47PM +0200, David Hildenbrand wrote:
>> On 17.07.19 13:20, Michael S. Tsirkin wrote:
>>> Wei, others,
>>>
>>> ATM virtio_balloon_shrinker_scan will only get registered
>>> when deflate on oom feature bit is set.
>>>
>>> Not sure whether that's intentional. Assuming it is:
>>>
>>> virtio_balloon_shrinker_scan will try to locate and free
>>> pages that are processed by host.
>>> The above seems broken in several ways:
>>> - count ignores the free page list completely
>>> - if free pages are being reported, pages freed
>>> by shrinker will just get re-allocated again
>>
>> Trying to answer your questions (not sure if I fully understood what you
>> mean)
>>
>> virtio_balloon_shrinker_scan() will not be called due to inflation
>> requests (balloon_page_alloc()). It will be called whenever the system
>> is OOM, e.g., when starting a new application.
>>
>> I assume you were expecting the shrinker getting called due to
>> balloon_page_alloc(). however, that is not the case as we pass
>> "__GFP_NORETRY".
>
> Right but it's possible we exhaust all memory, then
> someone else asks for a single page and that invokes
> the shrinker.

Yes, I think that can happen.

>
>>
>> To test, something like:
>>
>> 1. Start a VM with
>>
>> -device virtio-balloon-pci,deflate-on-oom=true
>>
>> 2. Inflate the balloon, e.g.,
>>
>> QMP: balloon 1024
>> QMP: info balloon
>> -> 1024
>>
>> See how "MemTotal" in /proc/meminfo in the guest won't change
>>
>> 3. Run a workload that exhausts memory in the guest (OOM).
>>
>> See how the balloon was automatically deflated
>>
>> QMP: info balloon
>> -> Something bigger than 1024
>>
>>
>> Not sure if it is broken, last time I played with it, it worked, but
>> that was ~1-2 years ago.
>>
>> --
>>
>> Thanks,
>>
>> David / dhildenb
>
> Sorry I was unclear. The question was about
> VIRTIO_BALLOON_F_FREE_PAGE_HINT specifically.

Ah, I see. Never used both things together.

--

Thanks,

David / dhildenb

2019-07-17 15:49:17

by Wang, Wei W

[permalink] [raw]
Subject: RE: use of shrinker in virtio balloon free page hinting

On Wednesday, July 17, 2019 7:21 PM, Michael S. Tsirkin wrote:
>
> Wei, others,
>
> ATM virtio_balloon_shrinker_scan will only get registered when deflate on
> oom feature bit is set.
>
> Not sure whether that's intentional.

Yes, we wanted to follow the old oom behavior, which allows the oom notifier
to deflate pages only when this feature bit has been negotiated.


> Assuming it is:
>
> virtio_balloon_shrinker_scan will try to locate and free pages that are
> processed by host.
> The above seems broken in several ways:
> - count ignores the free page list completely

Do you mean virtio_balloon_shrinker_count()? It just reports to
do_shrink_slab the amount of freeable memory that balloon has.
(vb->num_pages and vb->num_free_page_blocks are all included )

> - if free pages are being reported, pages freed
> by shrinker will just get re-allocated again

fill_balloon will re-try the allocation after sleeping 200ms once allocation fails.


> I was unable to make this part of code behave in any reasonable way - was
> shrinker usage tested? What's a good way to test that?

Please see the example that I tested before : https://lkml.org/lkml/2018/8/6/29
(just the first one: *1. V3 patches)

What problem did you see?

I just tried the latest code, and find ballooning reports a #GP (seems caused by
418a3ab1e).
I'll take a look at the details in the office tomorrow.

Best,
Wei

2019-07-18 04:14:05

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On Wed, Jul 17, 2019 at 03:46:57PM +0000, Wang, Wei W wrote:
> On Wednesday, July 17, 2019 7:21 PM, Michael S. Tsirkin wrote:
> >
> > Wei, others,
> >
> > ATM virtio_balloon_shrinker_scan will only get registered when deflate on
> > oom feature bit is set.
> >
> > Not sure whether that's intentional.
>
> Yes, we wanted to follow the old oom behavior, which allows the oom notifier
> to deflate pages only when this feature bit has been negotiated.

It makes sense for pages in the balloon (requested by hypervisor).
However free page hinting can freeze up lots of memory for its own
internal reasons. It does not make sense to ask hypervisor
to set flags in order to fix internal guest issues.

> > Assuming it is:
> >
> > virtio_balloon_shrinker_scan will try to locate and free pages that are
> > processed by host.
> > The above seems broken in several ways:
> > - count ignores the free page list completely
>
> Do you mean virtio_balloon_shrinker_count()? It just reports to
> do_shrink_slab the amount of freeable memory that balloon has.
> (vb->num_pages and vb->num_free_page_blocks are all included )

Right. But that does not include the pages in the hint vq,
which could be a significant amount of memory.


> > - if free pages are being reported, pages freed
> > by shrinker will just get re-allocated again
>
> fill_balloon will re-try the allocation after sleeping 200ms once allocation fails.

Even if ballon was never inflated, if shrinker frees some memory while
we are hinting, hint vq will keep going and allocate it back without
sleeping.

>
> > I was unable to make this part of code behave in any reasonable way - was
> > shrinker usage tested? What's a good way to test that?
>
> Please see the example that I tested before : https://lkml.org/lkml/2018/8/6/29
> (just the first one: *1. V3 patches)
>
> What problem did you see?
> I just tried the latest code, and find ballooning reports a #GP (seems caused by
> 418a3ab1e).
> I'll take a look at the details in the office tomorrow.
>
> Best,
> Wei

I saw that VM hangs. Could be the above problem, let me know how it
goes.

2019-07-18 05:53:44

by Wang, Wei W

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On 07/18/2019 12:13 PM, Michael S. Tsirkin wrote:
>
> It makes sense for pages in the balloon (requested by hypervisor).
> However free page hinting can freeze up lots of memory for its own
> internal reasons. It does not make sense to ask hypervisor
> to set flags in order to fix internal guest issues.

Sounds reasonable to me. Probably we could move the flag check to
shrinker_count and shrinker_scan as a reclaiming condition for
ballooning pages only?


>
> Right. But that does not include the pages in the hint vq,
> which could be a significant amount of memory.

I think it includes, as vb->num_free_page_blocks records the total number
of free page blocks that balloon has taken from mm.

For shrink_free_pages, it calls return_free_pages_to_mm, which pops pages
from vb->free_page_list (this is the list where pages get enlisted after
they
are put to the hint vq, see get_free_page_and_send).


>
>
>>> - if free pages are being reported, pages freed
>>> by shrinker will just get re-allocated again
>> fill_balloon will re-try the allocation after sleeping 200ms once allocation fails.
> Even if ballon was never inflated, if shrinker frees some memory while
> we are hinting, hint vq will keep going and allocate it back without
> sleeping.

Still see get_free_page_and_send. -EINTR is returned when page
allocation fails,
and reporting ends then.

Shrinker is called on system memory pressure. On memory pressure
get_free_page_and_send will fail memory allocation, so it stops
allocating more.


Best,
Wei

2019-07-18 06:10:05

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On Thu, Jul 18, 2019 at 01:57:06PM +0800, Wei Wang wrote:
> On 07/18/2019 12:13 PM, Michael S. Tsirkin wrote:
> >
> > It makes sense for pages in the balloon (requested by hypervisor).
> > However free page hinting can freeze up lots of memory for its own
> > internal reasons. It does not make sense to ask hypervisor
> > to set flags in order to fix internal guest issues.
>
> Sounds reasonable to me. Probably we could move the flag check to
> shrinker_count and shrinker_scan as a reclaiming condition for
> ballooning pages only?

I think so, yes. I also wonder whether we should stop reporting
at that point - otherwise we'll just allocate the freed pages again.

>
> >
> > Right. But that does not include the pages in the hint vq,
> > which could be a significant amount of memory.
>
> I think it includes, as vb->num_free_page_blocks records the total number
> of free page blocks that balloon has taken from mm.

Oh - you are right. Thanks!

> For shrink_free_pages, it calls return_free_pages_to_mm, which pops pages
> from vb->free_page_list (this is the list where pages get enlisted after
> they
> are put to the hint vq, see get_free_page_and_send).
>
>
> >
> >
> > > > - if free pages are being reported, pages freed
> > > > by shrinker will just get re-allocated again
> > > fill_balloon will re-try the allocation after sleeping 200ms once allocation fails.
> > Even if ballon was never inflated, if shrinker frees some memory while
> > we are hinting, hint vq will keep going and allocate it back without
> > sleeping.
>
> Still see get_free_page_and_send. -EINTR is returned when page allocation
> fails,
> and reporting ends then.

what if it does not fail?


>
> Shrinker is called on system memory pressure. On memory pressure
> get_free_page_and_send will fail memory allocation, so it stops allocating
> more.

Memory pressure could be triggered by an unrelated allocation
e.g. from another driver.

>
>
> Best,
> Wei

2019-07-18 06:25:27

by Wang, Wei W

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On 07/18/2019 01:58 PM, Michael S. Tsirkin wrote:
>
> what if it does not fail?
>
>
>> Shrinker is called on system memory pressure. On memory pressure
>> get_free_page_and_send will fail memory allocation, so it stops allocating
>> more.
> Memory pressure could be triggered by an unrelated allocation
> e.g. from another driver.

As memory pressure is system-wide (no matter who triggers it), free page
hinting
will fail on memory pressure, same as other drivers.

As long as the page allocation succeeds, we could just think the system
is not in
the memory pressure situation, then thing could go on normally.

Also, the VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG includes NORETRY and
NOMEMALLOC,
which makes it easier than most other drivers to fail allocation first.

Best,
Wei

2019-07-18 06:49:07

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On Thu, Jul 18, 2019 at 02:30:01PM +0800, Wei Wang wrote:
> On 07/18/2019 01:58 PM, Michael S. Tsirkin wrote:
> >
> > what if it does not fail?
> >
> >
> > > Shrinker is called on system memory pressure. On memory pressure
> > > get_free_page_and_send will fail memory allocation, so it stops allocating
> > > more.
> > Memory pressure could be triggered by an unrelated allocation
> > e.g. from another driver.
>
> As memory pressure is system-wide (no matter who triggers it), free page
> hinting
> will fail on memory pressure, same as other drivers.

That would be good. Except instead of failing it can hit a race
condition where it will reallocate memory freed by shrinker. Not good.

Yes lots of drivers do that but they do not drink up memory
quite as aggressively as page hinting.


> As long as the page allocation succeeds, we could just think the system is
> not in
> the memory pressure situation, then thing could go on normally.

Given we have a shrinker callback we can't pretend we don't
know or care.

> Also, the VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG includes NORETRY and
> NOMEMALLOC,
> which makes it easier than most other drivers to fail allocation first.
>
> Best,
> Wei

It's a classic race condition and I don't see why do arguments
about probability matter. With a big fleet of machines
it is guaranteed to happen on some.

--
MST

2019-07-18 09:05:24

by Wang, Wei W

[permalink] [raw]
Subject: Re: use of shrinker in virtio balloon free page hinting

On 07/18/2019 02:47 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 02:30:01PM +0800, Wei Wang wrote:
>> On 07/18/2019 01:58 PM, Michael S. Tsirkin wrote:
>>> what if it does not fail?
>>>
>>>
>>>> Shrinker is called on system memory pressure. On memory pressure
>>>> get_free_page_and_send will fail memory allocation, so it stops allocating
>>>> more.
>>> Memory pressure could be triggered by an unrelated allocation
>>> e.g. from another driver.
>> As memory pressure is system-wide (no matter who triggers it), free page
>> hinting
>> will fail on memory pressure, same as other drivers.
> That would be good. Except instead of failing it can hit a race
> condition where it will reallocate memory freed by shrinker. Not good.

OK..I could see this when another module does allocation, which triggers
kswapd
to have balloon's shrinker release some memory, which could be eaten by
balloon
quickly again before that module takes it, and this could happen repeatedly
in theory.

So add a vb->stop_free_page_report boolean, set it in shrinker_count,
and clear it in
virtio_balloon_queue_free_page_work?

Best,
Wei