Date: Mon, 18 Feb 2019 14:47:00 -0500
From: "Michael S. Tsirkin"
To: David Hildenbrand
Cc: Nitesh Narayan Lal, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, pbonzini@redhat.com, lcapitulino@redhat.com, pagupta@redhat.com, wei.w.wang@intel.com, yang.zhang.wz@gmail.com, riel@surriel.com, dodgen@google.com, konrad.wilk@oracle.com, dhildenb@redhat.com, aarcange@redhat.com, Alexander Duyck
Subject: Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting
Message-ID: <20190218143819-mutt-send-email-mst@kernel.org>
In-Reply-To: <4039c2e8-5db4-cddd-b997-2fdbcc6f529f@redhat.com>

On Mon, Feb 18, 2019 at 08:35:36PM +0100, David Hildenbrand wrote:
> On 18.02.19 20:16, Michael S. Tsirkin wrote:
> > On Mon, Feb 18, 2019 at 07:29:44PM +0100, David Hildenbrand wrote:
> >>>
> >>>>>
> >>>>> But really, what business has something that is supposedly
> >>>>> an optimization blocking a VCPU? We are just freeing up
> >>>>> lots of memory; why is it a good idea to slow that
> >>>>> process down?
> >>>>
> >>>> I first want to know that it is a problem before we declare it a
> >>>> problem. I provided an example (s390x) where it does not seem to be a
> >>>> problem. One hypercall ~every 512 frees. As simple as it can get.
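[As a rough sketch of the batching scheme described above (one hint hypercall per ~512 frees) — all names, the fixed batch size, and the stubbed hypercall are hypothetical, not the actual s390x or patch-series code:]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch size, mirroring the "one hypercall ~every 512
 * frees" figure mentioned above. */
#define HINT_BATCH 512

static uint64_t batch[HINT_BATCH]; /* page frame numbers awaiting a hint */
static size_t batch_len;
static size_t hypercalls_made;     /* instrumentation for this sketch */

/* Stand-in for the real hinting hypercall (hypothetical). */
static void hint_hypercall(const uint64_t *pfns, size_t n)
{
    (void)pfns;
    (void)n;
    hypercalls_made++;
}

/* Called on every page free; issues one hypercall per full batch,
 * so the per-free cost is a single array store most of the time. */
static void hint_on_free(uint64_t pfn)
{
    batch[batch_len++] = pfn;
    if (batch_len == HINT_BATCH) {
        hint_hypercall(batch, batch_len);
        batch_len = 0;
    }
}
```

The point of contention in the thread is exactly the synchronous `hint_hypercall()` call: it runs in the freeing path, so the VCPU blocks for however long the hypervisor takes to process the batch.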
> >>>>
> >>>> Not trying to deny that it could be a problem on x86, but then I assume
> >>>> it is only a problem in specific setups.
> >>>
> >>> But which setups? How are we going to identify them?
> >>
> >> I guess it is simple (I should be careful with this word ;) ): As long as
> >> you don't isolate + pin your CPUs in the hypervisor, you can expect any
> >> kind of sudden hiccups. We're in a virtualized world. Real time is one
> >> example.
> >>
> >> Using kernel threads like Nitesh does right now? It can be scheduled
> >> anytime by the hypervisor on the exact same CPU. Unless you isolate +
> >> pin in the hypervisor. So the same problem applies.
> >
> > Right, but we know how to handle this. Many deployments already use tools
> > to detect host threads kicking VCPUs out.
> > Getting a VCPU blocked by a kfree call would be something new.
> >
>
> Yes, and for s390x we already have some kfrees taking longer than
> others. We have to identify when it is not okay.

Right, even if the problem exists elsewhere, this does not make it go
away or ensure that someone will work to address it :)

> >
> >>> So I'm fine with a simple implementation, but the interface needs to
> >>> allow the hypervisor to process hints in parallel while the guest is
> >>> running. We can then fix any issues on the hypervisor without breaking
> >>> guests.
> >>
> >> Yes, I am fine with defining an interface that theoretically lets us
> >> change the implementation in the guest later. I consider this even a
> >> prerequisite. IMHO the interface shouldn't be different; it will be
> >> exactly the same.
> >>
> >> It is just "who" calls the batch freeing and waits for it. And as I
> >> outlined here, doing it without additional threads at least avoids us
> >> for now having to think about dynamic data structures and that we can
> >> sometimes not report "because the thread is still busy reporting or
> >> wasn't scheduled yet".
> >
> > Sorry, I wasn't clear.
> > I think we need the ability to change the
> > implementation in the *host* later. IOW, don't rely on the
> > host being synchronous.
> >
>
> I actually misread it :). Anyway, there has to be a mechanism to
> synchronize.
>
> If we are going via a bare hypercall (like s390x, like what Alexander
> proposes), it is going to be a synchronous interface either way. Just a
> bare hypercall; there will not really be any blocking on the guest side.

It bothers me that we are now tied to the interface being synchronous.
We won't be able to fix it if there's an issue, as that would break
guests.

> Via virtio, I guess it is waiting for a response to a request, right?

For the buffer to be used, yes. And it could mean putting some pages
aside until the hypervisor is done with them. Then you don't need timers
or tricks like this; you can get an interrupt and start using the memory.

> --
>
> Thanks,
>
> David / dhildenb
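[A minimal sketch of the asynchronous alternative discussed above: freed pages are set aside while the hypervisor processes the hint, and returned to the allocator only when the host signals completion. The callback here stands in for a virtio used-buffer interrupt; all names, the fixed capacity, and the single-batch-in-flight design are hypothetical illustrations, not the actual patch or virtio-balloon code:]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ASIDE_MAX 512

static uint64_t set_aside[ASIDE_MAX]; /* pages the guest must not reuse yet */
static size_t aside_len;
static bool hint_in_flight;
static size_t pages_returned;         /* pages handed back to the allocator */

/* Guest side: try to park a freed page for the next hint batch.
 * Returns false if the batch is busy; the caller then just frees the
 * page normally, so the VCPU never blocks on the hypervisor. */
static bool set_page_aside(uint64_t pfn)
{
    if (hint_in_flight || aside_len == ASIDE_MAX)
        return false;
    set_aside[aside_len++] = pfn;
    return true;
}

/* Guest side: submit the current batch to the host asynchronously. */
static void submit_hint(void)
{
    if (aside_len == 0 || hint_in_flight)
        return;
    hint_in_flight = true;
    /* in a real driver: add the buffer to a virtqueue and kick the host */
}

/* "Interrupt" handler: the host is done, so the parked pages become
 * usable by the guest again. */
static void hint_done_irq(void)
{
    if (!hint_in_flight)
        return;
    pages_returned += aside_len;
    aside_len = 0;
    hint_in_flight = false;
}
```

This is what "putting some pages aside until the hypervisor is done with them" buys: the host can take as long as it likes, and the guest only loses temporary use of the parked pages rather than stalling a VCPU.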