Date: Sun, 24 Sep 2017 23:57:53 -0300
From: Marcelo Tosatti
To: Paolo Bonzini
Cc: Peter Zijlstra, Konrad Rzeszutek Wilk, mingo@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner
Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall
Message-ID: <20170925025751.GB30813@amt.cnet>
In-Reply-To: <855950672.7912001.1506258344142.JavaMail.zimbra@redhat.com>

On Sun, Sep 24, 2017 at 09:05:44AM -0400, Paolo Bonzini wrote:
>
>
> ----- Original Message -----
> > From: "Peter Zijlstra"
> > To: "Paolo Bonzini"
> > Cc: "Marcelo Tosatti", "Konrad Rzeszutek Wilk", mingo@redhat.com,
> > kvm@vger.kernel.org, linux-kernel@vger.kernel.org, "Thomas Gleixner"
> > Sent: Saturday, September 23, 2017 3:41:14 PM
> > Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall
> >
> > On Sat, Sep 23, 2017 at 12:56:12PM +0200, Paolo Bonzini wrote:
> > > On 22/09/2017 14:55, Peter Zijlstra wrote:
> > > > You just explained it yourself. If the thread that needs to complete
> > > > what you're waiting on has lower priority, it will _never_ get to run if
> > > > you're busy waiting on it.
> > > >
> > > > This is _trivial_.
> > > >
> > > > And even for !RT it can be quite costly, because you can end up having
> > > > to burn your entire slot of CPU time before you run the other task.
> > > >
> > > > Userspace spinning is _bad_, do not do this.
> > >
> > > This is not userspace spinning, it is guest spinning---which has
> > > effectively the same effect but you cannot quite avoid.
> >
> > So I'm virt illiterate and have no clue on how all this works; but
> > wasn't this a vmexit ? (that's what marcelo traced). And once you've
> > done a vmexit you're a regular task again, not a vcpu.
>
> His trace simply shows that the timer tick happened and the SCHED_NORMAL
> thread was preempted. Bumping the vCPU thread to SCHED_FIFO drops
> the scheduler tick (the system is NOHZ_FULL) and thus 1) the frequency
> of EXTERNAL_INTERRUPT vmexits drops to 1 second 2) the thread is not
> preempted anymore.
>
> > > But I agree that the solution is properly prioritizing threads that can
> > > interrupt the VCPU, and using PI mutexes.

That's exactly what the patch does: the prioritization is not fixed in time,
and depends on whether or not vcpu-0 is in a spinlock protected section.

Are you suggesting a different prioritization? Can you describe it please,
even if incomplete?

> >
> > Right, if you want to run RT VCPUs the whole emulator/vcpu interaction
> > needs to be designed for RT.
> >
> > > I'm not a priori opposed to paravirt scheduling primitives, but I am not
> > > at all sure that it's required.
> >
> > Problem is that the proposed thing doesn't solve anything. There is
> > nothing that prohibits the guest from triggering a vmexit while holding
> > a spinlock and landing in the self-same problems.
>
> Well, part of configuring virt for RT is (at all levels: host hypervisor+QEMU
> and guest kernel+userspace) is that vmexits while holding a spinlock are either
> confined to one vCPU or are handled in the host hypervisor very quickly, like
> less than 2000 clock cycles.
>
> So I'm not denying that Marcelo's approach solves the problem, but it's very
> heavyweight and it masks an important misconfiguration (as you write above,
> everything needs to be RT and the priorities must be designed carefully).

I think you are missing the following point:

"vcpu0 can be interrupted when it's not in a spinlock protected section,
otherwise it can't."

So you _have_ to communicate to the host when the guest enters/leaves a
critical section.

So this point of "everything needs to be RT and the priorities must be
designed carefully" is this:

    WHEN in spinlock protected section (more specifically, when
    spinlock protected section _shared with realtime vcpus_),

    priority of vcpu0 > priority of emulator thread

    OTHERWISE

    priority of vcpu0 < priority of emulator thread.

(*) So emulator thread can interrupt and inject interrupts to vcpu0.

>
> _However_, even if you do this, you may want to put the less important vCPUs
> and the emulator threads on the same physical CPU. In that case, the vCPU
> can be placed at SCHED_RR to avoid starvation (while the emulator thread needs
> to stay at SCHED_FIFO and higher priority). Some kind of trick that bumps
> spinlock critical sections in that vCPU to SCHED_FIFO, for a limited time only,
> might still be useful.

Anything that violates (*) above is going to cause excessive latencies in
realtime vcpus, via:

PCPU-0:
    * vcpu-0 grabs spinlock A.
    * event wakes up emulator thread, vcpu-0 sched out, vcpu-0 sched in.

PCPU-1:
    * realtime vcpu grabs spinlock A, busy spins on emulator thread's completion.

So it's more than useful, it's necessary.

I'm open to suggestions for better ways to solve this problem while sharing
the emulator thread with vcpu-0 (which is something users are interested in,
for obvious economic reasons), but:

    1) Don't get the point of Peter's rejection.
    2) Don't get how SCHED_RR can help the situation.
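
As a rough illustration of the WHEN/OTHERWISE priority rule above, here is a
toy Python model of the choice made on PCPU-0. It is not the patch: the
function names and the priority value 2 are made up for the example, and it
assumes a strict fixed-priority scheduler with only these two threads.

```python
# Illustrative model only: names and priority values are hypothetical,
# not taken from the patch under discussion.
EMULATOR_PRIO = 2  # assumed fixed priority of the emulator thread


def vcpu0_prio(in_shared_lock_section: bool) -> int:
    """Priority of vcpu-0: above the emulator thread while it holds a
    spinlock shared with realtime vcpus, below it otherwise."""
    if in_shared_lock_section:
        return EMULATOR_PRIO + 1
    return EMULATOR_PRIO - 1


def pick_next(in_shared_lock_section: bool, emulator_runnable: bool) -> str:
    """Thread a strict fixed-priority scheduler would run on PCPU-0."""
    if emulator_runnable and EMULATOR_PRIO > vcpu0_prio(in_shared_lock_section):
        return "emulator"
    return "vcpu-0"


if __name__ == "__main__":
    # Emulator thread has just been woken up; compare both cases.
    for in_section in (True, False):
        print(in_section, pick_next(in_section, emulator_runnable=True))
```

In this model, with the emulator thread runnable, vcpu-0 keeps the CPU while
the shared lock is held and is preempted by the emulator thread otherwise,
which is why the guest has to tell the host when it enters and leaves such a
section.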