Date: Sun, 24 Sep 2017 09:05:44 -0400 (EDT)
From: Paolo Bonzini
To: Peter Zijlstra
Cc: Marcelo Tosatti, Konrad Rzeszutek Wilk, mingo@redhat.com,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner
Message-ID: <855950672.7912001.1506258344142.JavaMail.zimbra@redhat.com>
In-Reply-To: <20170923134114.qdfdegrd6afqrkut@hirez.programming.kicks-ass.net>
Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall

----- Original Message -----
> From: "Peter Zijlstra"
> To: "Paolo Bonzini"
> Cc: "Marcelo Tosatti", "Konrad Rzeszutek Wilk", mingo@redhat.com,
> kvm@vger.kernel.org, linux-kernel@vger.kernel.org, "Thomas Gleixner"
> Sent: Saturday, September 23, 2017 3:41:14 PM
> Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall
>
> On Sat, Sep 23, 2017 at 12:56:12PM +0200, Paolo Bonzini wrote:
> > On 22/09/2017 14:55, Peter Zijlstra wrote:
> > > You just explained it yourself. If the thread that needs to complete
> > > what you're waiting on has lower priority, it will _never_ get to run if
> > > you're busy waiting on it.
> > >
> > > This is _trivial_.
> > >
> > > And even for !RT it can be quite costly, because you can end up having
> > > to burn your entire slot of CPU time before you run the other task.
> > >
> > > Userspace spinning is _bad_, do not do this.
> >
> > This is not userspace spinning, it is guest spinning---which has
> > effectively the same effect but you cannot quite avoid.
>
> So I'm virt illiterate and have no clue on how all this works; but
> wasn't this a vmexit ? (that's what marcelo traced). And once you've
> done a vmexit you're a regular task again, not a vcpu.

His trace simply shows that the timer tick happened and the SCHED_NORMAL
thread was preempted. Bumping the vCPU thread to SCHED_FIFO drops the
scheduler tick (the system is NOHZ_FULL) and thus 1) the frequency of
EXTERNAL_INTERRUPT vmexits drops to once per second; 2) the thread is
not preempted anymore.

> > But I agree that the solution is properly prioritizing threads that can
> > interrupt the VCPU, and using PI mutexes.
>
> Right, if you want to run RT VCPUs the whole emulator/vcpu interaction
> needs to be designed for RT.
>
> > I'm not a priori opposed to paravirt scheduling primitives, but I am not
> > at all sure that it's required.
>
> Problem is that the proposed thing doesn't solve anything. There is
> nothing that prohibits the guest from triggering a vmexit while holding
> a spinlock and landing in the self-same problems.

Well, part of configuring virt for RT (at all levels: host
hypervisor+QEMU and guest kernel+userspace) is ensuring that vmexits
while holding a spinlock are either confined to one vCPU or are handled
in the host hypervisor very quickly, say in less than 2000 clock cycles.

So I'm not denying that Marcelo's approach solves the problem, but it's
very heavyweight and it masks an important misconfiguration (as you
write above, everything needs to be RT and the priorities must be
designed carefully).

_However_, even if you do this, you may want to put the less important
vCPUs and the emulator threads on the same physical CPU. In that case,
the vCPU can be placed at SCHED_RR to avoid starvation (while the
emulator thread needs to stay at SCHED_FIFO and at a higher priority).
Some kind of trick that bumps spinlock critical sections in that vCPU to
SCHED_FIFO, for a limited time only, might still be useful.

Paolo
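
For illustration only, here is a minimal userspace sketch of the policy
split described in the last paragraph: a less important vCPU thread at
SCHED_RR and the emulator thread at SCHED_FIFO with a higher priority,
both sharing one physical CPU. It is not taken from the patch series or
from QEMU. The names thread_role, sched_plan, plan_for and apply_plan
are hypothetical, and the choice of "minimum RT priority" for the vCPU
and "one level above" for the emulator is an assumption made to keep
the example small. Only sched_setscheduler(2) and
sched_get_priority_min(3) are real interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical roles for threads of one VM that share a physical CPU. */
enum thread_role { ROLE_VCPU_LOW, ROLE_EMULATOR };

struct sched_plan {
    int policy;   /* SCHED_RR or SCHED_FIFO */
    int priority; /* static real-time priority */
};

/*
 * Choose policy and priority for a role.
 * Assumption: the vCPU runs at the lowest SCHED_RR priority and the
 * emulator thread at SCHED_FIFO one level above it, so the emulator
 * can always preempt the vCPU it serves.
 */
struct sched_plan plan_for(enum thread_role role)
{
    struct sched_plan plan;
    int base = sched_get_priority_min(SCHED_RR);

    if (role == ROLE_EMULATOR) {
        plan.policy = SCHED_FIFO;
        plan.priority = base + 1;
    } else {
        plan.policy = SCHED_RR;
        plan.priority = base;
    }
    return plan;
}

/*
 * Apply a plan to thread `tid` (0 means the calling thread).
 * Returns 0 on success or an errno value. EPERM is the expected result
 * when the caller lacks CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO.
 */
int apply_plan(pid_t tid, struct sched_plan plan)
{
    struct sched_param param = { .sched_priority = plan.priority };

    assert(plan.policy == SCHED_FIFO || plan.policy == SCHED_RR);
    if (sched_setscheduler(tid, plan.policy, &param) == -1)
        return errno;
    return 0;
}
```

A caller would run apply_plan(tid, plan_for(ROLE_EMULATOR)) for the
emulator thread and apply_plan(tid, plan_for(ROLE_VCPU_LOW)) for the
vCPU thread, after pinning both to the same CPU. The temporary bump of
spinlock critical sections to SCHED_FIFO mentioned above is not covered
by this sketch.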