Date: Tue, 26 Sep 2017 19:49:28 -0300
From: Marcelo Tosatti
To: Paolo Bonzini
Cc: Peter Zijlstra, Konrad Rzeszutek Wilk, mingo@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner
Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall
Message-ID: <20170926224925.GA9119@amt.cnet>
On Mon, Sep 25, 2017 at 05:12:42PM +0200, Paolo Bonzini wrote:
> On 25/09/2017 11:13, Peter Zijlstra wrote:
> > On Sun, Sep 24, 2017 at 11:57:53PM -0300, Marcelo Tosatti wrote:
> >> I think you are missing the following point:
> >>
> >> "vcpu0 can be interrupted when its not in a spinlock protected section,
> >> otherwise it can't."
>
> Who says that? Certainly a driver can dedicate a single VCPU to
> periodic polling of the device, in such a way that the polling does not
> require a spinlock.

This sequence:

    VCPU-0                          VCPU-1 (running realtime workload)

    takes spinlock A
    scheduled out
                                    spinlock(A) (busy spins until
                                    VCPU-0 is scheduled back in)
    scheduled in
    finishes execution of code
    under protected section
    releases spinlock(A)
                                    takes spinlock(A)

You get that point, right? (*)

> >> So you _have_ to communicate to the host when the guest enters/leaves a
> >> critical section.
> >>
> >> So this point of "everything needs to be RT and the priorities must be
> >> designed carefully", is this:
> >>
> >> WHEN in spinlock protected section (more specifically, when
> >> spinlock protected section _shared with realtime vcpus_),
> >>
> >> priority of vcpu0 > priority of emulator thread
> >>
> >> OTHERWISE
> >>
> >> priority of vcpu0 < priority of emulator thread.
>
> This is _not_ designed carefully, this is messy.

This is very precise to me. What is "messy" about it? (It's clearly
defined.)

> The emulator thread can interrupt the VCPU thread, so it has to be at
> higher RT priority (+ priority inheritance of mutexes).

It can only do that _when_ the VCPU thread is not running a critical
section which a higher priority task depends on.

> Once you have
> done that we can decide on other approaches that e.g. let you get more
> sharing by placing housekeeping VCPUs at SCHED_NORMAL or SCHED_RR.
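The conditional priority rule quoted above (vcpu0 above the emulator thread only while it is inside a critical section shared with realtime vCPUs, below it otherwise) can be modelled in a few lines of C. This is a minimal sketch, assuming the host somehow learns whether vcpu0 is in such a section; the priority values, `struct vcpu_state` and both helper functions are hypothetical and are not part of the KVM_HC_RT_PRIO patches. It only captures the ordering, not how the host would apply it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical RT priorities (higher value wins, as with SCHED_FIFO).
 * The concrete numbers are illustrative assumptions only. */
enum {
    PRIO_VCPU0_BASE  = 10, /* vcpu0 outside shared critical sections */
    PRIO_EMULATOR    = 20, /* emulator thread */
    PRIO_VCPU0_BOOST = 30, /* vcpu0 holding a lock shared with RT vCPUs */
};

/* Hypothetical guest-reported state of vcpu0. */
struct vcpu_state {
    bool in_shared_critical_section;
};

/* Effective priority of vcpu0 under the rule:
 *   in a critical section shared with RT vCPUs: vcpu0 > emulator thread
 *   otherwise:                                  vcpu0 < emulator thread */
static int vcpu0_effective_prio(const struct vcpu_state *s)
{
    /* The rule only works if the three levels are ordered this way. */
    assert(PRIO_VCPU0_BASE < PRIO_EMULATOR && PRIO_EMULATOR < PRIO_VCPU0_BOOST);
    return s->in_shared_critical_section ? PRIO_VCPU0_BOOST : PRIO_VCPU0_BASE;
}

/* True if the emulator thread outranks vcpu0 and may preempt it now. */
static bool emulator_may_preempt_vcpu0(const struct vcpu_state *s)
{
    return PRIO_EMULATOR > vcpu0_effective_prio(s);
}
```

In the (*) scenario, the boosted case is what prevents the emulator thread from scheduling VCPU-0 out while VCPU-1 busy-spins on lock A.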
Well, if someone looks at (*), one sees that if the interruption delay
(the time between "scheduled out" and "scheduled in" in that diagram)
exceeds a given threshold, the processing of the realtime task on vcpu1
also exceeds that threshold.

So when you say "The emulator thread can interrupt the VCPU thread",
you're saying that it has to be modified to interrupt for a maximum
amount of time (say 15us).

Is that what you are suggesting?

> >> So emulator thread can interrupt and inject interrupts to vcpu0.
> >
> > spinlock protected regions are not everything. What about lock-free
> > constructs where CPU's spin-wait on one another (there's plenty).
> >
> > And I'm clearly ignorant of how this emulation thread works, but why
> > would it run for a long time? Either it is needed for forward progress
> > of the VCPU or its not. If its not, it shouldn't run.
>
> The emulator thread 1) should not run for long period of times indeed,
> and 2) it is needed for forward progress of the VCPU. So it has to be
> at higher RT priority. I agree with Peter, sorry. Spinlocks are a red
> herring here.
>
> Paolo

Paolo, you don't control how many interruptions of the emulator thread
happen per second. So if you let the emulator thread interrupt the VCPU
thread at all times, without some kind of bound on these interruptions
per time unit, you have a problem similar to (*) (where the realtime
task is scheduled).

Another approach to the problem was suggested to OpenStack.
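The bounding argument can be made concrete with a small C model. This is a minimal sketch, assuming a hypothetical per-vCPU budget that lets the emulator thread interrupt at most `max_per_window` times per `window_us`, each interruption lasting at most `max_len_us`, and assuming the emulator thread is the only thing preempting the vCPU. None of these names exist in KVM or QEMU. With such a bound, the time the vCPU is kept off the CPU in one window is at most `max_per_window * max_len_us` (3 interruptions of 5us give the 15us figure mentioned above); without it, there is no bound at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU budget for emulator-thread interruptions.
 * All names and parameters are illustrative assumptions. */
struct interrupt_budget {
    uint64_t window_us;       /* length of the accounting window */
    uint32_t max_per_window;  /* interruptions allowed per window */
    uint64_t max_len_us;      /* assumed upper bound on one interruption */
    uint64_t window_start_us; /* start of the current window */
    uint32_t used;            /* interruptions granted in this window */
};

/* Worst-case time the emulator thread can keep the vCPU off the CPU
 * within a single window: max_per_window * max_len_us. */
static uint64_t worst_case_delay_us(const struct interrupt_budget *b)
{
    return (uint64_t)b->max_per_window * b->max_len_us;
}

/* Returns true if the emulator thread may interrupt the vCPU at now_us.
 * Grants at most max_per_window interruptions per window_us. */
static bool may_interrupt(struct interrupt_budget *b, uint64_t now_us)
{
    assert(b->window_us > 0);
    assert(now_us >= b->window_start_us);

    uint64_t elapsed = now_us - b->window_start_us;
    if (elapsed >= b->window_us) {
        /* Advance to the window containing now_us, keeping windows
         * aligned to multiples of window_us from the original start. */
        b->window_start_us = now_us - elapsed % b->window_us;
        b->used = 0;
    }
    if (b->used < b->max_per_window) {
        b->used++;
        return true;
    }
    return false; /* budget exhausted: the request must wait */
}
```

For example, with `window_us = 1000`, `max_per_window = 3` and `max_len_us = 5`, a fourth request inside the same millisecond is refused, and a request at 1500us opens a new window starting at 1000us.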