Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933978AbdIYI6o (ORCPT ); Mon, 25 Sep 2017 04:58:44 -0400 Received: from bombadil.infradead.org ([65.50.211.133]:39306 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932107AbdIYI6l (ORCPT ); Mon, 25 Sep 2017 04:58:41 -0400 Date: Mon, 25 Sep 2017 10:58:35 +0200 From: Peter Zijlstra To: Marcelo Tosatti Cc: Konrad Rzeszutek Wilk , mingo@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\ Message-ID: <20170925085835.4zxze42mmi4emyj4@hirez.programming.kicks-ass.net> References: <20170921114039.466130276@redhat.com> <20170921133653.GO26248@char.us.oracle.com> <20170921140628.zliqlz7mrlqs5pzz@hirez.programming.kicks-ass.net> <20170922011039.GB20133@amt.cnet> <20170922100004.ydmaxvgpc2zx7j25@hirez.programming.kicks-ass.net> <20170922121640.GA29589@amt.cnet> <20170922123107.fjh2yfwnej73trim@hirez.programming.kicks-ass.net> <20170922124005.GA30393@amt.cnet> <20170922130141.tz6f4gktihmbhqli@hirez.programming.kicks-ass.net> <20170925022238.GB5140@amt.cnet> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170925022238.GB5140@amt.cnet> User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4722 Lines: 131 On Sun, Sep 24, 2017 at 11:22:38PM -0300, Marcelo Tosatti wrote: > On Fri, Sep 22, 2017 at 03:01:41PM +0200, Peter Zijlstra wrote: > > On Fri, Sep 22, 2017 at 09:40:05AM -0300, Marcelo Tosatti wrote: > > > > > Are you arguing its invalid for the following application to execute on > > > housekeeping vcpu of a realtime system: > > > > > > void main(void) > > > { > > > > > > submit_IO(); > > > do { > > > computation(); > > > } while (!interrupted()); > > > } > > > > > > Really? > > > > No. Nobody cares about random crap tasks. > > Nobody has control over all code that runs in userspace Peter. And not > supporting a valid sequence of steps because its "crap" (whatever your > definition of crap is) makes no sense. > > It might be that someone decides to do the above (i really can't see > any actual reasoning i can follow and agree on your "its crap" > argument), this truly seems valid to me. We don't care what other tasks do. This isn't a hard thing to understand. You're free to run whatever junk on your CPUs. This doesn't (much) affect the correct functioning of RT tasks that you also run there. > So lets follow the reasoning steps: > > 1) "NACK, because you didnt understand the problem". > > OK thats an invalid NACK, you did understand the problem > later and now your argument is the following. It was a NACK because you wrote a shit changelog that didn't explain the problem. But yes. > 2) "NACK, because all VCPUs should be SCHED_FIFO all the time". Very much, if you want a RT guest, all VCPU's should run at RT prio and the interaction between the VCPUs and all supporting threads should be designed for RT. > But the existence of this code path from userspace: > > submit_IO(); > do { > computation(); > } while (!interrupted()); > > Its a supported code sequence, and works fine in a non-RT environment. Who cares about that chunk of code? Have you forgotten to mention that this is the form of the emulation thread? > Therefore it should work on an -RT environment. No, this is where you're wrong. That code works on -RT as long as you don't expect it to be a valid RT program. -RT kernels will run !RT stuff just fine. But the moment you run a program as RT (FIFO/RR/DEADLINE) it had better damn well be a valid RT program, and that excludes a lot of code. > So please give me some logical reasoning for the NACK (people can live with > it, but it has to be good enough to justify the decreasing packing of > guests in pCPUs): > > 1) "Voodoo programming" (its hard for me to parse what you mean with > that... do you mean you foresee this style of priority boosting causing > problems in the future? Can you give an example?). Your 'solution' only works if you sacrifice a goat on a full moon, because only that ensures the guest doesn't VM_EXIT and cause the self-same problem while you've boosted it. Because you've _not_ fixed the actual problem! > Is there fundamentally wrong about priority boosting in spinlock > sections, or this particular style of priority boosting is wrong? Yes, its fundamentally crap, because it doesn't guarantee anything. RT is about making guarantees. An RT program needs a provable forward progress guarantee at the very least. It including a priority inversion disqualifies it from being sane. > 2) "Pollution of the kernel code path". That makes sense to me, if thats > whats your concerned about. Also.. > 3) "Reduction of spinlock performance". Its true, but for NFV workloads > people don't care about. I've no idea what an NFV is. > 4) "All vcpus should be SCHED_FIFO all the time". OK, why is that? > What dictates that to be true? Solid engineering. Does the guest kernel function as a bunch of independent CPUs or does it assume all CPUs are equal and have strong inter-cpu connections? Linux is the latter, therefore if one VCPU is RT they all should be. Dammit, you even recognise this in the spin-owner preemption issue you're hacking around, but then go arse-about-face 'solving' it. > What the patch does is the following: > It reduces the window where SCHED_FIFO is applied vcpu0 > to those were a spinlock is shared between -RT vcpus and vcpu0 > (why: because otherwise, when the emulator thread is sharing a > pCPU with vcpu0, its unable to generate interrupts vcpu0). > > And its being rejected because: Its not fixing the actual problem. The real problem is the prio inversion between the VCPU and the emulation thread, _That_ is what needs fixing. Rewrite that VCPU/emulator interaction to be a proper RT construct. Then you can run the VCPU at RT prio as you should, and the guest can issue all the VM_EXIT things it wants at any time and still function correctly.