Date: Sun, 24 Sep 2017 22:52:58 -0300
From: Marcelo Tosatti
To: Peter Zijlstra
Cc: Konrad Rzeszutek Wilk, mingo@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Thomas Gleixner
Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall
Message-ID: <20170925015256.GA5140@amt.cnet>

On Fri, Sep 22, 2017 at 02:59:51PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 22, 2017 at 09:36:39AM -0300, Marcelo Tosatti wrote:
> > On Fri, Sep 22, 2017 at 02:31:07PM +0200, Peter Zijlstra wrote:
> > > On Fri, Sep 22, 2017 at 09:16:40AM -0300, Marcelo Tosatti wrote:
> > > > On Fri, Sep 22, 2017 at 12:00:05PM +0200, Peter Zijlstra wrote:
> > > > > On Thu, Sep 21, 2017 at 10:10:41PM -0300, Marcelo Tosatti wrote:
> > > > > > When executing guest vcpu-0 with FIFO:1 priority, which is necessary to
> > > > > > deal with the following situation:
> > > > > >
> > > > > > VCPU-0 (housekeeping VCPU)              VCPU-1 (realtime VCPU)
> > > > > >
> > > > > > raw_spin_lock(A)
> > > > > > interrupted, schedule task T-1          raw_spin_lock(A) (spin)
> > > > > >
> > > > > > raw_spin_unlock(A)
> > > > > >
> > > > > > Certain operations must interrupt guest vcpu-0 (see trace below).
> > > > >
> > > > > Those traces don't make any sense. All they include is kvm_exit and you
> > > > > can't tell anything from that.
> > > >
> > > > Hi Peter,
> > > >
> > > > OK lets describe whats happening:
> > > >
> > > > With QEMU emulator thread and vcpu-0 sharing a physical CPU
> > > > (which is a request from several NFV customers, to improve
> > > > guest packing), the following occurs when the guest generates
> > > > the following pattern:
> > > >
> > > > 1. submit IO.
> > > > 2. busy spin.
> > >
> > > User-space spinning is a bad idea in general and terminally broken in
> > > a RT setup. Sounds like you need to go fix qemu to not suck.
> >
> > One can run whatever application they want on the housekeeping
> > vcpus. This is why rteval exists.
>
> Nobody cares about other tasks. The problem is between the VCPU and
> emulator thread. They get a priority inversion and live-lock because of
> spin-waiting.
>
> > This is not the realtime vcpu we are talking about.
>
> You're being confused, its a RT _guest_, all VCPUs _must_ be RT.
> Because, as you ran into, the guest functions as a whole, not as a bunch
> of individual CPUs.
>
> > We can fix the BIOS, which is hanging now, but userspace can
> > do whatever it wants, on non realtime vcpus (again, this is why
> > rteval test exists and is used by the -RT community as
> > a testcase).
>
> But nobody cares what other tasks on the system do, all you care about
> is that the VCPUs make deterministic forward progress.
>
> > I haven't understood what is the wrong with the patch? Are you trying
> > to avoid pollution of the spinlock codepath to keep it simple?
>
> Your patch is voodoo programming. You don't solve the actual problem,
> you try and paper over it.

Priority boosting on a particular section of code is voodoo programming?
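
For readers following the thread, the live-lock discussed above (vcpu-0 running at FIFO priority and busy-spinning on the same physical CPU as the lower-priority emulator thread that has to complete its I/O) can be illustrated with a toy model. This is only a minimal Python sketch, not QEMU or kernel code: the function `simulate`, its parameters, and the one-tick strict-priority scheduler are hypothetical simplifications, and real mechanisms such as RT throttling are deliberately ignored.

```python
def simulate(vcpu_prio, emulator_prio, io_work=3, max_ticks=100):
    """Toy model of one physical CPU shared by vcpu-0 and the emulator thread.

    vcpu-0 has submitted I/O and busy-spins until it completes.
    The emulator thread needs `io_work` ticks of CPU time to complete it.
    Returns the tick at which the I/O completes, or None if it never does
    within `max_ticks` (the live-lock case).

    All names and the scheduling model are illustrative assumptions.
    """
    remaining = io_work
    last_ran = "emulator"  # on a priority tie, vcpu-0 gets the first tick
    for tick in range(max_ticks):
        # Strict priority: the higher-priority runnable thread always wins.
        if vcpu_prio > emulator_prio:
            runner = "vcpu0"
        elif emulator_prio > vcpu_prio:
            runner = "emulator"
        else:
            # Equal priority: alternate ticks between the two threads.
            runner = "emulator" if last_ran == "vcpu0" else "vcpu0"
        last_ran = runner

        if runner == "emulator":
            remaining -= 1
            if remaining == 0:
                return tick
        # When vcpu-0 runs it only spins, so no progress is made.
    return None


if __name__ == "__main__":
    print("vcpu-0 FIFO:1, emulator normal:", simulate(1, 0))
    print("equal priority:               ", simulate(0, 0))
    print("emulator above vcpu-0:        ", simulate(0, 1))
```

In this model the spinning vcpu-0 starves the emulator whenever it has strictly higher priority, so the I/O it is waiting for never completes; with equal or inverted priorities the emulator gets CPU time and the spin ends. The model says nothing about which fix is right, which is exactly what the thread is arguing about.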