Subject: Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux
From: Scott Wood
To: Purcareata Bogdan
CC: Sebastian Andrzej Siewior, Paolo Bonzini, Alexander Graf, Bogdan Purcareata, Thomas Gleixner
Date: Fri, 3 Apr 2015 16:26:15 -0500
Message-ID: <1428096375.22867.369.camel@freescale.com>
In-Reply-To: <551E4A41.1080705@freescale.com>
On Fri, 2015-04-03 at 11:07 +0300, Purcareata Bogdan wrote:
> On 03.04.2015 02:11, Scott Wood wrote:
> > On Fri, 2015-03-27 at 19:07 +0200, Purcareata Bogdan wrote:
> >> On 27.02.2015 03:05, Scott Wood wrote:
> >>> On Thu, 2015-02-26 at 14:31 +0100, Sebastian Andrzej Siewior wrote:
> >>>> On 02/26/2015 02:02 PM, Paolo Bonzini wrote:
> >>>>> On 24/02/2015 00:27, Scott Wood wrote:
> >>>>>> This isn't a host PIC driver. It's guest PIC emulation, some of
> >>>>>> which is indeed not suitable for a raw lock (in particular,
> >>>>>> openpic_update_irq, which loops over the number of vcpus, with a
> >>>>>> loop body that calls IRQ_check(), which in turn loops over all
> >>>>>> pending IRQs).
> >>>>>
> >>>>> The question is what behavior is wanted of code that isn't quite
> >>>>> RT-ready. What is preferred, bugs or bad latency?
> >>>>>
> >>>>> If the answer is bad latency (which can be avoided simply by not
> >>>>> running KVM on a RT kernel in production), patch 1 can be applied.
> >>>>> If the
> >>>>
> >>>> can be applied *but* makes no difference if applied or not.
> >>>>
> >>>>> answer is bugs, patch 1 is not upstream material.
> >>>>>
> >>>>> I myself prefer to have bad latency; if something takes a spinlock
> >>>>> in atomic context, that spinlock should be raw. If it hurts
> >>>>> (latency), don't do it (use the affected code).
> >>>>
> >>>> The problem that is fixed by this s/spin_lock/raw_spin_lock/ exists
> >>>> only in -RT. There is no change upstream. In general we fix such
> >>>> things in -RT first and forward the patches upstream if possible.
> >>>> This convert thingy would be possible.
> >>>> Bug fixing comes before latency, no matter if RT or not. Converting
> >>>> every lock into a raw lock is not always the answer.
> >>>> The last thing I read from Scott is that he is not entirely sure
> >>>> whether this is the right approach, and patch #1 was not acked by
> >>>> him either.
> >>>>
> >>>> So for now I wait for Scott's feedback and maybe a backtrace :)
> >>>
> >>> Obviously leaving it in a buggy state is not what we want -- but I
> >>> lean towards a short-term "fix" of putting "depends on !PREEMPT_RT"
> >>> on the in-kernel MPIC emulation (which is itself just an
> >>> optimization -- you can still use KVM without it). This way people
> >>> don't enable it with RT without being aware of the issue, and
> >>> there's more of an incentive to fix it properly.
> >>>
> >>> I'll let Bogdan supply the backtrace.
> >>
> >> So, about the backtrace. I wasn't really sure how to "catch" this, so
> >> what I did was to start a 24-VCPU guest on a 24-CPU board, and in the
> >> guest run 24 netperf flows against an external back-to-back board of
> >> the same kind. I assumed this would provide sufficient VCPUs and
> >> external interrupts to expose the alleged culprit.
> >>
> >> With regards to measuring the latency, I thought of using ftrace,
> >> specifically the preemptirqsoff latency histogram.
> >> Unfortunately, I wasn't able to capture any major differences
> >> between running a guest with in-kernel MPIC emulation (with the
> >> openpic raw_spinlock conversion applied) vs. no in-kernel MPIC
> >> emulation. Function profiling (trace_stat) shows that in the second
> >> case there's a far greater time spent in kvm_handle_exit (100x), but
> >> overall, the maximum latencies for preemptirqsoff don't look that
> >> much different.
> >>
> >> Here are the max numbers (preemptirqsoff) for the 24 CPUs, on the
> >> host RT Linux, sorted in descending order, expressed in microseconds:
> >>
> >>   In-kernel MPIC    QEMU MPIC
> >>   3975              5105
> >
> > What are you measuring? Latency in the host, or in the guest?
>
> This is in the host kernel.

Those are terrible numbers in both cases. Can you use those tracing
tools to find out what the code path is for QEMU MPIC?

-Scott
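For reference, the preemptirqsoff measurement discussed in this thread is typically driven through tracefs roughly as sketched below. This is a hedged sketch, not the exact commands used by the participants: the helper name `measure_irqsoff_max` and the workload argument are illustrative, the tracefs layout is the standard one, and availability of the `preemptirqsoff` tracer depends on the kernel's ftrace configuration.

```shell
# Sketch: record the worst-case preempt/irqs-off section (in microseconds)
# seen while a workload runs, using the ftrace preemptirqsoff tracer.
# $1 is the tracefs mount point (commonly /sys/kernel/debug/tracing);
# the remaining arguments are the workload command (e.g. a netperf run).
measure_irqsoff_max() {
    t="$1"
    shift
    [ -w "$t/current_tracer" ] || { echo "no tracefs at $t" >&2; return 1; }
    echo 0 > "$t/tracing_on"                 # stop tracing while configuring
    echo preemptirqsoff > "$t/current_tracer"
    echo 0 > "$t/tracing_max_latency"        # reset the recorded maximum
    echo 1 > "$t/tracing_on"
    "$@"                                     # run the workload under trace
    echo 0 > "$t/tracing_on"
    cat "$t/tracing_max_latency"             # worst-case section, in usec
}
```

On a real RT host, reading the `trace` file afterwards shows the code path that produced the recorded maximum, which is the kind of information Scott asks for above regarding the QEMU MPIC case.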