From: Purcareata Bogdan
Date: Mon, 27 Apr 2015 09:45:09 +0300
To: Scott Wood
CC: Sebastian Andrzej Siewior, Paolo Bonzini, Alexander Graf, Bogdan Purcareata, Thomas Gleixner, Laurentiu Tudor
Subject: Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux
Message-ID: <553DDAF5.6030005@freescale.com>
In-Reply-To: <1429824418.16357.26.camel@freescale.com>

On 24.04.2015 00:26, Scott Wood wrote:
> On Thu, 2015-04-23 at 15:31 +0300, Purcareata Bogdan wrote:
>> On 23.04.2015 03:30, Scott Wood wrote:
>>> On Wed, 2015-04-22 at 15:06 +0300, Purcareata Bogdan wrote:
>>>> On 21.04.2015 03:52, Scott Wood wrote:
>>>>> On Mon, 2015-04-20 at 13:53 +0300, Purcareata Bogdan wrote:
>>>>>> There was a weird situation for .kvmppc_mpic_set_epr - its corresponding
>>>>>> inner function is kvmppc_set_epr, which is a static inline. Removing the
>>>>>> static inline yields a compiler crash (Segmentation fault (core dumped) -
>>>>>> scripts/Makefile.build:441: recipe for target 'arch/powerpc/kvm/kvm.o'
>>>>>> failed), but that's a different story, so I just let it be for now. The
>>>>>> point is that the measured time may include other work done after the
>>>>>> lock has been released, but before the function actually returned. I
>>>>>> noticed this was the case for .kvm_set_msi, which could take up to 90 ms,
>>>>>> not actually under the lock. This made me change what I'm looking at.
>>>>>
>>>>> kvm_set_msi does pretty much nothing outside the lock -- I suspect
>>>>> you're measuring an interrupt that happened as soon as the lock was
>>>>> released.
>>>>
>>>> That's exactly right. I've seen things like a timer interrupt occurring
>>>> right after the spin_unlock_irqrestore, but before kvm_set_msi actually
>>>> returned.
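(To illustrate the measurement pitfall being discussed, here is a minimal
instrumentation sketch - hypothetical code, not from the patch set, assuming
the raw_spinlock conversion under discussion. It times only the critical
section, so an interrupt firing between the unlock and the return is not
charged to the lock.)

#include <linux/ktime.h>
#include <linux/spinlock.h>

static u64 max_hold_ns;  /* worst-case lock hold time seen so far */

static void timed_critical_section(raw_spinlock_t *lock)
{
        unsigned long flags;
        u64 t0, t1;

        raw_spin_lock_irqsave(lock, flags);
        t0 = ktime_get_ns();
        /* ... emulation work done under the lock ... */
        t1 = ktime_get_ns();
        raw_spin_unlock_irqrestore(lock, flags);

        if (t1 - t0 > max_hold_ns)
                max_hold_ns = t1 - t0;  /* racy, but fine for eyeballing */
}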
>>>>
>>>> [...]
>>>>
>>>>>> Or perhaps a different stress scenario involving a lot of VCPUs
>>>>>> and external interrupts?
>>>>>
>>>>> You could instrument the MPIC code to find out how many loop iterations
>>>>> you maxed out on, and compare that to the theoretical maximum.
>>>>
>>>> Numbers are pretty low, and I'll try to explain based on my observations.
>>>>
>>>> The problematic section in openpic_update_irq is this [1], since it loops
>>>> through all VCPUs, and IRQ_local_pipe further calls IRQ_check, which loops
>>>> through all pending interrupts for a VCPU [2].
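(A condensed, self-contained model of that code path - the names mirror the
emulation in arch/powerpc/kvm/mpic.c, but the types are simplified stand-ins,
not the actual kernel code. It shows why the worst case under the lock is on
the order of MAX_CPU * MAX_IRQ iterations.)

#include <stdbool.h>
#include <stdint.h>

#define MAX_CPU 32
#define MAX_IRQ 256

static bool pending[MAX_CPU][MAX_IRQ]; /* per-VCPU pending-interrupt queues */
static int ivpr_prio[MAX_IRQ];         /* per-source priority from IVPR */

/* [2] scan all pending interrupts of one VCPU for the best priority */
static int IRQ_check(int cpu)
{
        int irq, best = -1, best_prio = -1;

        for (irq = 0; irq < MAX_IRQ; irq++) {
                if (pending[cpu][irq] && ivpr_prio[irq] > best_prio) {
                        best = irq;
                        best_prio = ivpr_prio[irq];
                }
        }
        return best;
}

/* [1] directed delivery: visit every VCPU in the destination mask */
static void openpic_update_irq(uint32_t destmask, int n_IRQ)
{
        int cpu;

        for (cpu = 0; cpu < MAX_CPU; cpu++) {
                if (destmask & (1u << cpu)) {
                        pending[cpu][n_IRQ] = true;
                        IRQ_check(cpu); /* stands in for IRQ_local_pipe */
                }
        }
}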
>>>> The guest interfaces are virtio-vhostnet, which are based on MSI
>>>> (/proc/interrupts in the guest shows they are MSI). For external interrupts
>>>> to the guest, the irq_source destmask is currently 0, and last_cpu is 0
>>>> (uninitialized), so [1] will go on and deliver the interrupt directly and
>>>> unicast (no VCPU loop).
>>>>
>>>> I activated the pr_debugs in arch/powerpc/kvm/mpic.c, to see how many
>>>> interrupts are actually pending for the destination VCPU. At most, there
>>>> were 3 interrupts - n_IRQ = {224,225,226} - even for 24 flows of ping
>>>> flood. I understand that guest virtio interrupts are cascaded over 1 or a
>>>> couple of shared MSI interrupts.
>>>>
>>>> So the worst case, in this scenario, was checking the priorities of 3
>>>> pending interrupts for 1 VCPU. Something like this (some of my prints
>>>> included):
>>>>
>>>> [61010.582033] openpic_update_irq: destmask 1 last_cpu 0
>>>> [61010.582034] openpic_update_irq: Only one CPU is allowed to receive this IRQ
>>>> [61010.582036] IRQ_local_pipe: IRQ 224 active 0 was 1
>>>> [61010.582037] IRQ_check: irq 226 set ivpr_pr=8 pr=-1
>>>> [61010.582038] IRQ_check: irq 225 set ivpr_pr=8 pr=-1
>>>> [61010.582039] IRQ_check: irq 224 set ivpr_pr=8 pr=-1
>>>>
>>>> It would be really helpful to get your comments on whether these are
>>>> realistic numbers for everyday use, or whether they are relevant only to
>>>> this particular scenario.
>>>
>>> RT isn't about "realistic numbers for everyday use". It's about worst
>>> cases.
>>>
>>>> - Can these interrupts be used in directed delivery, so that the
>>>> destination mask can include multiple VCPUs?
>>>
>>> The Freescale MPIC does not support multiple destinations for most
>>> interrupts, but the (non-FSL-specific) emulation code appears to allow
>>> it.
>>>
>>>> The MPIC manual states that timer and IPI interrupts are supported for
>>>> directed delivery, although I'm not sure how much of this is used in the
>>>> emulation. I know that kvmppc uses the decrementer outside of the MPIC.
>>>>
>>>> - How are virtio interrupts cascaded over the shared MSI interrupts?
>>>> /proc/device-tree/soc@e0000000/msi@41600/interrupts in the guest shows 8
>>>> values - 224 - 231 - so at most there might be 8 pending interrupts in
>>>> IRQ_check, is that correct?
>>>
>>> It looks like that's currently the case, but actual hardware supports
>>> more than that, so it's possible (albeit unlikely any time soon) that
>>> the emulation eventually does as well.
>>>
>>> But it's possible to have interrupts other than MSIs...
>>
>> Right.
>>
>> So given that the raw spinlock conversion is not suitable for all the
>> scenarios supported by the OpenPIC emulation, is it OK if my next step is to
>> send a patch containing both the raw spinlock conversion and a mandatory
>> disable of the in-kernel MPIC? This is actually the conclusion we came to
>> some time ago, but I guess it was good to get some more insight into how
>> things actually work (at least for me).
>
> Fine with me. Have you given any thought to ways to restructure the
> code to eliminate the problem?

My first thought would be to create a separate lock for each VCPU's pending
interrupt queue, so that the whole openpic_update_irq path becomes more
granular (a very rough sketch of what I mean is at the end of this mail).
However, this is just a very preliminary thought. Before I can come up with
anything worthy of consideration, I must read the OpenPIC specification and
the current KVM emulated OpenPIC implementation thoroughly. I currently have
other things on my hands, and will come back to this once I have some time.

Meanwhile, I've sent a v2 on the PPC and RT mailing lists for this
raw_spinlock conversion, alongside disabling the in-kernel MPIC emulation for
PREEMPT_RT. I would be grateful to hear your feedback on that, so that it can
get applied.

Thank you,
Bogdan P.
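(A very rough sketch of the per-VCPU lock idea above - hypothetical code, with
invented struct and field names rather than anything from
arch/powerpc/kvm/mpic.c. Each VCPU's pending-interrupt queue gets its own raw
lock, so delivering to one VCPU no longer serializes against deliveries to all
the others; cross-queue ordering and the global OpenPIC register state are
deliberately ignored here.)

#include <linux/bitops.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct irq_dest {
        raw_spinlock_t lock;          /* protects this VCPU's queue only */
        DECLARE_BITMAP(pending, 256); /* pending interrupts, per VCPU */
};

static void queue_irq_for_vcpu(struct irq_dest *dst, int n_IRQ)
{
        unsigned long flags;

        raw_spin_lock_irqsave(&dst->lock, flags);
        __set_bit(n_IRQ, dst->pending);
        /* recompute this VCPU's highest-priority pending IRQ here */
        raw_spin_unlock_irqrestore(&dst->lock, flags);
}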