Subject: Re: [RFC PATCH v1 2/4] KVM: arm64: GICv4.1: Try to save hw pending state in save_pending_tables
From: Shenming Lu <lushenming@huawei.com>
To: Marc Zyngier
CC: James Morse, Julien Thierry, Suzuki K Poulose, Eric Auger,
    Christoffer Dall, Alex Williamson, Kirti Wankhede, Cornelia Huck, Neo Jia
Date: Tue, 24 Nov 2020 21:10:38 +0800
Message-ID: <86c2b9ad-7214-caef-0924-ec71b43aa003@huawei.com>
References: <20201123065410.1915-1-lushenming@huawei.com>
 <20201123065410.1915-3-lushenming@huawei.com>
 <90f04f50-c1ba-55b2-0f93-1e755b40b487@huawei.com>
 <4e2b87897485e38e251c447b9ad70eb6@kernel.org>
In-Reply-To: <4e2b87897485e38e251c447b9ad70eb6@kernel.org>

On 2020/11/24 16:26, Marc Zyngier wrote:
> On 2020-11-24 07:40, Shenming Lu wrote:
>> On 2020/11/23 17:18, Marc Zyngier wrote:
>>> On 2020-11-23 06:54, Shenming Lu wrote:
>>>> After pausing all vCPUs and devices capable of interrupting, in order
>>>         ^^^^^^^^^^^^^^^^^
>>> See my comment below about this.
>>>
>>>> to save the information of all interrupts, besides flushing the pending
>>>> states in kvm's vgic, we also try to flush the states of VLPIs in the
>>>> virtual pending tables into guest RAM, but we need to have GICv4.1 and
>>>> safely unmap the vPEs first.
>>>>
>>>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>>>> ---
>>>>  arch/arm64/kvm/vgic/vgic-v3.c | 62 +++++++++++++++++++++++++++++++----
>>>>  1 file changed, 56 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
>>>> index 9cdf39a94a63..e1b3aa4b2b12 100644
>>>> --- a/arch/arm64/kvm/vgic/vgic-v3.c
>>>> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
>>>> @@ -1,6 +1,8 @@
>>>>  // SPDX-License-Identifier: GPL-2.0-only
>>>>
>>>>  #include <linux/irqchip/arm-gic-v3.h>
>>>> +#include <linux/irq.h>
>>>> +#include <linux/irqdomain.h>
>>>>  #include <linux/kvm.h>
>>>>  #include <linux/kvm_host.h>
>>>>  #include <kvm/arm_vgic.h>
>>>> @@ -356,6 +358,39 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
>>>>      return 0;
>>>>  }
>>>>
>>>> +/*
>>>> + * With GICv4.1, we can get the VLPI's pending state after unmapping
>>>> + * the vPE. The deactivation of the doorbell interrupt will trigger
>>>> + * the unmapping of the associated vPE.
>>>> + */
>>>> +static void get_vlpi_state_pre(struct vgic_dist *dist)
>>>> +{
>>>> +    struct irq_desc *desc;
>>>> +    int i;
>>>> +
>>>> +    if (!kvm_vgic_global_state.has_gicv4_1)
>>>> +        return;
>>>> +
>>>> +    for (i = 0; i < dist->its_vm.nr_vpes; i++) {
>>>> +        desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
>>>> +        irq_domain_deactivate_irq(irq_desc_get_irq_data(desc));
>>>> +    }
>>>> +}
>>>> +
>>>> +static void get_vlpi_state_post(struct vgic_dist *dist)
>>>
>>> nit: the naming feels a bit... odd. Pre/post what?
>>
>> My understanding is that the unmapping is a preparation for get_vlpi_state...
>> Maybe just call it unmap/map_all_vpes?
>
> Yes, much better.
>
> [...]
>
>>>> +        if (irq->hw) {
>>>> +            WARN_RATELIMIT(irq_get_irqchip_state(irq->host_irq,
>>>> +                        IRQCHIP_STATE_PENDING, &is_pending),
>>>> +                       "IRQ %d", irq->host_irq);
>>>
>>> Isn't this going to warn like mad on a GICv4.0 system where this, by
>>> definition, will generate an error?
>>
>> As we have returned an error in save_its_tables if hw && !has_gicv4_1,
>> we don't need to warn here?
>
> Are you referring to the check in vgic_its_save_itt() that occurs in patch 4?
> Fair enough, though I think the use of irq_get_irqchip_state() isn't quite
> what we want, as per my comments on patch #1.
>
>>>
>>>> +        }
>>>> +
>>>> +        if (stored == is_pending)
>>>>              continue;
>>>>
>>>> -        if (irq->pending_latch)
>>>> +        if (is_pending)
>>>>              val |= 1 << bit_nr;
>>>>          else
>>>>              val &= ~(1 << bit_nr);
>>>>
>>>>          ret = kvm_write_guest_lock(kvm, ptr, &val, 1);
>>>>          if (ret)
>>>> -            return ret;
>>>> +            goto out;
>>>>      }
>>>> -    return 0;
>>>> +
>>>> +out:
>>>> +    get_vlpi_state_post(dist);
>>>
>>> This bit worries me: you have unmapped the VPEs, so any interrupt that has
>>> been generated during that phase is now forever lost (the GIC doesn't have
>>> ownership of the pending tables).
>>
>> In my opinion, during this phase the devices capable of interrupting
>> should already have been paused (prevented from sending interrupts);
>> for example, the VFIO migration protocol already implements this.
>
> Is that a hard guarantee? Pausing devices *may* be possible for a limited
> set of endpoints, but I'm not sure it is universally possible to restart
> them and expect a consistent state (you have just dropped a bunch of
> network packets on the floor...).

No, as far as I know, if the VFIO device does not support pausing, the
migration will fail early... The specific action is decided by the vendor
driver. In fact, VFIO migration is still in an experimental phase... I will
pay attention to the follow-up development.

>
>>> Do you really expect the VM to be restartable from that point? I don't
>>> see how this is possible.
>>>
>>
>> If the migration has encountered an error, the source VM might be
>> restarted, so we have to map the vPEs back.
>
> As I said above, I doubt it is universally possible to do so, but
> after all, this probably isn't worse than restarting on the target...
>
>         M.
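
For reference, a minimal sketch of the unmap/map pair agreed on above,
assuming the unmap_all_vpes()/map_all_vpes() names from the discussion.
The map side is not shown in the quoted patch, so its body here is an
assumption: it mirrors the unmap path, relying on irq_domain_activate_irq()
to re-establish the vPE mapping via the doorbell interrupt.

static void unmap_all_vpes(struct vgic_dist *dist)
{
	struct irq_desc *desc;
	int i;

	if (!kvm_vgic_global_state.has_gicv4_1)
		return;

	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
		desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
		/* Deactivating the doorbell unmaps the vPE, making the
		 * GIC write the virtual pending table back to memory. */
		irq_domain_deactivate_irq(irq_desc_get_irq_data(desc));
	}
}

static void map_all_vpes(struct vgic_dist *dist)
{
	struct irq_desc *desc;
	int i;

	if (!kvm_vgic_global_state.has_gicv4_1)
		return;

	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
		desc = irq_to_desc(dist->its_vm.vpes[i]->irq);
		/* Re-activating the doorbell maps the vPE back, keeping
		 * the source VM restartable if the migration aborts. */
		irq_domain_activate_irq(irq_desc_get_irq_data(desc), false);
	}
}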
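
On the GICv4.0 concern above: one way to keep the rate-limited warning from
firing on hardware that cannot report the VLPI pending state would be to
gate the read explicitly, along these lines. vgic_v3_get_hw_pending() is a
hypothetical helper for illustration, not something proposed in the thread:

/*
 * Read the pending state of a forwarded (hw) LPI. On GICv4.0 the ITS
 * cannot report a VLPI's pending state, so irq_get_irqchip_state()
 * fails by design; return an error and let the caller abort the save
 * cleanly instead of WARNing on every interrupt.
 */
static int vgic_v3_get_hw_pending(struct vgic_irq *irq, bool *is_pending)
{
	if (!kvm_vgic_global_state.has_gicv4_1)
		return -EACCES;

	return irq_get_irqchip_state(irq->host_irq,
				     IRQCHIP_STATE_PENDING, is_pending);
}

The caller could then fail the save path outright, matching the check
already performed in vgic_its_save_itt() that the thread mentions.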