Subject: Re: [PATCH] irqchip/gic-v4.1: Optimize the wait for the completion of the analysis of the VPT
From: Shenming Lu
To: Thomas Gleixner, Jason Cooper, Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose, Catalin Marinas, Will Deacon, Eric Auger, Christoffer Dall
Date: Mon, 16 Nov 2020 15:22:46 +0800
Message-ID: <5e09e050-071d-5a74-ec2b-aa6afd1480b9@huawei.com>
In-Reply-To: <20200923063543.1920-1-lushenming@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Marc,

Friendly ping, it has been some time since I sent this patch according to
your last advice...

Besides, recently we found that the mmio delay on a GICv4.1 system is about
10 times higher than that on a GICv4.0 system in kvm-unit-tests (the specific
data is as follows).
By the way, HiSilicon GICv4.1 has already been implemented and will be
released with our next-generation server, which is almost the only
implementation of GICv4.1 at present.

                    | GICv4.1 emulator | GICv4.0 emulator
mmio_read_user (ns) |      12811       |      1598

After analysis, this is mainly caused by the 10 us delay in
its_wait_vpt_parse_complete() (the above difference is just about 10 us)...

What's your opinion about this?

Thanks,
Shenming

On 2020/9/23 14:35, Shenming Lu wrote:
> Right after a vPE is made resident, the code starts polling the
> GICR_VPENDBASER.Dirty bit until it becomes 0, where the delay_us
> is set to 10. But in our measurement, it takes only hundreds of
> nanoseconds, or 1~2 microseconds, to finish parsing the VPT in most
> cases. And we also measured the time from vcpu_load() (include it)
> to __guest_enter() on Kunpeng 920. On average, it takes 2.55 microseconds
> (not first run && the VPT is empty). So 10 microseconds delay might
> really hurt performance.
>
> To avoid this, we can set the delay_us to 1, which is more appropriate
> in this situation and universal. Besides, we can delay the execution
> of its_wait_vpt_parse_complete() (call it from kvm_vgic_flush_hwstate()
> corresponding to vPE resident), giving the GIC a chance to work in
> parallel with the CPU on the entry path.
>
> Signed-off-by: Shenming Lu
> ---
>  arch/arm64/kvm/vgic/vgic-v4.c      | 18 ++++++++++++++++++
>  arch/arm64/kvm/vgic/vgic.c         |  2 ++
>  drivers/irqchip/irq-gic-v3-its.c   | 14 +++++++++++---
>  drivers/irqchip/irq-gic-v4.c       | 11 +++++++++++
>  include/kvm/arm_vgic.h             |  3 +++
>  include/linux/irqchip/arm-gic-v4.h |  4 ++++
>  6 files changed, 49 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
> index b5fa73c9fd35..1d5d2d6894d3 100644
> --- a/arch/arm64/kvm/vgic/vgic-v4.c
> +++ b/arch/arm64/kvm/vgic/vgic-v4.c
> @@ -353,6 +353,24 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
>  	return err;
>  }
>
> +void vgic_v4_wait_vpt(struct kvm_vcpu *vcpu)
> +{
> +	struct its_vpe *vpe;
> +
> +	if (kvm_vgic_global_state.type == VGIC_V2 || !vgic_supports_direct_msis(vcpu->kvm))
> +		return;
> +
> +	vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
> +
> +	if (vpe->vpt_ready)
> +		return;
> +
> +	if (its_wait_vpt(vpe))
> +		return;
> +
> +	vpe->vpt_ready = true;
> +}
> +
>  static struct vgic_its *vgic_get_its(struct kvm *kvm,
>  				     struct kvm_kernel_irq_routing_entry *irq_entry)
>  {
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index c3643b7f101b..ed810a80cda2 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -915,6 +915,8 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>
>  	if (can_access_vgic_from_kernel())
>  		vgic_restore_state(vcpu);
> +
> +	vgic_v4_wait_vpt(vcpu);
>  }
>
>  void kvm_vgic_load(struct kvm_vcpu *vcpu)
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 548de7538632..b7cbc9bcab9d 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3803,7 +3803,7 @@ static void its_wait_vpt_parse_complete(void)
>  	WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base + GICR_VPENDBASER,
>  						       val,
>  						       !(val & GICR_VPENDBASER_Dirty),
> -						       10, 500));
> +						       1, 500));
>  }
>
>  static void its_vpe_schedule(struct its_vpe *vpe)
> @@ -3837,7 +3837,7 @@ static void its_vpe_schedule(struct its_vpe *vpe)
>  	val |= GICR_VPENDBASER_Valid;
>  	gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
>
> -	its_wait_vpt_parse_complete();
> +	vpe->vpt_ready = false;
>  }
>
>  static void its_vpe_deschedule(struct its_vpe *vpe)
> @@ -3881,6 +3881,10 @@ static int its_vpe_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
>  		its_vpe_schedule(vpe);
>  		return 0;
>
> +	case WAIT_VPT:
> +		its_wait_vpt_parse_complete();
> +		return 0;
> +
>  	case DESCHEDULE_VPE:
>  		its_vpe_deschedule(vpe);
>  		return 0;
> @@ -4047,7 +4051,7 @@ static void its_vpe_4_1_schedule(struct its_vpe *vpe,
>
>  	gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
>
> -	its_wait_vpt_parse_complete();
> +	vpe->vpt_ready = false;
>  }
>
>  static void its_vpe_4_1_deschedule(struct its_vpe *vpe,
> @@ -4118,6 +4122,10 @@ static int its_vpe_4_1_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
>  		its_vpe_4_1_schedule(vpe, info);
>  		return 0;
>
> +	case WAIT_VPT:
> +		its_wait_vpt_parse_complete();
> +		return 0;
> +
>  	case DESCHEDULE_VPE:
>  		its_vpe_4_1_deschedule(vpe, info);
>  		return 0;
> diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
> index 0c18714ae13e..36be42569872 100644
> --- a/drivers/irqchip/irq-gic-v4.c
> +++ b/drivers/irqchip/irq-gic-v4.c
> @@ -258,6 +258,17 @@ int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en)
>  	return ret;
>  }
>
> +int its_wait_vpt(struct its_vpe *vpe)
> +{
> +	struct its_cmd_info info = { };
> +
> +	WARN_ON(preemptible());
> +
> +	info.cmd_type = WAIT_VPT;
> +
> +	return its_send_vpe_cmd(vpe, &info);
> +}
> +
>  int its_invall_vpe(struct its_vpe *vpe)
>  {
>  	struct its_cmd_info info = {
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index a8d8fdcd3723..b55a835d28a8 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -402,6 +402,9 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq,
>  				 struct kvm_kernel_irq_routing_entry *irq_entry);
>
>  int vgic_v4_load(struct kvm_vcpu *vcpu);
> +
> +void vgic_v4_wait_vpt(struct kvm_vcpu *vcpu);
> +
>  int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
>
>  #endif /* __KVM_ARM_VGIC_H */
> diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
> index 6976b8331b60..68ac2b7b9309 100644
> --- a/include/linux/irqchip/arm-gic-v4.h
> +++ b/include/linux/irqchip/arm-gic-v4.h
> @@ -75,6 +75,8 @@ struct its_vpe {
>  	u16			vpe_id;
>  	/* Pending VLPIs on schedule out? */
>  	bool			pending_last;
> +	/* VPT parse complete */
> +	bool			vpt_ready;
>  };
>
>  /*
> @@ -103,6 +105,7 @@ enum its_vcpu_info_cmd_type {
>  	PROP_UPDATE_VLPI,
>  	PROP_UPDATE_AND_INV_VLPI,
>  	SCHEDULE_VPE,
> +	WAIT_VPT,
>  	DESCHEDULE_VPE,
>  	INVALL_VPE,
>  	PROP_UPDATE_VSGI,
> @@ -128,6 +131,7 @@ struct its_cmd_info {
>  int its_alloc_vcpu_irqs(struct its_vm *vm);
>  void its_free_vcpu_irqs(struct its_vm *vm);
>  int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en);
> +int its_wait_vpt(struct its_vpe *vpe);
>  int its_make_vpe_non_resident(struct its_vpe *vpe, bool db);
>  int its_invall_vpe(struct its_vpe *vpe);
>  int its_map_vlpi(int irq, struct its_vlpi_map *map);
>