From: Vitaly Kuznetsov
To: Paolo Bonzini, kvm@vger.kernel.org
Cc: Sean Christopherson, Wanpeng Li, Jim Mattson,
	Siddharth Chandrasekaran, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] KVM: x86: hyper-v: XMM fast hypercalls fixes
In-Reply-To:
References: <20220222154642.684285-1-vkuznets@redhat.com>
Date: Fri, 25 Feb 2022 14:13:58 +0100
Message-ID: <871qzrdr6x.fsf@redhat.com>

Paolo Bonzini writes:

> On 2/22/22 16:46, Vitaly Kuznetsov wrote:
>> While working on some Hyper-V TLB flush improvements and Direct TLB flush
>> feature for Hyper-V on KVM I experienced Windows Server 2019 crashes on
>> boot when XMM fast hypercall input feature is advertised. Turns out,
>> HVCALL_SEND_IPI_EX is also an XMM fast hypercall and returning an error
>> kills the guest. This is fixed in PATCH4. PATCH3 fixes erroneous capping
>> of sparse CPU banks for XMM fast TLB flush hypercalls. The problem should
>> be reproducible with >360 vCPUs.
>>
>> Vitaly Kuznetsov (4):
>>   KVM: x86: hyper-v: Drop redundant 'ex' parameter from
>>     kvm_hv_send_ipi()
>>   KVM: x86: hyper-v: Drop redundant 'ex' parameter from
>>     kvm_hv_flush_tlb()
>>   KVM: x86: hyper-v: Fix the maximum number of sparse banks for XMM fast
>>     TLB flush hypercalls
>>   KVM: x86: hyper-v: HVCALL_SEND_IPI_EX is an XMM fast hypercall
>>
>>  arch/x86/kvm/hyperv.c | 84 +++++++++++++++++++++++--------------------
>>  1 file changed, 45 insertions(+), 39 deletions(-)
>>
>
> Merging this in 5.18 is a bit messy. Please check that the below
> patch against kvm/next makes sense:

Something is wrong with the diff as it doesn't apply :-(

>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 653e08c993c4..98fb998c31ce 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1770,9 +1770,11 @@ struct kvm_hv_hcall {
>  };
>  
>  static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
> +				 int consumed_xmm_halves,
>  				 u64 *sparse_banks, gpa_t offset)
>  {
>  	u16 var_cnt;
> +	int i;
>  
>  	if (hc->var_cnt > 64)
>  		return -EINVAL;
> @@ -1780,13 +1782,29 @@ static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
>  	/* Ignore banks that cannot possibly contain a legal VP index. */
>  	var_cnt = min_t(u16, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS);
>  
> +	if (hc->fast) {
> +		/*
> +		 * Each XMM holds two sparse banks, but do not count halves that
> +		 * have already been consumed for hypercall parameters.
> +		 */
> +		if (hc->var_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
> +			return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +		for (i = 0; i < var_cnt; i++) {
> +			int j = i + consumed_xmm_halves;
> +			if (j % 2)
> +				sparse_banks[i] = sse128_lo(hc->xmm[j / 2]);
> +			else
> +				sparse_banks[i] = sse128_hi(hc->xmm[j / 2]);

Let's say we have 1 half of XMM0 consumed. Now:

	i = 0;
	j = 1;
	if (1)
		sparse_banks[0] = sse128_lo(hc->xmm[0]);

This doesn't look right as we need to get the upper half of XMM0. I
guess it should be reversed:

	if (j % 2)
		sparse_banks[i] = sse128_hi(hc->xmm[j / 2]);
	else
		sparse_banks[i] = sse128_lo(hc->xmm[j / 2]);
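To double-check the index math outside the kernel, here is a minimal
userspace sketch of the same mapping (show_bank() is a hypothetical
stand-in that only prints which half would be read; it is not the
kernel code):

#include <stdio.h>

/* Print which XMM half sparse bank 'i' is taken from. */
static void show_bank(int i, int consumed_xmm_halves)
{
	int j = i + consumed_xmm_halves;

	/* Reversed condition: odd overall half index means high half. */
	printf("sparse_banks[%d] = %s(hc->xmm[%d])\n",
	       i, (j % 2) ? "sse128_hi" : "sse128_lo", j / 2);
}

int main(void)
{
	int i;

	/* One half of XMM0 already consumed, as in the SEND_IPI_EX case. */
	for (i = 0; i < 4; i++)
		show_bank(i, 1);
	return 0;
}

With consumed_xmm_halves = 1 this prints hi(xmm[0]), lo(xmm[1]),
hi(xmm[1]), lo(xmm[2]), i.e. the banks resume exactly at the unconsumed
upper half of XMM0.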
> +		}
> +		return 0;
> +	}
> +
>  	return kvm_read_guest(kvm, hc->ingpa + offset, sparse_banks,
>  			      var_cnt * sizeof(*sparse_banks));
>  }
>  
> -static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool ex)
> +static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  {
> -	int i;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct hv_tlb_flush_ex flush_ex;
>  	struct hv_tlb_flush flush;
> @@ -1803,7 +1821,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
>  	 */
>  	BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > 64);
>  
> -	if (!ex) {
> +	if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
> +	    hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE) {

In case you're trying to come up with a smaller patch for 5.18, we can
certainly drop these 'ex'/'non-ex' changes as these are merely
cosmetic.

>  		if (hc->fast) {
>  			flush.address_space = hc->ingpa;
>  			flush.flags = hc->outgpa;
> @@ -1859,17 +1878,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
>  		if (!hc->var_cnt)
>  			goto ret_success;
>  
> -		if (hc->fast) {
> -			if (hc->var_cnt > HV_HYPERCALL_MAX_XMM_REGISTERS - 1)
> -				return HV_STATUS_INVALID_HYPERCALL_INPUT;
> -			for (i = 0; i < hc->var_cnt; i += 2) {
> -				sparse_banks[i] = sse128_lo(hc->xmm[i / 2 + 1]);
> -				sparse_banks[i + 1] = sse128_hi(hc->xmm[i / 2 + 1]);
> -			}
> -			goto do_flush;
> -		}
> -
> -		if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks,
> +		if (kvm_get_sparse_vp_set(kvm, hc, 2, sparse_banks,
>  					  offsetof(struct hv_tlb_flush_ex,
>  						   hv_vp_set.bank_contents)))

I like your idea to put 'consumed_xmm_halves' into
kvm_get_sparse_vp_set() as kvm_hv_flush_tlb() is getting too big.
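For reference, my reading of where the two consumed_xmm_halves values
come from (an assumption based on this diff; the TLFS is the authority
here):

  HVCALL_FLUSH_VIRTUAL_ADDRESS_{SPACE,LIST}_EX, fast:
    hc->ingpa  (RDX) = address_space
    hc->outgpa (R8)  = flags
    XMM0 lo/hi       = vp_set.format / vp_set.valid_bank_mask  -> 2 halves
    XMM1 lo, ...     = vp_set.bank_contents[]

  HVCALL_SEND_IPI_EX, fast:
    hc->ingpa  (RDX) = vector
    hc->outgpa (R8)  = vp_set.format
    XMM0 lo          = vp_set.valid_bank_mask                  -> 1 half
    XMM0 hi, ...     = vp_set.bank_contents[]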
>  			return HV_STATUS_INVALID_HYPERCALL_INPUT;
> @@ -1913,7 +1922,7 @@ static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
>  	}
>  }
>  
> -static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool ex)
> +static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
>  {
>  	struct kvm *kvm = vcpu->kvm;
>  	struct hv_send_ipi_ex send_ipi_ex;
> @@ -1924,7 +1933,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
>  	u32 vector;
>  	bool all_cpus;
>  
> -	if (!ex) {
> +	if (hc->code == HVCALL_SEND_IPI) {
>  		if (!hc->fast) {
>  			if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
>  						    sizeof(send_ipi))))
> @@ -1943,9 +1952,15 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
>  
>  		trace_kvm_hv_send_ipi(vector, sparse_banks[0]);
>  	} else {
> -		if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi_ex,
> -					    sizeof(send_ipi_ex))))
> -			return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +		if (!hc->fast) {
> +			if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi_ex,
> +						    sizeof(send_ipi_ex))))
> +				return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +		} else {
> +			send_ipi_ex.vector = (u32)hc->ingpa;
> +			send_ipi_ex.vp_set.format = hc->outgpa;
> +			send_ipi_ex.vp_set.valid_bank_mask = sse128_lo(hc->xmm[0]);
> +		}
>  
>  		trace_kvm_hv_send_ipi_ex(send_ipi_ex.vector,
>  					 send_ipi_ex.vp_set.format,
> @@ -1964,7 +1979,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
>  	if (!hc->var_cnt)
>  		goto ret_success;
>  
> -	if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks,
> +	if (kvm_get_sparse_vp_set(kvm, hc, 1, sparse_banks,
>  				  offsetof(struct hv_send_ipi_ex,
>  					   vp_set.bank_contents)))
>  		return HV_STATUS_INVALID_HYPERCALL_INPUT;
> @@ -2126,6 +2141,7 @@ static bool is_xmm_fast_hypercall(struct kvm_hv_hcall *hc)
>  	case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE:
>  	case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
>  	case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
> +	case HVCALL_SEND_IPI_EX:
>  		return true;
>  	}
>  
> @@ -2283,46 +2299,43 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>  			kvm_hv_hypercall_complete_userspace;
>  		return 0;
>  	case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST:
> -		if (unlikely(!hc.rep_cnt || hc.rep_idx || hc.var_cnt)) {
> +		if (unlikely(hc.var_cnt)) {
>  			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
>  			break;
>  		}
> -		ret = kvm_hv_flush_tlb(vcpu, &hc, false);
> -		break;
> -	case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE:
> -		if (unlikely(hc.rep || hc.var_cnt)) {
> -			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
> -			break;
> -		}
> -		ret = kvm_hv_flush_tlb(vcpu, &hc, false);
> -		break;
> +		fallthrough;
>  	case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
>  		if (unlikely(!hc.rep_cnt || hc.rep_idx)) {
>  			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
>  			break;
>  		}
> -		ret = kvm_hv_flush_tlb(vcpu, &hc, true);
> +		ret = kvm_hv_flush_tlb(vcpu, &hc);
>  		break;
> +	case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE:
> +		if (unlikely(hc.var_cnt)) {
> +			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
> +			break;
> +		}
> +		fallthrough;
>  	case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
>  		if (unlikely(hc.rep)) {
>  			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
>  			break;
>  		}
> -		ret = kvm_hv_flush_tlb(vcpu, &hc, true);
> +		ret = kvm_hv_flush_tlb(vcpu, &hc);
>  		break;
>  	case HVCALL_SEND_IPI:
> -		if (unlikely(hc.rep || hc.var_cnt)) {
> +		if (unlikely(hc.var_cnt)) {
>  			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
>  			break;
>  		}
> -		ret = kvm_hv_send_ipi(vcpu, &hc, false);
> -		break;
> +		fallthrough;
>  	case HVCALL_SEND_IPI_EX:
> -		if (unlikely(hc.fast || hc.rep)) {
> +		if (unlikely(hc.rep)) {
>  			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
>  			break;
>  		}
> -		ret = kvm_hv_send_ipi(vcpu, &hc, true);
> +		ret = kvm_hv_send_ipi(vcpu, &hc);
>  		break;
>  	case HVCALL_POST_DEBUG_DATA:
>  	case HVCALL_RETRIEVE_DEBUG_DATA:
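If it helps review, the validation matrix after the fallthrough merge,
as I tabulate it (the SEND_IPI_EX row no longer rejecting 'fast' is the
actual fix):

  FLUSH_VIRTUAL_ADDRESS_LIST      rep required, var_cnt rejected
  FLUSH_VIRTUAL_ADDRESS_LIST_EX   rep required, var_cnt allowed
  FLUSH_VIRTUAL_ADDRESS_SPACE     rep rejected, var_cnt rejected
  FLUSH_VIRTUAL_ADDRESS_SPACE_EX  rep rejected, var_cnt allowed
  SEND_IPI                        rep rejected, var_cnt rejected
  SEND_IPI_EX                     rep rejected, var_cnt allowed, fast allowed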
I've smoke tested this (with the change I've mentioned above) and
WS2019 booted with 65 vCPUs. This is a good sign)

>
> The resulting merge commit is already in kvm/queue shortly (which should
> become the next kvm/next as soon as tests complete).

I see, please swap sse128_lo()/sse128_hi() there too :-)

-- 
Vitaly