From: Vitaly Kuznetsov
To: kvm@vger.kernel.org
Cc: x86@kernel.org, Paolo Bonzini, Radim Krčmář, Roman Kagan,
    "K. Y. Srinivasan", Haiyang Zhang, Stephen Hemminger,
    "Michael Kelley (EOSG)", Mohammed Gamal, Cathy Avery,
    linux-kernel@vger.kernel.org
Subject: [PATCH v4 0/8] KVM: x86: hyperv: PV TLB flush for Windows guests
Date: Wed, 16 May 2018 17:21:23 +0200
Message-Id: <20180516152131.30689-1-vkuznets@redhat.com>

Changes since v3 [Radim Krcmar]:
- PATCH2 fixing 'HV_GENERIC_SET_SPARCE_4K' typo added.
- PATCH5 introducing kvm_make_vcpus_request_mask() API added.
- Fix undefined behavior for hv->vp_index >= 64.
- Merge kvm_hv_flush_tlb() and kvm_hv_flush_tlb_ex().
- For -ex case preload all banks with a single kvm_read_guest().

Description:

This is both a new feature and a bugfix.

Bugfix description:

It was found that Windows 2016 guests on KVM crash when they have > 64
vCPUs, non-flat topology (>1 core/thread per socket; in case it has >64
sockets Windows just ignores vCPUs above 64) and Hyper-V enlightenments
(any) are enabled. The most common error reported is "PAGE FAULT IN
NONPAGED AREA" but I saw other messages too. Apparently, Windows doesn't
expect to run on a Hyper-V server without PV TLB flush support as there
are no such Hyper-V servers out there (it's only WS2016 supporting > 64
vCPUs AFAIR).

Adding PV TLB flush support to KVM helps: Windows 2016 guests now boot
normally (I tried '-smp 128,sockets=64,cores=1,threads=2' and
'-smp 128,sockets=8,cores=16,threads=1' but other topologies should work
too).

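To illustrate the "undefined behavior for hv->vp_index >= 64" item in the changelog, here is a minimal standalone sketch, not the code from the patches. It assumes (from my reading of the Hyper-V TLFS, not from this cover letter) that the plain hypercalls carry a single 64-bit processor mask and the -ex variants carry a sparse set: a valid-bank mask plus one 64-bit bank per set bit, each bank covering 64 VP indices. All toy_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count set bits portably (no compiler builtins assumed). */
static unsigned int toy_popcount64(uint64_t v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Flat case: one 64-bit mask, so only VP indices 0..63 can be named. */
static bool toy_vp_in_flat_mask(uint64_t processor_mask, uint32_t vp_index)
{
    if (vp_index >= 64)
        return false; /* shifting a 64-bit value by >= 64 is undefined in C */
    return (processor_mask >> vp_index) & 1;
}

/*
 * Sparse case (assumed layout): bit N of valid_bank_mask says bank N is
 * present; bank_contents[] holds only the present banks, in ascending order.
 */
static bool toy_vp_in_sparse_set(uint64_t valid_bank_mask,
                                 const uint64_t *bank_contents,
                                 uint32_t vp_index)
{
    uint32_t bank = vp_index / 64;
    unsigned int slot;

    if (bank >= 64 || !((valid_bank_mask >> bank) & 1))
        return false;

    /* bank < 64 here, so this shift is well defined. */
    slot = toy_popcount64(valid_bank_mask & ((1ULL << bank) - 1));
    return (bank_contents[slot] >> (vp_index % 64)) & 1;
}
```

The point of the sketch is only that every shift count is checked against the width of the value being shifted before it is used.
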
Feature description:

PV TLB flush helps a lot when running overcommitted. KVM gained support
for it recently but it is only available for Linux guests. Windows guests
use the emulated Hyper-V interface and PV TLB flush needs to be added
there.

I tested a WS2016 guest with 128 vCPUs running on a 12 pCPU server. The
test was running 65 threads doing 50 mmap()/munmap() for 16384 pages with
a tiny random nanosleep in between (I used Cygwin. It would be great if
someone could point me to a good Windows-native TLB thrashing test). The
average results are:

Before:
real    0m22.464s
user    0m0.990s
sys     1m26.3276s

After:
real    0m19.304s
user    0m0.908s
sys     0m36.249s

When running without overcommit the results of the same test are very
close so the feature can be enabled by default.

Implementation details.

The implementation is very simplistic and straightforward. We ignore the
'address space' argument of the hypercalls (as there is no good way to
figure out what's currently in CR3 of a running vCPU, since generally we
don't VMEXIT on guest CR3 write) and do a full TLB flush on the specified
vCPUs. In case said vCPUs are not running, the TLB flush will be performed
upon guest entry.

Qemu (and other userspaces) need to enable CPUID feature bits to make
Windows aware the feature is supported. I'll post the Qemu enablement
patch separately.

Patches are based on the current kvm/queue branch.

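The flush semantics described under "Implementation details" can be modelled roughly as below. This is a toy userspace sketch with hypothetical names (struct toy_vcpu, toy_flush_hypercall() and friends), not KVM code. The "kick a running vCPU so it re-enters and flushes right away" step is my assumption about how a pending request gets serviced; the cover letter itself only says the flush is full, per targeted vCPU, and deferred to guest entry for vCPUs that are not running.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU state for this model; not a KVM structure. */
struct toy_vcpu {
    bool running;              /* currently executing guest code */
    bool flush_pending;        /* a full TLB flush is owed before guest code runs */
    unsigned int full_flushes; /* number of full flushes performed so far */
};

/* Guest entry: service a pending flush first, then run the guest. */
static void toy_guest_enter(struct toy_vcpu *v)
{
    if (v->flush_pending) {
        v->full_flushes++;
        v->flush_pending = false;
    }
    v->running = true;
}

static void toy_guest_exit(struct toy_vcpu *v)
{
    v->running = false;
}

/*
 * Flush hypercall: the address-space argument is accepted but ignored, and
 * every targeted vCPU gets a full flush request.
 */
static void toy_flush_hypercall(struct toy_vcpu *vcpus, size_t nr,
                                const bool *targeted, uint64_t address_space)
{
    (void)address_space; /* deliberately unused, as in the description */

    for (size_t i = 0; i < nr; i++) {
        if (!targeted[i])
            continue;
        vcpus[i].flush_pending = true;
        if (vcpus[i].running) {
            /* Assumed "kick": force an exit so re-entry flushes now. */
            toy_guest_exit(&vcpus[i]);
            toy_guest_enter(&vcpus[i]);
        }
    }
}
```

In this model a targeted vCPU that is not running keeps flush_pending set and pays for the flush only on its next toy_guest_enter(), which is the deferred case the description mentions.
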
Vitaly Kuznetsov (8):
  x86/hyper-v: move struct hv_flush_pcpu{,ex} definitions to common
    header
  x86/hyperv: fix typo in 'HV_GENERIC_SET_SPARCE_4K' definition
  KVM: x86: hyperv: use defines when parsing hypercall parameters
  KVM: x86: hyperv: do rep check for each hypercall separately
  KVM: introduce kvm_make_vcpus_request_mask() API
  KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}
    implementation
  KVM: x86: hyperv: simplistic
    HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation
  KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability

 Documentation/virtual/kvm/api.txt  |   9 ++
 arch/x86/hyperv/mmu.c              |  42 +++------
 arch/x86/include/asm/hyperv-tlfs.h |  22 ++++-
 arch/x86/include/asm/kvm_host.h    |   1 +
 arch/x86/kvm/hyperv.c              | 171 ++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/trace.h               |  51 +++++++++++
 arch/x86/kvm/x86.c                 |   1 +
 include/linux/kvm_host.h           |   3 +
 include/uapi/linux/kvm.h           |   1 +
 virt/kvm/kvm_main.c                |  34 ++++++--
 10 files changed, 282 insertions(+), 53 deletions(-)

-- 
2.14.3