From: Vitaly Kuznetsov
To: kvm@vger.kernel.org
Cc: x86@kernel.org, Paolo Bonzini, Radim Krčmář, Roman Kagan,
    "K. Y. Srinivasan", Haiyang Zhang, Stephen Hemminger,
    "Michael Kelley (EOSG)", Mohammed Gamal, Cathy Avery,
    linux-kernel@vger.kernel.org
Subject: [PATCH 0/5] KVM: x86: hyperv: PV TLB flush for Windows guests
Date: Mon, 2 Apr 2018 18:10:54 +0200
Message-Id: <20180402161059.8488-1-vkuznets@redhat.com>

This is both a new feature and a bugfix.

Bugfix description:

It was found that Windows 2016 guests on KVM crash when they have > 64
vCPUs, non-flat topology (>1 core/thread per socket; in case it has >64
sockets Windows just ignores vCPUs above 64) and Hyper-V enlightenments
(any) are enabled. The most common error reported is "PAGE FAULT IN
NONPAGED AREA" but I saw other messages as well. Apparently, Windows
doesn't expect to run on a Hyper-V server without PV TLB flush support as
there are no such Hyper-V servers out there (it's only WS2016 supporting
> 64 vCPUs AFAIR).

Adding PV TLB flush support to KVM helps: Windows 2016 guests now boot
normally (I tried '-smp 128,sockets=64,cores=1,threads=2' and
'-smp 128,sockets=8,cores=16,threads=1' but other topologies should work
too).

Feature description:

PV TLB flush helps a lot when running overcommitted. KVM gained support
for it recently but it is only available for Linux guests. Windows guests
use the emulated Hyper-V interface and PV TLB flush needs to be added
there.

I tested a WS2016 guest with 128 vCPUs running on a 12 pCPU server.
The test ran 64 threads, each doing 100 mmap()/munmap() calls for 16384
pages with a tiny random nanosleep in between (I used Cygwin. It would be
great if someone could point me to a good Windows-native TLB thrashing
test).

The results are:

Before:
real    0m44.362s
user    0m1.796s
sys     6m43.218s

After:
real    0m24.425s
user    0m1.811s
sys     0m40.625s

When running without overcommit (single 12 vCPU guest on 12 pCPU server)
the results of the same test are very close:

Before:
real    0m21.237s
user    0m1.531s
sys     0m19.984s

After:
real    0m21.082s
user    0m1.546s
sys     0m20.030s

Implementation details.

The implementation is very simplistic and straightforward. We ignore the
'address space' argument of the hypercalls (as there is no good way to
figure out what's currently in CR3 of a running vCPU as generally we
don't VMEXIT on guest CR3 write) and do a full TLB flush on the specified
vCPUs. In case said vCPUs are not running, the TLB flush will be
performed upon guest entry.

Qemu (and other userspaces) need to enable CPUID feature bits to make
Windows aware the feature is supported. I'll post the Qemu enablement
patch separately.

Patches are based on the current kvm/queue branch.

Vitaly Kuznetsov (5):
  x86/hyper-v: move struct hv_flush_pcpu{,ex} definitions to common
    header
  KVM: x86: hyperv: use defines when parsing hypercall parameters
  KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}
    implementation
  KVM: x86: hyperv: simplistic
    HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation
  KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability

 Documentation/virtual/kvm/api.txt  |   9 +++
 arch/x86/hyperv/mmu.c              |  40 +++-------
 arch/x86/include/asm/hyperv-tlfs.h |  20 +++++
 arch/x86/kvm/hyperv.c              | 154 ++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/trace.h               |  51 ++++++++++++
 arch/x86/kvm/x86.c                 |   1 +
 include/uapi/linux/kvm.h           |   1 +
 7 files changed, 236 insertions(+), 40 deletions(-)

-- 
2.14.3
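
For readers who want a picture of the behaviour described under
"Implementation details" above, here is a rough user-space sketch in
plain C. It is not code from the patches. Every name in it (toy_vcpu,
toy_flush_tlb, TOY_FLUSH_ALL_PROCESSORS) is hypothetical, and it assumes
that bit i of the processor mask selects vCPU i and that the hypercall
carries an "all processors" flag, neither of which the cover letter
spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: "flush every vCPU, ignore the mask" (assumption). */
#define TOY_FLUSH_ALL_PROCESSORS (1ULL << 0)

/* Hypothetical per-vCPU state; this is not a KVM structure. */
struct toy_vcpu {
	bool running;        /* currently executing guest code */
	bool flush_pending;  /* full TLB flush due on next guest entry */
	unsigned int kicks;  /* times the vCPU was forced out of guest mode */
};

/*
 * Request a full TLB flush on every selected vCPU.
 *
 * address_space is accepted but deliberately ignored, mirroring the
 * simplification described in the cover letter: the flush is always a
 * full one. A running vCPU is "kicked" so that it re-enters the guest
 * and picks up the pending flush; a vCPU that is not running simply
 * keeps the pending flag until its next guest entry.
 *
 * Returns the number of vCPUs that were selected.
 */
static size_t toy_flush_tlb(struct toy_vcpu *vcpus, size_t nr_vcpus,
			    uint64_t address_space, uint64_t flags,
			    uint64_t processor_mask)
{
	size_t selected = 0;

	assert(vcpus != NULL || nr_vcpus == 0);
	(void)address_space;	/* intentionally unused */

	for (size_t i = 0; i < nr_vcpus; i++) {
		bool all = (flags & TOY_FLUSH_ALL_PROCESSORS) != 0;
		/* A 64-bit mask can only name vCPUs 0..63. */
		bool in_mask = i < 64 && (processor_mask & (1ULL << i)) != 0;

		if (!all && !in_mask)
			continue;

		vcpus[i].flush_pending = true;
		if (vcpus[i].running)
			vcpus[i].kicks++;
		selected++;
	}

	return selected;
}
```

Calling toy_flush_tlb(vcpus, 4, 0x1234, 0, 0x6) on four vCPUs would mark
vCPUs 1 and 2 for a flush and return 2; the address space value has no
effect on the result. The 64-bit mask limit in the sketch is also why a
plain mask cannot name vCPUs above 63.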