Received: by 2002:ac0:e34a:0:0:0:0:0 with SMTP id g10csp746475imn; Tue, 26 Jul 2022 08:13:16 -0700 (PDT) X-Google-Smtp-Source: AGRyM1uaPQnkvAU/oTNLwAzU816D/OsiJqj9JlhMaPs+WDa7bdjKRTEqCj35YA59Ie5bkSbUu3D+ X-Received: by 2002:aa7:c9d3:0:b0:43a:67b9:6eea with SMTP id i19-20020aa7c9d3000000b0043a67b96eeamr18625092edt.94.1658848396199; Tue, 26 Jul 2022 08:13:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658848396; cv=none; d=google.com; s=arc-20160816; b=rMIeyw3T5brfxkO5liTnM8uTSwHeD8aEkevK4Z06dOto0ydNqGz8g36T8iimw5EY1y NQeBgpVvXYx8kf+7fLDaYq9qNSAD0qKLkr67JzM5ANXfDVkKHT4ZsN/muLOknkbQsWgn iIPBe74TMHV62HTNC9ku/y04QAPxhQd1o7zDBnIRfQ6wP2bw7t4HPXWPZpA+uU7Pjdl6 VwiOGJfXTTm1UOb81EF8ZKZYnNyU2Flz0l7YRna23TK/N3xiSMVa6pr+3Xfzo5YeghbH qa9kVguF08j1FD2RZfWDK8zRifc2xoDqAgJpG6b/PQVfAeVdxk2O0PX3HLhJ+3lpUQfr Raxw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=nEq1IaNhdkodBDzozbLgFdlJtMKKyeEDpgF5SJMm+Y8=; b=AxVVmt6FHyCtPQDH8G1kBInBDkfLUltNl3zdSplKpElU30Q+GpGnc35y9Q+lW0B0Ya FyRuIzQWRAwGLumzd2erHg+Q9OuTx7Vcf/dofoSPq4u+IS28clIeki/dRao6ZJ8rc4/K dmh3yajDh1KOrPC+0KcKW8ZhhSJjQ/MquUToR0pqOfB3LQcpPfgRwWnW+4qci5cQ9EZg xPFLBlxxFLdjZWdgFrYC6RciB6kWvSu0ahOJ8AePoRsBuQVQdDlPKWOjzlA1+haxBzN7 FbsUNhsqW2/wofip1SSle5sQHA3MGZ/d+jU8EdGQiHqg+vLmSU/Quxm9P9cku3cyAyyo JAHw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OWEnJwcM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z17-20020a056402275100b0043a99ce7f70si2958571edd.22.2022.07.26.08.12.50; Tue, 26 Jul 2022 08:13:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OWEnJwcM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229804AbiGZPKq (ORCPT + 99 others); Tue, 26 Jul 2022 11:10:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44580 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229976AbiGZPKk (ORCPT ); Tue, 26 Jul 2022 11:10:40 -0400 Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD626657C for ; Tue, 26 Jul 2022 08:10:39 -0700 (PDT) Received: by mail-pl1-x62f.google.com with SMTP id d7so13585961plr.9 for ; Tue, 26 Jul 2022 08:10:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=nEq1IaNhdkodBDzozbLgFdlJtMKKyeEDpgF5SJMm+Y8=; b=OWEnJwcMRH6J/QXCuwh5ay2U50ZpDhtt9UbnMucbCaA8Hn/Gf1tn7OIPJZvJJlYZph LW8ZsOCCLqtJX6LzYMwSle6gPv8wQUpMXg/MKvdNr6Jw/YVHewJywz77TGt6RYKsrv7s SaYVVhrt32cGZ4Vqk4SVa2I7QE52NbeI2bAhNurJ2VvJc9eEaSBJ74CCOhEq5pj6Lfjc GH7pedy/4PRnTzw3rnxOmJYwi3M2qip3MhP3oY2TjKi1A/z/1XkDjDb80k1rsXKkYTKQ 1rS339PAXGkNqzjwoqOFEuEedlCoeFDrxQRsqQt1kR5gbNkNRVUSH9SwC4ovtagB6bu7 dNxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=nEq1IaNhdkodBDzozbLgFdlJtMKKyeEDpgF5SJMm+Y8=; b=dhoJCQZdNJ04Gwk0btfunlSPYcjsLVlL6NQPCHEGoyePD7E7DYSzmNvoUEcnnq9pI/ 2EJWmObq+BQD3TCH9okI2gh9i0Hh+PEHZ1l+d0bKN8d+vdoiWrrQbWtHoi+vUGOeQPEd BWun0/dDJpP6oaJmQ2nZpWupJggk8AMbw+IEzpM4BRdixCC1/5Dy/8vFKVMIe4MSiWeF mD6sZnenRqCzoScRzYPVqoPQh6EneJ38QDvA/WDEuxHek4azXqSQWLuf6zWkzJRdj3da 8EKd0TmDxMxe/0grxrx/Gsu+UwbDU9k/iXXFT84C+CO21MOEKORDrfp6pNGso1+0OObn P8rQ== X-Gm-Message-State: AJIora+ebP5BRZNGbtAqY6bV1+n1oPFI0SenyQzSWoktYhpVORUbV3Vb S2Cu7FY4dD3Er0SmSUBOl3R6CctQsb1wRw== X-Received: by 2002:a17:902:d2d1:b0:16c:223e:a3db with SMTP id n17-20020a170902d2d100b0016c223ea3dbmr17861909plc.37.1658848238620; Tue, 26 Jul 2022 08:10:38 -0700 (PDT) Received: from google.com (7.104.168.34.bc.googleusercontent.com. [34.168.104.7]) by smtp.gmail.com with ESMTPSA id j1-20020a654d41000000b003fadd680908sm10389894pgt.83.2022.07.26.08.10.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 Jul 2022 08:10:37 -0700 (PDT) Date: Tue, 26 Jul 2022 15:10:34 +0000 From: Sean Christopherson To: Andrei Vagin Cc: Paolo Bonzini , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li , Vitaly Kuznetsov , Jianfeng Tan , Adin Scannell , Konstantin Bogomolov , Etienne Perot , Andy Lutomirski , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" Subject: Re: [PATCH 0/5] KVM/x86: add a new hypercall to execute host system Message-ID: References: <20220722230241.1944655-1-avagin@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 26, 2022, Andrei Vagin wrote: > On Fri, Jul 22, 2022 at 4:41 PM Sean Christopherson wrote: > > > > +x86 maintainers, patch 1 most definitely needs acceptance from folks beyond KVM. > > > > On Fri, Jul 22, 2022, Andrei Vagin wrote: > > > Another option is the KVM platform. In this case, the Sentry (gVisor > > > kernel) can run in a guest ring0 and create/manage multiple address > > > spaces. Its performance is much better than the ptrace one, but it is > > > still not great compared with the native performance. This change > > > optimizes the most critical part, which is the syscall overhead. > > > > What exactly is the source of the syscall overhead, > > Here are perf traces for two cases: when "guest" syscalls are executed via > hypercalls and when syscalls are executed by the user-space VMM: > https://gist.github.com/avagin/f50a6d569440c9ae382281448c187f4e > > And here are two tests that I use to collect these traces: > https://github.com/avagin/linux-task-diag/commit/4e19c7007bec6a15645025c337f2e85689b81f99 > > If we compare these traces, we can find that in the second case, we spend extra > time in vmx_prepare_switch_to_guest, fpu_swap_kvm_fpstate, vcpu_put, > syscall_exit_to_user_mode. So of those, I think the only path a robust implementation can actually avoid, without significantly whittling down the allowed set of syscalls, is syscall_exit_to_user_mode(). The bulk of vcpu_put() is vmx_prepare_switch_to_host(), and KVM needs to run through that before calling out of KVM. E.g. prctrl(ARCH_GET_GS) will read the wrong GS.base if MSR_KERNEL_GS_BASE isn't restored. And that necessitates calling vmx_prepare_switch_to_guest() when resuming the vCPU. FPU state, i.e. fpu_swap_kvm_fpstate() is likely a similar story, there's bound to be a syscall that accesses user FPU state and will do the wrong thing if guest state is loaded. For gVisor, that's all presumably a non-issue because it uses a small set of syscalls (or has guest==host state?), but for a common KVM feature it's problematic. > > and what alternatives have been explored? Making arbitrary syscalls from > > within KVM is mildly terrifying. > > "mildly terrifying" is a good sentence in this case:). If I were in your place, > I would think about it similarly. > > I understand these concerns about calling syscalls from the KVM code, and this > is why I hide this feature under a separate capability that can be enabled > explicitly. > > We can think about restricting the list of system calls that this hypercall can > execute. In the user-space changes for gVisor, we have a list of system calls > that are not executed via this hypercall. Can you provide that list? > But it has downsides: > * Each sentry system call trigger the full exit to hr3. > * Each vmenter/vmexit requires to trigger a signal but it is expensive. Can you explain this one? I didn't quite follow what this is referring to. > * It doesn't allow to support Confidential Computing (SEV-ES/SGX). The Sentry > has to be fully enclosed in a VM to be able to support these technologies. Speaking of SGX, this reminds me a lot of Graphene, SCONEs, etc..., which IIRC tackled the "syscalls are crazy expensive" problem by using a message queue and a dedicated task outside of the enclave to handle syscalls. Would something like that work, or is having to burn a pCPU (or more) to handle syscalls in the host a non-starter?