From: Paolo Bonzini
Date: Tue, 26 Jul 2022 12:27:05 +0200
Subject: Re: [PATCH 0/5] KVM/x86: add a new hypercall to execute host system
To: Andrei Vagin, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Wanpeng Li,
    Vitaly Kuznetsov, Jianfeng Tan, Adin Scannell, Konstantin Bogomolov,
    Etienne Perot, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin"
Message-ID: <69b45487-ce0e-d643-6c48-03c5943ce2e6@redhat.com>
References: <20220722230241.1944655-1-avagin@google.com>

On 7/26/22 10:33, Andrei Vagin wrote:
> We can think about restricting the list of system calls that this hypercall can
> execute. In the user-space changes for gVisor, we have a list of system calls
> that are not executed via this hypercall. For example, sigprocmask is never
> executed by this hypercall, because the kvm vcpu has its signal mask. Another
> example is the ioctl syscall, because it can be one of kvm ioctl-s.

The main issue I have is that the system call addresses are not translated.
On one hand, I understand why it's done like this; it's pretty much
impossible to do it without duplicating half of the sentry in the host
kernel. And the KVM API you're adding is certainly sensible. On the
other hand this makes the hypercall even more specialized, as it depends
on the guest's memslot layout, and not self-sufficient, in the sense
that the sandbox isn't secure without prior copying and validation of
arguments in guest ring0.

> == Host Ring3/Guest ring0 mixed mode ==
>
> This is how the gVisor KVM platform works right now. We don’t have a separate
> hypervisor, and the Sentry does its functions. The Sentry creates a KVM virtual
> machine instance, sets it up, and handles VMEXITs. As a result, the Sentry runs
> in the host ring3 and the guest ring0 and can transparently switch between
> these two contexts. In this scheme, the sentry syscall time is 3600ns.
> This is for the case when a system call is called from gr0.
>
> The benefit of this way is that only a first system call triggers vmexit and
> all subsequent syscalls are executed on the host natively.
>
> But it has downsides:
> * Each sentry system call trigger the full exit to hr3.
> * Each vmenter/vmexit requires to trigger a signal but it is expensive.
> * It doesn't allow to support Confidential Computing (SEV-ES/SGX). The Sentry
>   has to be fully enclosed in a VM to be able to support these technologies.
>
> == Execute system calls from a user-space VMM ==
>
> In this case, the Sentry is always running in VM, and a syscall handler in GR0
> triggers vmexit to transfer control to VMM (user process that is running in
> hr3), VMM executes a required system call, and transfers control back to the
> Sentry. We can say that it implements the suggested hypercall in the
> user-space.
>
> The sentry syscall time is 2100ns in this case.
>
> The new hypercall does the same but without switching to the host ring 3. It
> reduces the sentry syscall time to 1000ns.
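
Returning to the point about restricting system calls and validating
arguments in guest ring0: a minimal sketch of what such a check could
look like is below. It is illustrative only and is not taken from the
patch series or from gVisor. All names (hc_denied_syscalls,
hc_syscall_allowed, hc_region, hc_range_ok) are hypothetical, and
mapping "sigprocmask" to SYS_rt_sigprocmask is an assumption for x86-64
Linux.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>

/* Hypothetical denylist of system calls that must not use the hypercall. */
static const long hc_denied_syscalls[] = {
    SYS_rt_sigprocmask, /* assumption: the vCPU thread has its own signal mask */
    SYS_ioctl,          /* could turn out to be one of the KVM ioctls */
};

/* Returns true if system call number nr may be forwarded via the hypercall. */
static bool hc_syscall_allowed(long nr)
{
    size_t n = sizeof(hc_denied_syscalls) / sizeof(hc_denied_syscalls[0]);

    for (size_t i = 0; i < n; i++) {
        if (hc_denied_syscalls[i] == nr)
            return false;
    }
    return true;
}

/* Hypothetical description of a buffer area the guest is allowed to pass. */
struct hc_region {
    uint64_t start; /* first valid address */
    uint64_t len;   /* size of the region in bytes */
};

/*
 * Returns true if [addr, addr + len) lies entirely inside the region.
 * Written with subtractions only, so it cannot overflow.
 */
static bool hc_range_ok(const struct hc_region *r, uint64_t addr, uint64_t len)
{
    if (addr < r->start)
        return false;

    uint64_t off = addr - r->start;

    return off <= r->len && len <= r->len - off;
}
```

A guest ring0 handler would call hc_syscall_allowed() on the system call
number and hc_range_ok() on each pointer argument (after copying it)
before issuing the hypercall; because the addresses are not translated,
the region bounds would have to match the guest's memslot layout.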
Yeah, ~3000 clock cycles is what I would expect.

What does it translate to in terms of benchmarks? For example a simple
netperf/UDP_RR benchmark.

Paolo
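
For reference, the ~3000 cycle figure follows from the quoted 1000ns if
one assumes a clock of roughly 3 GHz. The thread does not state the CPU
frequency, so ASSUMED_CPU_MHZ below is an assumption, and ns_to_cycles
is a hypothetical helper used only to show the arithmetic.

```c
#include <stdint.h>

/* Assumption: ~3 GHz clock; the actual test machine is not specified. */
#define ASSUMED_CPU_MHZ 3000ULL

/* Convert a latency in nanoseconds to clock cycles at ASSUMED_CPU_MHZ. */
static uint64_t ns_to_cycles(uint64_t ns)
{
    /* cycles = ns * MHz / 1000; multiply first to keep integer precision */
    return ns * ASSUMED_CPU_MHZ / 1000ULL;
}
```

Under that assumption the three quoted latencies work out to about
10800 cycles (3600ns, mixed mode), 6300 cycles (2100ns, user-space VMM)
and 3000 cycles (1000ns, new hypercall).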