Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp9242848rwi; Tue, 25 Oct 2022 17:23:20 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4jGgqqtRXaAL2QfbGhg00Sbra9/d4HL7y4DOhOHawOZh6xg3/7GNIN46wjW3LTqvYyL3rL X-Received: by 2002:a05:6a00:14cc:b0:56b:9969:823 with SMTP id w12-20020a056a0014cc00b0056b99690823mr17969763pfu.36.1666743800216; Tue, 25 Oct 2022 17:23:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666743800; cv=none; d=google.com; s=arc-20160816; b=Zs75wrvi981m/vYB2e3EemR5zFywkID6h7+5qs3+7MOHOqG7RkBJurvhr6B3qiNkGy g0BnxwS74fSzuLN1gND1VvxCVsy946ucJYIlCKPvdrWWoj3XR4M+XU5/EpxCm2XZvAlb Cd8bkQnEfeWKja2ZI7yUCknQKugrArITrwc/+VqQkMOhw6fftW3hVDmqRLK8d4zgN1vj OICgVhbfDRi1SIjkGnpEUyMdk6/Pr2S9Ys8fqSbTtwkriWGmczG7be2BtXh3AQbUIQBb eo5TPmuTExexL7+trmBb78gGqLzyMVyITdVG9V14hHsMowDq985VYVI/7/pUbXAFK4qN iYxA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=A7kKcrW8+Rtrlgds+v4H23IM/wohgMjvlalkGtkGE48=; b=taR50Nzs+ezUYU8wydNQ1zHI2ePwtAGMdw2auu9RiNuFgTLdyL5FKckQwJ7pfPHliM imFX5W3icabJ2rK3hzm68VJ7Wnyk5RCtL6oYaR3kCr+XE6bRkFkHFemX8690NiQ2A0ve zQ2/WeorMwpDzZNrcjuo/DuSbWX6C2uCz8lM7UbkbptUOceprkkPpVi1p+GgmDzWI6Zh x9IX/j3SnSTPD4AHMSn4a8R3jm7f4ZwWP2htIOAcE81LhhMJFapTOutOWjSoLQk+27QS E5oCfx0a/Std6D25P0lB9n8qixw+hH9oqIEjlLbe1DT4oNfqVHr8k6QmO9u5uwUssr6V Czfw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=i9+bLN5F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j3-20020a170902758300b0017e0ca906c8si4130034pll.568.2022.10.25.17.23.08; Tue, 25 Oct 2022 17:23:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=i9+bLN5F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232617AbiJYXHS (ORCPT + 99 others); Tue, 25 Oct 2022 19:07:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232187AbiJYXHQ (ORCPT ); Tue, 25 Oct 2022 19:07:16 -0400 Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 765D3E985F for ; Tue, 25 Oct 2022 16:07:15 -0700 (PDT) Received: by mail-pf1-x431.google.com with SMTP id y13so8928360pfp.7 for ; Tue, 25 Oct 2022 16:07:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=A7kKcrW8+Rtrlgds+v4H23IM/wohgMjvlalkGtkGE48=; b=i9+bLN5FonRCnXyaeAZ/jEHxOllBLcuLRG6g7djDO2ymq1eoILB4dhajH6DdkL9Ypw dWBExwNgYe7L3rF6XlQLTuNRvlRhUlKYjHp/L1iVjmBzSGExzuACU05FIwWhOkZaZxhW SlFEj6y2aedc8tdZPGfzSyQiWabdMrxN8vyn13231CbpBxAP9sT5+IQhyoFu0+oYUWQp PhX+L5oqnTXUtVtqktZxw/YivgWoFURGH7AMjlXoM6O1hKn1gn0MTGBodUdfBHCYYrk0 knLwhuacFx5zJW48cTVCrNO9wqrHItHkkxk/aDLvtlYfLl6DtxNw1JoLy41IaidlmKLW zOMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=A7kKcrW8+Rtrlgds+v4H23IM/wohgMjvlalkGtkGE48=; b=LyDa1pqOdxmfAaZ798CfUc0YszKarvjGhMMxhLxq3woAOTKcs1ZMLKOTb4cPalt1qo Otv9QKotnicEb7UILX7TSx/tWGxU8V6OcsVG137CyyetX9nUqHiYkiLyMR16G/iCl1Dj uoFr6ddoOwpCHvOGZKeJG0z36gglHK0o9mp98eecd47wPTsmkS1LV5JXszQWG4gLO7sO YrGLSJkcRDAek+Hcy6pATiwh2HPk+yyHBn9PJy6iOKlpE+ih6yXJm3hS01LmWfLZ94W/ /JmqNINrlMMYipzDbMRPx/K0kTdN5fFs1dvmdqBujWJZXdHok8zCHRCFwZdVjNm5ib8H d5Eg== X-Gm-Message-State: ACrzQf3p+D7eMg+6XyDHEZmKh/eBg44jNZOoGYIs8f23H8rjqzwvgfwI ef/QLJ0h345FniC6W9YkCvR3Eg== X-Received: by 2002:a63:85c6:0:b0:46e:ffdb:2e77 with SMTP id u189-20020a6385c6000000b0046effdb2e77mr11866913pgd.439.1666739234843; Tue, 25 Oct 2022 16:07:14 -0700 (PDT) Received: from google.com (7.104.168.34.bc.googleusercontent.com. [34.168.104.7]) by smtp.gmail.com with ESMTPSA id x1-20020aa79a41000000b0056c32be99dcsm1164090pfj.117.2022.10.25.16.07.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Oct 2022 16:07:14 -0700 (PDT) Date: Tue, 25 Oct 2022 23:07:10 +0000 From: Sean Christopherson To: Paolo Bonzini Cc: Christian Borntraeger , Emanuele Giuseppe Esposito , kvm@vger.kernel.org, Jonathan Corbet , Maxim Levitsky , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , David Hildenbrand , x86@kernel.org, "H. Peter Anvin" , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/4] KVM: API to block and resume all running vcpus in a vm Message-ID: References: <20221022154819.1823133-1-eesposit@redhat.com> <2701ce67-bfff-8c0c-4450-7c4a281419de@redhat.com> <384b2622-8d7f-ce02-1452-84a86e3a5697@linux.ibm.com> <5a26c107-9ab5-60ee-0e9c-a9955dfe313d@redhat.com> <02c910bb-3ea0-fa84-7a1c-92fb9e8b03de@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <02c910bb-3ea0-fa84-7a1c-92fb9e8b03de@redhat.com> X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 25, 2022, Paolo Bonzini wrote: > On 10/25/22 17:55, Sean Christopherson wrote: > > On Tue, Oct 25, 2022, Paolo Bonzini wrote: > > - to prevent _all_ ioctls() because it's not just KVM_RUN that consumes memslots > > This is perhaps an occasion to solve another disagreement: I still think > that accessing memory outside KVM_RUN (for example KVM_SET_NESTED_STATE > loading the APICv pages from VMCS12) is a bug, on the other hand we > disagreed on that and you wanted to kill KVM_REQ_GET_NESTED_STATE_PAGES. I don't think it's realistic to make accesses outside of KVM_RUN go away, e.g. see the ARM ITS discussion in the dirty ring thread. kvm_xen_set_evtchn() also explicitly depends on writing guest memory without going through KVM_RUN (and apparently can be invoked from a kernel thread?!?). In theory, I do actually like the idea of restricting memory access to KVM_RUN, but in reality I just think that forcing everything into KVM_RUN creates far more problems than it solves. E.g. my complaint with KVM_REQ_GET_NESTED_STATE_PAGES is that instead of syncrhonously telling userspace it has a problem, KVM chugs along as if everything is fine and only fails at later point in time. I doubt userspace would actually do anything differently, i.e. the VM is likely hosed no matter what, but deferring work adds complexity in KVM and makes it more difficult to debug problems when they occur. > > - to stop anything else in the system that consumes KVM memslots, e.g. KVM GT > > Is this true if you only look at the KVM_GET_DIRTY_LOG case and consider it > a guest bug to access the memory (i.e. ignore the strange read-only changes > which only happen at boot, and which I agree are QEMU-specific)? Yes? I don't know exactly what "the KVM_GET_DIRTY_LOG case" is. > > - to signal vCPU tasks so that the system doesn't livelock if a vCPU is stuck > > outside of KVM, e.g. in get_user_pages_unlocked() (Peter Xu's series) > > This is the more important one but why would it livelock? Livelock may not be the right word. Peter's series is addressing a scenario where a vCPU gets stuck faulting in a page because the page never arrives over the network. The solution is to recognize non-fatal signals while trying to fault in the page. KVM_KICK_ALL_RUNNING_VCPUS doesn't handle that case because it's obviously not realistic to check for pending KVM requests while buried deep in mm/ code. I.e. userspace also needs to send SIGUSR1 or whatever to ensure all vCPUs get kicked out of non-KVM code. That's not the end of the world, and they probably end up being orthogonal things in userspace code, but it yields a weird API because KVM_KICK_ALL_RUNNING_VCPUS ends up with the caveat of "oh, by the way, userspace also needs to signal all vCPU tasks too, otherwise KVM_KICK_ALL_RUNNING_VCPUS might hang". > > And because of the nature of KVM, to support this API on all architectures, KVM > > needs to make change on all architectures, whereas userspace should be able to > > implement a generic solution. > > Yes, I agree that this is essentially just a more efficient kill(). > Emanuele, perhaps you can put together a patch to x86/vmexit.c in > kvm-unit-tests, where CPU0 keeps changing memslots and the other CPUs are in > a for(;;) busy wait, to measure the various ways to do it? I'm a bit confused. Is the goal of this to simplify QEMU, dedup VMM code, provide a more performant solution, something else entirely? I.e. why measure the performance of x86/vmexit.c? I have a hard time believing the overhead of pausing vCPUs is going to be the long pole when it comes to memslot changes. I assume rebuilding KVM's page tables because of the "zap all" behavior seems like would completely dwarf any overhead from pausing vCPUs.