Received: by 2002:a05:6359:c8b:b0:c7:702f:21d4 with SMTP id go11csp2307369rwb; Thu, 29 Sep 2022 08:36:55 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6HuLRJ4h9d9O7o5sAqBrOrEELU40rzFggAItAI0Q1rqSeEEemm+Or0pbiRc9WvDxfuW9p0 X-Received: by 2002:a05:6a00:1989:b0:546:410f:bdd2 with SMTP id d9-20020a056a00198900b00546410fbdd2mr4080633pfl.83.1664465815067; Thu, 29 Sep 2022 08:36:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1664465815; cv=none; d=google.com; s=arc-20160816; b=M3QM+9RLvSb/FKr2o7VaybjGyVuIM0BqzJ589FpJJCxew6brAhcrBQp94g1EcVMUxx aolzsVRED00Pv53hiPcV8RH/W+HruJmjlI4qTgk7r6NHQ7JOM49IrIOGTXmlgpzXrEoy TiOe4MQzE7dMgAPHWcx34Raq0BtzW0WANW0i6N+SzOZ/1VznJPzk1rA6UVaMlhyN6vPL F8fEeUvEVa3NdueDeb/ib8RCHKfC8KB6IMzpKbEhJW6XC5+6P+bhqHTLODTAqLojo9jE RAhGjsGF1sV1fsAr9WxZIFxKo2hQTzI0XdEKqfqKmvpGq94alivmM1nBW8K8vRK7A0si avzA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :from:references:cc:to:content-language:user-agent:mime-version:date :message-id:dkim-signature; bh=dovtu1XlKkEDvNR0VywsQZbH64mld5MIvsTiAroU6wc=; b=I41/RDLlldne6Bo/pMZ9rMd8iajUjta5IKC+XSXdkbka+mcJWcsp0s8EbpYWuSeV8l thtHN4dCdfya5aoKqnPKODTD4pTARpMPasrsHoY2RnF84ZhzhXd+/HGGlYnqD4AU2ChL JWiaO4mH+Gp1DxNFDDoL6jdOCHNG0rV1a1pkbx4GLpKMARISzqfpDRTVrJxK2V2Z6JWW 3MKoO5abZwJwnE40JShohtLn3j00PM/39HL5KnuA2FkoSbxigGl3S46FNUUQe7dQhuh9 mMvuPjWijKfF2De7+NdIuwgnT2XjGt7b9QwJhaVBiU+MvRVtJgDWPYm4oudtRCovsUAZ r5xA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=UZBWlopT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v202-20020a6361d3000000b00434ecdb8a30si77478pgb.5.2022.09.29.08.36.42; Thu, 29 Sep 2022 08:36:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=UZBWlopT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235821AbiI2P3G (ORCPT + 99 others); Thu, 29 Sep 2022 11:29:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36144 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235800AbiI2P3C (ORCPT ); Thu, 29 Sep 2022 11:29:02 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7E76F15D66E for ; Thu, 29 Sep 2022 08:29:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1664465339; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dovtu1XlKkEDvNR0VywsQZbH64mld5MIvsTiAroU6wc=; b=UZBWlopTfSIH+ol+S4vy8YZ0zCStjR5THkafhjtrV/TRIu0Rg6+dXDsBVefHpzzsJo/lxC hxk4asUy2uwO7378oRGw9UaRTJRO4Sj08YckoV6eTYpqwf988si3c2oy52hJ8ElodMgZ0/ iJDv7Ic5J1uVHMZguWCYVV2cOYCbFAU= Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-369-LL-Dq8MLMb2sX4HaCU01EA-1; Thu, 29 Sep 2022 11:28:58 -0400 X-MC-Unique: LL-Dq8MLMb2sX4HaCU01EA-1 Received: by mail-ed1-f70.google.com with SMTP id i17-20020a05640242d100b0044f18a5379aso1557540edc.21 for ; Thu, 29 Sep 2022 08:28:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:subject:from:references:cc:to :content-language:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date; bh=dovtu1XlKkEDvNR0VywsQZbH64mld5MIvsTiAroU6wc=; b=mBsEctC4QG2kqIhwqTME1J1Yqr14coCDEqUu9kUJMHrvvBtYnbLItGXcDFaajzfMAz fm5NQv8UMSMj7SuFg0dn/caRYEOoS6DW5KVMz6x4xQagxVCdQc0WERBy34tH0LKGgfx1 URKv+aqWb9fGAn/tktiRxDS63lQS7AkV3YFf3WOTpGYH+AxgRqW11jGhIoaEKs1HAStz Tk8iGnDZsRJt2pD1yw2BSOn95TguC3ZE5P3CaT3v52awOPPCevSL23/wK+g+ZcopnrU1 vXStrbFVojK8z4Fha4LABZSc/eCnFAZGShj769zZJrbQU77kc1GhQX++vzOJV2nEdzAg T7vA== X-Gm-Message-State: ACrzQf1fW3N5BkN3eMnG8eZK9jzaikz7Ou6s/7FSGw79p0tEunxd4V+z zfU+icTY0WijfUmmfec2evFfV6t7nJ41bk0sBKq06TFfDLzMFFBvtubSyLLV2KGAGvmIWkCJzqg 83g2iq6nzs/DW4kgOgsoykHsM X-Received: by 2002:a17:907:d91:b0:784:107a:6aca with SMTP id go17-20020a1709070d9100b00784107a6acamr3121074ejc.761.1664465337346; Thu, 29 Sep 2022 08:28:57 -0700 (PDT) X-Received: by 2002:a17:907:d91:b0:784:107a:6aca with SMTP id go17-20020a1709070d9100b00784107a6acamr3121057ejc.761.1664465337055; Thu, 29 Sep 2022 08:28:57 -0700 (PDT) Received: from ?IPV6:2001:b07:6468:f312:1c09:f536:3de6:228c? ([2001:b07:6468:f312:1c09:f536:3de6:228c]) by smtp.googlemail.com with ESMTPSA id r19-20020a05640251d300b0044f21c69608sm5836698edd.10.2022.09.29.08.28.55 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 29 Sep 2022 08:28:56 -0700 (PDT) Message-ID: Date: Thu, 29 Sep 2022 17:28:54 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.1 Content-Language: en-US To: Sean Christopherson Cc: Emanuele Giuseppe Esposito , David Hildenbrand , Maxim Levitsky , kvm@vger.kernel.org, Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , linux-kernel@vger.kernel.org, Like Xu References: <55d7f0bd-ace1-506b-ea5b-105a86290114@redhat.com> <111a46c1-7082-62e3-4f3a-860a95cd560a@redhat.com> <14d5b8f2-7cb6-ce24-c7a7-32aa9117c953@redhat.com> <3b04db9d-0177-7e6e-a54c-a28ada8b1d36@redhat.com> <8534dfe4-bc71-2c14-b268-e610a3111d14@redhat.com> From: Paolo Bonzini Subject: Re: [RFC PATCH 0/9] kvm: implement atomic memslot updates In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-7.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 9/28/22 22:41, Sean Christopherson wrote: > On Wed, Sep 28, 2022, Paolo Bonzini wrote: >> On 9/28/22 17:58, Sean Christopherson wrote: >>> I don't disagree that the memslots API is lacking, but IMO that is somewhat >>> orthogonal to fixing KVM x86's "code fetch to MMIO" mess. Such a massive new API >>> should be viewed and prioritized as a new feature, not as a bug fix, e.g. I'd >>> like to have the luxury of being able to explore ideas beyond "let userspace >>> batch memslot updates", and I really don't want to feel pressured to get this >>> code reviewed and merge. >> >> I absolutely agree that this is not a bugfix. Most new features for KVM can >> be seen as bug fixes if you squint hard enough, but they're still features. > > I guess I'm complaining that there isn't sufficient justification for this new > feature. The cover letter provides a bug that would be fixed by having batched > updates, but as above, that's really due to deficiencies in a different KVM ABI. I disagree. Failure to fetch should be fixed but is otherwise a red herring. It's the whole memslot API (including dirty logging) that is a mess. If you think we should overhaul it even more than just providing atomic batched updates, that's fine. But still, the impossibility to perform atomic updates in batches *is* a suboptimal part of the KVM API. > - Why can't this be solved in userspace? I don't think *can't* is the right word. If the metric of choice was "what can be solved in userspace", we'd all be using microkernels. The question is why userspace would be a better place to solve it. The only reason to do it in userspace would be if failure to fetch is something that is interesting to userspace, other than between two KVM_SET_USER_MEMORY_REGION. Unless you provide an API to pass failures to fetch down to userspace, the locking in userspace is going to be inferior, because it would have to be unconditional. This means worse performance and more complication, not to mention having to do it N times instead of 1 for N implementations. Not forcing userspace to do "tricks" is in my opinion a strong part of deciding whether an API belongs in KVM. > - What operations does userspace truly need? E.g. if the only use case is to > split/truncate/hole punch an existing memslot, can KVM instead provide a > memslot flag and exit reason that allows kicking vCPUs to userspace if the > memslot is accessed? E.g. KVM_MEM_DISABLED that acts like an invalid memslot, > but KVM exists with a dedicated exit reason instead of generating MMIO semantics. The main cases are: - for the boot case, splitting and merging existing memslots. QEMU likes to merge adjacent memory regions into a single memslot, so if something goes from read-write to read-only it has to be split and vice versa. I guess a "don't merge this memory region" flag would be the less hideous way to solve it in userspace. - however, there is also the case of resizing an existing memslot which is what David would like to have for virtio-mem. This is not really fixable because part of the appeal of virtio-mem is to have a single huge memslot instead of many smaller ones, in order to reduce the granularity of add/remove (David, correct me if I'm wrong). (The latter _would_ be needed by other VMMs). > If updates only need to be "atomic" for an address space, does the API allowing > mixing non-SMM and SMM memslots? I agree that the address space should be moved out of the single entries and into the header if we follow through with this approach. > The update needs to be "atomic", i.e. vCPUs > must never see an invalid/deleted memslot, but if the memslot is writable, > how does KVM prevent some writes from hitting the old HVA and some from hitting > the new HVA without a quiescent period? (Heh, and I forgot likewise that non-x86 does not retry on KVM_MEMSLOT_INVALID. Yes, that would be treated as a bug on other architectures). >> Wholesale replacement was my first idea when I looked at the issue, I think >> at the end of 2020. I never got to a full implementation, but my impression >> was that allocating/deallocating dirty bitmaps, rmaps etc. would make it any >> easier than arbitrary batch updates. > > It's not obvious to me that the memslot metadata is going to be easy to handle > regardless of what we do. E.g. I'm pretty sure that batching updates will "corrupt" > the dirty bitmap if a hole is punched in a memslot that's being dirty logged. Indeed; I would have thought that it is clear with the batch updates API (which requires the update to be split into delete and insert), but apparently it's not and it's by no means optimal. Paolo