Received: by 2002:a05:6358:a55:b0:ec:fcf4:3ecf with SMTP id 21csp1784319rwb; Thu, 19 Jan 2023 15:38:48 -0800 (PST) X-Google-Smtp-Source: AMrXdXu8UYutBLSHEDrrKdHI4kBY5Ngk/YyCAyGDa4ZWweNw7RPeiie2apUKFTXdV/2R98G5beTq X-Received: by 2002:a17:90a:c298:b0:229:b5e:2e03 with SMTP id f24-20020a17090ac29800b002290b5e2e03mr12251078pjt.32.1674171528657; Thu, 19 Jan 2023 15:38:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674171528; cv=none; d=google.com; s=arc-20160816; b=J5Hzg+N/pMvDBAZ4+B6YRDK2s3BbJCbYbmGcAk15Rp7BuJuAWgKD0KSTfS2JeegTUi a9hwLvIJL5G2FLJXQV1vQS3d2gDhApgLavbvq4A47FWnqteErw/0GRRLUfiDWi5DofSY 078v9YlsouHcK6tZ3fjakPRbvG8sFDMaTf2GZiFwdVrCkxASTdgDYeBMfTEn29/BGtY8 qXXk3yAvcP6Bz6IguS+2TdRHmS8/+EdoGIcDBqFZSG1enzq7JZCZVcZVL0fDFg1GM4Iu /veSfVI+KLKT1HUQAIrdLJd/6vw+h2VHudh6mvIwpM3JFVUJWZKP4HJwXr1BaIcL0z78 XY1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=HeBX4hWuJsaHps7UYjGH1TLVfHHQaTbbfoKxw9Fb+mk=; b=OitN9dO5jOO1O/hLRbPUuVf2jk7d/bwdOgfy6YlvJ941kioN7TSkTMxRvJQKR0ma85 WLn87npiJFgmKS76NIKdwEqKrJrumby4eG7X0TPtfvZha8R6WqgbCJX/uZNOWk42IqGg VWkXTK+kfJOOIpkUxEa2q3SLU40elrHY5knnVfgCRZra7GS0SCftY6OqwXlORjP/ytko Fd6nnBq2hmhEC3dm76yoMtNhl/IEZ8EqsfJAeweY08uoPilkMFs4e+NdPL07OUlDl1CJ t6bItIVBr/Ie4fux4XCANiGxofCn2M8VHo1hWPQnUm2Ac/QzZtfA+DV12I7KSeMl9g4q oj5A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OzyQ7BOi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h11-20020a17090aa88b00b00223770a1235si560673pjq.138.2023.01.19.15.38.42; Thu, 19 Jan 2023 15:38:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OzyQ7BOi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230165AbjASXOd (ORCPT + 46 others); Thu, 19 Jan 2023 18:14:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58212 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231184AbjASXNh (ORCPT ); Thu, 19 Jan 2023 18:13:37 -0500 Received: from mail-pl1-x62e.google.com (mail-pl1-x62e.google.com [IPv6:2607:f8b0:4864:20::62e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6DB62212F for ; Thu, 19 Jan 2023 15:11:34 -0800 (PST) Received: by mail-pl1-x62e.google.com with SMTP id d3so3756782plr.10 for ; Thu, 19 Jan 2023 15:11:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=HeBX4hWuJsaHps7UYjGH1TLVfHHQaTbbfoKxw9Fb+mk=; b=OzyQ7BOijAJL49Qa36dbhVv6eDsyXYSJbiPEeGykSdviurgBNnPy1lh9Z1cSkdN3GB ohFFD4yRz/wNITkF4KCvnuk+OpQ72jgmquCsDnsE0mJTm/ClkmVB3FgeI1pWicBRjQCY 0ZnR3mT5AAfHNPydbkRqFIQdFgSbJOosa+hEaeywKuQs8h5TDSut5looUG85PQw+v8Fb 4B/L7qJbD5eXmq+xx49TOZo2Z0dmJq4VwEVgYSvGu59c7m8ZBWD2JCoIcrhZjslDZZZd YXuLnu2kIVOjncMEJlJZlHgjlutcCkM5dDRWKfWCzRomr9OY2brVpqr1VawGLmwKOxsq oh3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=HeBX4hWuJsaHps7UYjGH1TLVfHHQaTbbfoKxw9Fb+mk=; b=fMf6wVrnG1e+srcsNvNNbu5QN22cjB2LrdhHIZGdIRps9su5FL//2KjHZbHX1OxWYQ JhSlaSYLPhs2CUX4NqF59w8VEz0owtO5eAOyDtwBe3ReDnSdADJFl8v3w4bfV8nq03mB BMCR45pp13Ym1vRRbPIf3i/Y9tP9Xc7M8U1Q6w/2hqnZDUzlEe+zAUu0abyimzOwXAKW gLWHSrsXrJ60ApW2AOFtOJoaOYEKPqFZ0WokojBhVdrEQv5ncY60f5Y16glENZRXIqvl z26YgdfJihOUYeXDS+D3dh2lryIjkRRku4tesVwBX+MyIOlmMIOG8N7iSWuckbFaTncN EOEw== X-Gm-Message-State: AFqh2koTPT+H5fOgmFuUyXyCHqgrpfBLaJWVwWhhhKxTJnBdRcRCabsm FBqhj4egDdHgb/HMe1gMeH0hRA== X-Received: by 2002:a17:902:c947:b0:191:1543:6b2f with SMTP id i7-20020a170902c94700b0019115436b2fmr18994pla.3.1674169893768; Thu, 19 Jan 2023 15:11:33 -0800 (PST) Received: from google.com (7.104.168.34.bc.googleusercontent.com. [34.168.104.7]) by smtp.gmail.com with ESMTPSA id u9-20020a170903124900b00174c1855cd9sm25575902plh.267.2023.01.19.15.11.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 19 Jan 2023 15:11:33 -0800 (PST) Date: Thu, 19 Jan 2023 23:11:30 +0000 From: Sean Christopherson To: "Huang, Kai" Cc: "dmatlack@google.com" , "sean.j.christopherson@intel.com" , "Shahar, Sagi" , "isaku.yamahata@gmail.com" , "Aktas, Erdem" , "kvm@vger.kernel.org" , "pbonzini@redhat.com" , "zhi.wang.linux@gmail.com" , "linux-kernel@vger.kernel.org" , "Yamahata, Isaku" Subject: Re: [PATCH v11 018/113] KVM: TDX: create/destroy VM structure Message-ID: References: <20230114111621.00001840@gmail.com> <20230117214414.00003229@gmail.com> <02b0e551647beed9ec3a2fefd3b659eb52c4846c.camel@intel.com> <38f506575caacd5488f73315b231c3282f893d46.camel@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <38f506575caacd5488f73315b231c3282f893d46.camel@intel.com> X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 19, 2023, Huang, Kai wrote: > On Thu, 2023-01-19 at 21:36 +0000, Sean Christopherson wrote: > > On Thu, Jan 19, 2023, Huang, Kai wrote: > > > On Thu, 2023-01-19 at 15:37 +0000, Sean Christopherson wrote: > > > > On Thu, Jan 19, 2023, Huang, Kai wrote: > > > > > On Tue, 2023-01-17 at 21:01 +0000, Sean Christopherson wrote: > > > > > > On Tue, Jan 17, 2023, Sean Christopherson wrote: > > > > > > > On Tue, Jan 17, 2023, Zhi Wang wrote: > > > > > > Oh, the other important piece I forgot to mention is that dropping mmu_lock deep > > > > > > in KVM's MMU in order to wait isn't always an option. Most flows would play nice > > > > > > with dropping mmu_lock and sleeping, but some paths, e.g. from the mmu_notifier, > > > > > > (conditionally) disallow sleeping. > > > > > > > > > > Could we do something similar to tdp_mmu_iter_cond_resched() but not simple busy > > > > > retrying "X times", at least at those paths that can release mmu_lock()? > > > > > > > > That's effectively what happens by unwinding up the stak with an error code. > > > > Eventually the page fault handler will get the error and retry the guest. > > > > > > > > > Basically we treat TDX_OPERAND_BUSY as seamcall_needbreak(), similar to > > > > > rwlock_needbreak(). I haven't thought about details though. > > > > > > > > I am strongly opposed to that approach. I do not want to pollute KVM's MMU code > > > > with a bunch of retry logic and error handling just because the TDX module is > > > > ultra paranoid and hostile to hypervisors. > > > > > > Right. But IIUC there's legal cases that SEPT SEAMCALL can return BUSY due to > > > multiple threads trying to read/modify SEPT simultaneously in case of TDP MMU. > > > For instance, parallel page faults on different vcpus on private pages. I > > > believe this is the main reason to retry. > > > > Um, crud. I think there's a bigger issue. KVM always operates on its copy of the > > S-EPT tables and assumes the the real S-EPT tables will always be synchronized with > > KVM's mirror. That assumption doesn't hold true without serializing SEAMCALLs in > > some way. E.g. if a SPTE is zapped and mapped at the same time, we can end up with: > > > > vCPU0 vCPU1 > > ===== ===== > > mirror[x] = xyz > > old_spte = mirror[x] > > mirror[x] = REMOVED_SPTE > > sept[x] = REMOVED_SPTE > > sept[x] = xyz > > IIUC this case cannot happen, as the two steps in the vcpu0 are within read > lock, which prevents from vcpu1, which holds the write lock during zapping SPTE. Zapping SPTEs can happen while holding mmu_lock for read, e.g. see the bug fixed by commit 21a36ac6b6c7 ("KVM: x86/mmu: Re-check under lock that TDP MMU SP hugepage is disallowed").