Date: Wed, 22 Nov 2023 10:32:18 +0800
From: Baolu Lu
To: Jason Gunthorpe, "Tian, Kevin"
Cc: baolu.lu@linux.intel.com, "Liu, Yi L", "joro@8bytes.org",
    "alex.williamson@redhat.com", "robin.murphy@arm.com",
    "cohuck@redhat.com", "eric.auger@redhat.com", "nicolinc@nvidia.com",
    "kvm@vger.kernel.org", "mjrosato@linux.ibm.com",
    "chao.p.peng@linux.intel.com", "yi.y.sun@linux.intel.com",
    "peterx@redhat.com", "jasowang@redhat.com",
    "shameerali.kolothum.thodi@huawei.com", "lulu@redhat.com",
    "suravee.suthikulpanit@amd.com", "iommu@lists.linux.dev",
    "linux-kernel@vger.kernel.org", "linux-kselftest@vger.kernel.org",
    "Duan, Zhenzhong", "joao.m.martins@oracle.com", "Zeng, Xin",
    "Zhao, Yan Y"
Subject: Re: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d
    stage-1 cache invalidation
References: <20231117131816.24359-1-yi.l.liu@intel.com>
    <20231117131816.24359-2-yi.l.liu@intel.com>
    <20231120230451.GD6083@nvidia.com>
    <20231121121704.GE6083@nvidia.com>
In-Reply-To: <20231121121704.GE6083@nvidia.com>

On 11/21/23 8:17 PM, Jason Gunthorpe wrote:
> On Tue, Nov 21, 2023 at 02:54:15AM +0000, Tian, Kevin wrote:
>>> From: Jason Gunthorpe
>>> Sent: Tuesday, November 21, 2023 7:05 AM
>>>
>>> On Mon, Nov 20, 2023 at 08:26:31AM +0000, Tian, Kevin wrote:
>>>>> From: Liu, Yi L
>>>>> Sent: Friday, November 17, 2023 9:18 PM
>>>>>
>>>>> This adds the data structure for flushing iotlb for the nested domain
>>>>> allocated with IOMMU_HWPT_DATA_VTD_S1 type.
>>>>>
>>>>> This only supports invalidating IOTLB, but no for device-TLB as device-TLB
>>>>> invalidation will be covered automatically in the IOTLB invalidation if the
>>>>> underlying IOMMU driver has enabled ATS for the affected device.
>>>>
>>>> "no for device-TLB" is misleading. Here just say that cache invalidation
>>>> request applies to both IOTLB and device TLB (if ATS is enabled ...)
>>>
>>> I think we should forward the ATS invalidation from the guest too?
>>> That is what ARM and AMD will have to do, can we keep them all
>>> consistent?
>>>
>>> I understand Intel keeps track of enough stuff to know what the RIDs
>>> are, but is it necessary to make it different?
>>
>> probably ask the other way. Now intel-iommu driver always flushes
>> iotlb and device tlb together then is it necessary to separate them
>> in uAPI for no good (except doubled syscalls)? :)
>
> I wish I knew more about Intel CC design to be able to answer that :|
>
> Doesn't the VM issue the ATC flush command regardless? How does it
> know it has a working ATC but does not need to flush it?
>

The Intel VT-d spec doesn't require the driver to flush the IOTLB and
the device TLB together. Therefore, the current approach of relying on
caching mode to decide whether device TLB invalidation is necessary
appears to be a performance optimization rather than an architectural
requirement.

The vIOMMU driver assumes that it is running within a VM guest when
caching mode is enabled. Based on this assumption, it omits device TLB
invalidation and relies on the hypervisor to perform a combined flush
of the IOTLB and the device TLB.

While this optimization aims to reduce VMEXIT overhead, it introduces
potential issues:

- When a Linux guest runs on a hypervisor other than KVM/QEMU, the
  assumption that the hypervisor flushes the IOTLB and the device TLB
  together may be incorrect, potentially leading to missed device TLB
  invalidations.

- Caching mode doesn't apply to first-stage translation. Therefore, if
  the driver uses first-stage translation and still relies on caching
  mode to decide on device TLB invalidation, the optimization fails.

A more reasonable optimization would be to allocate a bit in the iommu
capability registers. The vIOMMU driver could then use this bit to
determine whether it can skip a device TLB invalidation request.

Best regards,
baolu
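
P.S. A rough sketch of the two checks discussed above, only to make the
difference concrete. This is not the intel-iommu driver code. The bit
positions, the macro names and both function names are made up for
illustration, and CAP_HYP_FLUSHES_DEVTLB in particular is the proposed
bit, which does not exist in the VT-d spec today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit positions, for illustration only.
 * CAP_CACHING_MODE stands in for the caching mode capability.
 * CAP_HYP_FLUSHES_DEVTLB is the hypothetical new bit proposed above.
 */
#define CAP_CACHING_MODE	(UINT64_C(1) << 0)
#define CAP_HYP_FLUSHES_DEVTLB	(UINT64_C(1) << 1)

static_assert(CAP_CACHING_MODE != CAP_HYP_FLUSHES_DEVTLB,
	      "the two capability bits must be distinct");

/*
 * Heuristic as described above: caching mode is taken to mean "we run
 * in a guest and the hypervisor flushes the device TLB along with the
 * IOTLB", so the device TLB invalidation is skipped.
 */
bool need_dev_tlb_flush_current(uint64_t cap, bool ats_enabled)
{
	if (!ats_enabled)
		return false;

	return !(cap & CAP_CACHING_MODE);
}

/*
 * Proposed alternative: skip the device TLB invalidation only when the
 * (hypothetical) capability bit explicitly advertises that the
 * hypervisor performs the combined flush. Caching mode is ignored.
 */
bool need_dev_tlb_flush_proposed(uint64_t cap, bool ats_enabled)
{
	if (!ats_enabled)
		return false;

	return !(cap & CAP_HYP_FLUSHES_DEVTLB);
}
```

With caching mode set but the new bit clear (for example on a hypervisor
that does not do the combined flush), the first check skips the device
TLB invalidation while the second one still issues it.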