Date: Thu, 3 Jun 2021 14:01:46 -0600
From: Alex Williamson
To: Jason Gunthorpe
Cc: "Tian, Kevin", Jean-Philippe Brucker, "Jiang, Dave", "Raj, Ashok",
 "kvm@vger.kernel.org", Jonathan Corbet, Robin Murphy, LKML,
 "iommu@lists.linux-foundation.org", David Gibson, Kirti Wankhede,
 David Woodhouse, Jason Wang
Subject: Re: [RFC] /dev/ioasid uAPI proposal
Message-ID: <20210603140146.5ce4f08a.alex.williamson@redhat.com>
In-Reply-To: <20210603123401.GT1002214@nvidia.com>
References: <20210602160140.GV1002214@nvidia.com>
 <20210602111117.026d4a26.alex.williamson@redhat.com>
 <20210602173510.GE1002214@nvidia.com>
 <20210602120111.5e5bcf93.alex.williamson@redhat.com>
 <20210602180925.GH1002214@nvidia.com>
 <20210602130053.615db578.alex.williamson@redhat.com>
 <20210602195404.GI1002214@nvidia.com>
 <20210602143734.72fb4fa4.alex.williamson@redhat.com>
 <20210602224536.GJ1002214@nvidia.com>
 <20210602205054.3505c9c3.alex.williamson@redhat.com>
 <20210603123401.GT1002214@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 3 Jun 2021 09:34:01 -0300
Jason Gunthorpe wrote:

> On Wed, Jun 02, 2021 at 08:50:54PM -0600, Alex Williamson wrote:
> > On Wed, 2 Jun 2021 19:45:36 -0300
> > Jason Gunthorpe wrote:
> >
> > > On Wed, Jun 02, 2021 at 02:37:34PM -0600, Alex Williamson wrote:
> > >
> > > > Right.  I don't follow where you're jumping to relaying DMA_PTE_SNP
> > > > from the guest page table... what page table?
> > >
> > > I see my confusion now, the phrasing in your earlier remark led me
> > > to think this was about allowing the no-snoop performance enhancement
> > > in some restricted way.
> > >
> > > It is really about blocking no-snoop 100% of the time and then
> > > disabling the dangerous wbinvd when the block is successful.
> > >
> > > Didn't closely read the kvm code :\
> > >
> > > If it was about allowing the optimization then I'd expect the guest to
> > > enable no-snoopable regions via its vIOMMU and realize them to the
> > > hypervisor and plumb the whole thing through. Hence my remark about
> > > the guest page tables..
> > >
> > > So really the test is just 'were we able to block it' ?
> >
> > Yup.  Do we really still consider that there's some performance benefit
> > to be had by enabling a device to use no-snoop?  This seems largely a
> > legacy thing.
>
> I've recently had some no-snoopy discussions.. The issue didn't
> vanish, it is still expensive going through all that cache hardware.
>
> > > But Ok, back to /dev/ioasid. This answers a few lingering questions I
> > > had..
> > >
> > > 1) Mixing IOMMU_CAP_CACHE_COHERENCY and !IOMMU_CAP_CACHE_COHERENCY
> > >    domains.
> > >
> > >    This doesn't actually matter. If you mix them together then kvm
> > >    will turn on wbinvd anyhow, so we don't need to use the DMA_PTE_SNP
> > >    anywhere in this VM.
> > >
> > >    Thus if two IOMMUs are joined together into a single /dev/ioasid
> > >    then we can just make them both pretend to be
> > >    !IOMMU_CAP_CACHE_COHERENCY and both not set IOMMU_CACHE.
> >
> > Yes and no.  Yes, if any domain is !IOMMU_CAP_CACHE_COHERENCY then we
> > need to emulate wbinvd, but no, we'll use IOMMU_CACHE any time it's
> > available based on the per-domain support available.  That gives us the
> > most consistent behavior, i.e. we don't have VMs emulating wbinvd
> > because they used to have a device attached where the domain required
> > it and we can't atomically remap with new flags to perform the same as
> > a VM that never had that device attached in the first place.
>
> I think we are saying the same thing..

Hrm?  I think I'm saying the opposite of your "both not set
IOMMU_CACHE".  IOMMU_CACHE is the mapping flag that enables
DMA_PTE_SNP.  Maybe you're using IOMMU_CACHE as the state reported to
KVM?

> > > 2) How to fit this part of kvm in some new /dev/ioasid world
> > >
> > >    What we want to do here is iterate over every ioasid associated
> > >    with the group fd that is passed into kvm.
> >
> > Yeah, we need some better names, binding a device to an ioasid (fd) but
> > then attaching a device to an allocated ioasid (non-fd)... I assume
> > you're talking about the latter ioasid.
>
> Fingers crossed on RFCv2.. Here I mean the IOASID object inside the
> /dev/iommu FD. The vfio_device would have some kref handle to the
> in-kernel representation of it. So we can interact with it..
>
> > >    Or perhaps more directly: an op attaching the vfio_device to the
> > >    kvm and having some simple helper
> > >          '(un)register ioasid with kvm (kvm, ioasid)'
> > >    that the vfio_device driver can call that just sorts this out.
> >
> > We could almost eliminate the device notion altogether here, use an
> > ioasidfd_for_each_ioasid() but we really want a way to trigger on each
> > change to the composition of the device set for the ioasid, which is
> > why we currently do it on addition or removal of a group, where the
> > group has a consistent set of IOMMU properties.
>
> That is another quite good option, just forget about trying to be
> highly specific and feed in the /dev/ioasid FD and have kvm ask "does
> anything in here not enforce snoop?"
>
> With something appropriate to track/block changing that answer.
>
> It doesn't solve the problem to connect kvm to AP and kvmgt though

It does not, we'll probably need a vfio ioctl to gratuitously announce
the KVM fd to each device.  I think some devices might currently fail
their open callback if that linkage isn't already available though, so
it's not clear when that should happen, i.e. it can't currently be a
VFIO_DEVICE ioctl as getting the device fd requires an open, but this
proposal requires some availability of the vfio device fd without any
setup, so presumably that won't yet call the driver open callback.
Maybe that's part of the attach phase now... I'm not sure, it's not
clear when the vfio device uAPI starts being available in the process
of setting up the ioasid.  Thanks,

Alex
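
For readers following the IOMMU_CACHE exchange above, the policy being
argued for can be sketched as a small user-space model in C.  This is
only an illustration under stated assumptions, not kernel code: the
model_* functions, struct model_domain and the MODEL_PROT_* constants
are hypothetical stand-ins for IOMMU_CACHE and
IOMMU_CAP_CACHE_COHERENCY, and none of them are the kernel's own
definitions.  The sketch shows the two separate decisions: the mapping
flag is chosen per domain, while the need to emulate wbinvd is a
VM-wide answer to "does anything in here not enforce snoop?".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MODEL_PROT_READ  (1u << 0)
#define MODEL_PROT_WRITE (1u << 1)
#define MODEL_PROT_CACHE (1u << 2)   /* plays the role of IOMMU_CACHE */

struct model_domain {
    const char *name;
    bool enforces_snoop;   /* plays the role of IOMMU_CAP_CACHE_COHERENCY */
};

/*
 * Per-domain decision: request snoop enforcement whenever this domain
 * supports it, regardless of what other domains in the VM can do.
 */
unsigned int model_map_prot(const struct model_domain *d)
{
    unsigned int prot = MODEL_PROT_READ | MODEL_PROT_WRITE;

    if (d->enforces_snoop)
        prot |= MODEL_PROT_CACHE;
    return prot;
}

/*
 * VM-wide decision: wbinvd has to be emulated as soon as one attached
 * domain cannot block no-snoop transactions.
 */
bool model_needs_wbinvd(const struct model_domain *doms, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!doms[i].enforces_snoop)
            return true;
    }
    return false;
}
```

Keeping the two decisions separate is what gives the consistency
described above: once the non-coherent domain is detached, the
remaining mappings already carry the cache flag, so the VM-wide answer
can change without remapping anything.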