Date: Wed, 2 Jun 2021 13:00:53 -0600
From: Alex Williamson
To: Jason Gunthorpe
Cc: "Tian, Kevin", Jean-Philippe Brucker, "Jiang, Dave", "Raj, Ashok",
 "kvm@vger.kernel.org", Jonathan Corbet, Robin Murphy, LKML,
 "iommu@lists.linux-foundation.org", David Gibson, Kirti Wankhede,
 David Woodhouse, Jason Wang
Subject: Re: [RFC] /dev/ioasid uAPI proposal
Message-ID: <20210602130053.615db578.alex.williamson@redhat.com>
In-Reply-To: <20210602180925.GH1002214@nvidia.com>
References: <20210528200311.GP1002214@nvidia.com>
 <20210601162225.259923bc.alex.williamson@redhat.com>
 <20210602160140.GV1002214@nvidia.com>
 <20210602111117.026d4a26.alex.williamson@redhat.com>
 <20210602173510.GE1002214@nvidia.com>
 <20210602120111.5e5bcf93.alex.williamson@redhat.com>
 <20210602180925.GH1002214@nvidia.com>
X-Mailer: Claws Mail 3.17.8
 (GTK+ 2.24.33; x86_64-redhat-linux-gnu)
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2 Jun 2021 15:09:25 -0300
Jason Gunthorpe wrote:

> On Wed, Jun 02, 2021 at 12:01:11PM -0600, Alex Williamson wrote:
> > On Wed, 2 Jun 2021 14:35:10 -0300
> > Jason Gunthorpe wrote:
> >
> > > On Wed, Jun 02, 2021 at 11:11:17AM -0600, Alex Williamson wrote:
> > >
> > > > > > > present and be able to test if DMA for that device is cache
> > > > > > > coherent.
> > > > >
> > > > > Why is this such a strong linkage to VFIO and not just a 'hey kvm
> > > > > emulate wbinvd' flag from qemu?
> > > >
> > > > IIRC, wbinvd has host implications, a malicious user could tell KVM to
> > > > emulate wbinvd then run the op in a loop and induce a disproportionate
> > > > load on the system. We therefore wanted a way that it would only be
> > > > enabled when required.
> > >
> > > I think the non-coherentness is vfio_device specific? eg a specific
> > > device will decide if it is coherent or not?
> >
> > No, this is specifically whether DMA is cache coherent to the
> > processor, ie. in the case of wbinvd whether the processor needs to
> > invalidate its cache in order to see data from DMA.
>
> I'm confused. This is x86, all DMA is cache coherent unless the device
> is doing something special.
>
> > > If yes I'd recast this to call kvm_arch_register_noncoherent_dma()
> > > from the VFIO_GROUP_NOTIFY_SET_KVM in the struct vfio_device
> > > implementation and not link it through the IOMMU.
> >
> > The IOMMU tells us if DMA is cache coherent, VFIO_DMA_CC_IOMMU maps to
> > IOMMU_CAP_CACHE_COHERENCY for all domains within a container.
>
> And this special IOMMU mode is basically requested by the device
> driver, right? Because if you use this mode you have to also use
> special programming techniques.
>
> This smells like all the "snoop bypass" stuff from PCIE (for GPUs
> even) in a different guise - it is device triggered, not platform
> triggered behavior.

Right, the device can generate the no-snoop transactions, but it's the
IOMMU that essentially determines whether those transactions are
actually still cache coherent, AIUI.

I did experiment with virtually hardwiring the Enable No-Snoop bit in
the Device Control Register to zero, which would be generically allowed
by the PCIe spec, but then we get into subtle dependencies in the device
drivers and clearing the bit again after any sort of reset and the
backdoor accesses to config space which exist mostly in the class of
devices that might use no-snoop transactions (yes, GPUs suck).

It was much easier and more robust to ignore the device setting and
rely on the IOMMU behavior. Yes, maybe we sometimes emulate wbinvd for
VMs where the device doesn't support no-snoop, but it seemed like
platforms were headed in this direction where no-snoop was ignored
anyway. Thanks,

Alex
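
For readers following the two approaches contrasted above, here is a
minimal user-space sketch in C. It is an illustration only, not code from
this thread or from the kernel. The register offset and bit value match
the definitions in the Linux header <linux/pci_regs.h>; the helper names
(virt_devctl_read, virt_devctl_write, need_wbinvd_emulation) and
struct iommu_domain_info are hypothetical. The first two helpers show what
"virtually hardwiring Enable No-Snoop to zero" could look like for
emulated config-space accesses; the last one shows the container-wide
decision that relies on IOMMU-reported coherency instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as in the Linux uapi header <linux/pci_regs.h>. */
#define PCI_EXP_DEVCTL            0x08   /* Device Control, offset in the PCIe capability */
#define PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800 /* Enable No Snoop */

/* Hypothetical helper: the Device Control value shown to the guest,
 * with Enable No-Snoop always reading as zero. */
static uint16_t virt_devctl_read(uint16_t hw_val)
{
	return hw_val & (uint16_t)~PCI_EXP_DEVCTL_NOSNOOP_EN;
}

/* Hypothetical helper: the value forwarded to hardware on a guest write,
 * so the guest can never set Enable No-Snoop. As the mail notes, this
 * alone is not enough: resets and backdoor config accesses bypass it. */
static uint16_t virt_devctl_write(uint16_t guest_val)
{
	return guest_val & (uint16_t)~PCI_EXP_DEVCTL_NOSNOOP_EN;
}

/* Hypothetical stand-in for one IOMMU domain in a container. */
struct iommu_domain_info {
	bool cache_coherent; /* assumed to mirror IOMMU_CAP_CACHE_COHERENCY */
};

/* wbinvd emulation is wanted unless every domain in the container
 * reports cache-coherent DMA, regardless of the device's own setting. */
static bool need_wbinvd_emulation(const struct iommu_domain_info *domains,
				  size_t n)
{
	assert(domains != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!domains[i].cache_coherent)
			return true;
	}
	return false;
}
```

The second approach needs no per-device state, which is the robustness
argument made above: one non-coherent domain is enough to turn emulation
on for the whole container.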