Date: Thu, 1 Apr 2021 14:05:00 +0200
From: Jean-Philippe Brucker
To: "Liu, Yi L"
Cc: Jason Gunthorpe, "Tian, Kevin", Jacob Pan, LKML, Joerg Roedel, Lu Baolu, David Woodhouse, "iommu@lists.linux-foundation.org", "cgroups@vger.kernel.org", Tejun Heo, Li Zefan, Johannes Weiner, Jean-Philippe Brucker, Alex Williamson, Eric Auger, Jonathan Corbet, "Raj, Ashok", "Wu, Hao", "Jiang, Dave"
Subject: Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

On Thu, Apr 01, 2021 at 07:04:01AM +0000, Liu, Yi L wrote:
> > - how about AMD and ARM's vSVA support? Their PASID allocation and page table
> > happens within guest. They only need to bind the guest PASID table to host.

In this case each VM has its own IOASID space, and the host IOASID allocator
doesn't participate. Moreover, this only makes sense when assigning a whole VF
to a guest, and VFIO is the tool for that. So I wouldn't shoehorn those ops
into /dev/ioasid, though we do need a transport for invalidation commands.

> > Above model seems unable to fit them. (Jean, Eric, Jacob please feel free
> > to correct me)
> > - this per-ioasid SVA operations is not aligned with the native SVA usage
> > model. Native SVA bind is per-device.

Bare-metal SVA doesn't need /dev/ioasid either. A program uses a device
handle either to ask whether SVA is enabled or to enable it explicitly, and
that step is required with or without /dev/ioasid. OpenCL uses the first
method: it automatically enables "fine-grain system SVM" if available and
provides a flag to userspace. So userspace does not need to know about
PASIDs, which are only one way of doing SVA (some GPUs context-switch page
tables instead).

> After reading your reply in https://lore.kernel.org/linux-iommu/20210331123801.GD1463678@nvidia.com/#t
> So you mean /dev/ioasid FD is per-VM instead of per-ioasid, so above skeleton
> doesn't suit your idea. I draft below skeleton to see if our mind is the
> same. But I still believe there is an open on how to fit ARM and AMD's
> vSVA support in this the per-ioasid SVA operation model. thoughts?
>
> +-----------------------------+-----------------------------------------------+
> | userspace                   | kernel space                                  |
> +-----------------------------+-----------------------------------------------+
> | ioasid_fd =                 | /dev/ioasid does below:                       |
> | open("/dev/ioasid", O_RDWR);|   struct ioasid_fd_ctx {                      |
> |                             |       struct list_head ioasid_list;           |
> |                             |       ...                                     |
> |                             |   } ifd_ctx; // ifd_ctx is per ioasid_fd      |
> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> |      ALLOC, &ioasid);       |   struct ioasid_data {                        |
> |                             |       ioasid_t ioasid;                        |
> |                             |       struct list_head device_list;           |
> |                             |       struct list_head next;                  |
> |                             |       ...                                     |
> |                             |   } id_data; // id_data is per ioasid         |
> |                             |                                               |
> |                             |   list_add(&id_data.next,                     |
> |                             |            &ifd_ctx.ioasid_list);             |
> +-----------------------------+-----------------------------------------------+
> | ioctl(device_fd,            | VFIO does below:                              |
> |      DEVICE_ALLOW_IOASID,   | 1) get ioasid_fd, check if ioasid_fd is valid |
> |      ioasid_fd,             | 2) check if ioasid is allocated from ioasid_fd|
> |      ioasid);               | 3) register device/domain info to /dev/ioasid |
> |                             |    tracked in id_data.device_list             |
> |                             | 4) record the ioasid in VFIO's per-device     |
> |                             |    ioasid list for future security check      |
> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> |      BIND_PGTBL,            | 1) find ioasid's id_data                      |
> |      pgtbl_data,            | 2) loop the id_data.device_list and tell iommu|
> |      ioasid);               |    give ioasid access to the devices          |
> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> |      UNBIND_PGTBL,          | 1) find ioasid's id_data                      |
> |      ioasid);               | 2) loop the id_data.device_list and tell iommu|
> |                             |    clear ioasid access to the devices         |
> +-----------------------------+-----------------------------------------------+
> | ioctl(device_fd,            | VFIO does below:                              |
> |      DEVICE_DISALLOW_IOASID,| 1) check if ioasid is associated in VFIO's    |
> |      ioasid_fd,             |    device ioasid list.                        |
> |      ioasid);               | 2) unregister device/domain info from         |
> |                             |    /dev/ioasid, clear in id_data.device_list  |
> +-----------------------------+-----------------------------------------------+
> | ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
> |      FREE, ioasid);         |   list_del(&id_data.next);                    |
> +-----------------------------+-----------------------------------------------+

Also wondering about:

* Querying IOMMU nesting capabilities before binding page tables (which page
  table formats are supported?). We were planning to have a VFIO cap, but I'm
  guessing we need to go back to the sysfs solution?
* Invalidation, probably an ioasid_fd ioctl?
* Page faults and page responses. These go from and to devices and don't
  necessarily carry a PASID. But vDPA needs them as well, so would they also
  go through /dev/ioasid?

Thanks,
Jean