Date: Wed, 30 May 2018 12:53:53 +0100
From: Jean-Philippe Brucker
To: "Tian, Kevin", Alex Williamson
Cc: Jacob Pan, "iommu@lists.linux-foundation.org", LKML, Joerg Roedel,
 David Woodhouse, Greg Kroah-Hartman, "Wysocki, Rafael J", "Liu, Yi L",
 "Raj, Ashok", Christoph Hellwig, Lu Baolu, Yi L, Auger Eric
Subject: Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

On 30/05/18 04:45, Tian, Kevin wrote:
>>>>>> On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so a
>>>>>> guest under a vSMMU might pass a pointer that's not aligned on 4k.
>>>>>>
>>>>> PASID table pointer for VT-d is 4K aligned.
>>>>>> Maybe this information could be part of the data passed to
>>>>>> userspace about IOMMU table formats and features? They're not part
>>>>>> of this series, but I think we wanted to communicate
>>>>>> IOMMU-specific features via sysfs.
>>>>>>
>>>>> Agreed, I believe Yi Liu is working on a sysfs interface such that QEMU
>>>>> can match IOMMU model and features.
>>>>
>>>> Digging this up again since v5 still has this issue. The IOMMU API is
>>>> a kernel internal abstraction of the IOMMU. sysfs is a userspace
>>>> interface. Are we suggesting that the /only/ way to make use of the
>>>> internal IOMMU API here is to have a user provided opaque pasid table
>>>> that we can't even do minimal compatibility sanity testing on and we
>>>> simply hope that hardware covers all the fault conditions without
>>>> taking the host down with it? I guess we have to assume the latter
>>>> since the user has full control of the table, but I have a hard time
>>>> getting past lack of internal ability to use the interface and no
>>>> ability to provide even the slimmest sanity testing. Thanks,
>>>>
>>>
>>> checking size, alignment, ... is OK, which I think is already considered
>>> by vendor IOMMU driver. However sanity testing table format might
>>> be difficult.
>>> The initial table provided by guest is likely just all ZEROs.
>>> whatever format violation may be caught only when a PASID entry
>>> is updated...
>>
>> There's sanity testing the actual contents of the table, which I agree
>> would be difficult and would likely require some sort of shadowing at
>> additional overhead, but what about even basic consistency checking?
>> For example, is it possible that due to hardware variations a user
>> might generate a table which works on some systems but not others?
>> What if two table formats are sufficiently similar that the IOMMU
>> driver puts an incompatible table in place but it continuously
>> generates faults, how do we debug that? As an intermediary in this
>> whole process I'd really rather be able to identify that the user
>> claims to be providing a TypeA table but the IOMMU only supports
>> TypeB, so clearly this won't work. I don't see that we have that
>> capability. Thanks,
>
> I remember we ever discussed to define some vendor/model ID,
> which can be retrieved by user space and then passed back when
> doing table binding. Then above simple model matching check can
> be done accordingly. It is actually a basic requirement when using
> virtio-iommu, same driver expecting to work on all vendor IOMMUs.
>
> However I don't remember whether/where that logic is implemented
> in this series (especially when there are two tracks moving in parallel).
> I'll leave to Jacob/Jean to further comment.

For Arm we do need some form of sanity checking. As each architecture
version brings a new set of features that may be supported and enabled
individually, we need to communicate fine-grained features to users.
They describe the general capability of the physical IOMMU, and also
which fields are available in the PASID table (entries are 512 bits and
leave some space for future extensions).
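
To make the "TypeA table on a TypeB IOMMU" case concrete, here is a
minimal sketch of the kind of model-matching check discussed in the
quoted text. It is only an illustration, not code from the series: the
enum, the struct and check_bind() are hypothetical names, and the model
values and alignment figures (64 bytes for SMMUv3, 4K for VT-d) are just
the examples that come up in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model IDs; real values would be defined by the UAPI. */
enum iommu_model_sketch {
	MODEL_INTEL_VTD = 1,
	MODEL_ARM_SMMU_V3 = 2,
};

/* Hypothetical stand-in for the configuration passed at bind time. */
struct bind_config_sketch {
	unsigned int model;		/* model ID claimed by userspace */
	unsigned long long base_ptr;	/* guest PASID table pointer */
};

/*
 * Refuse a bind request when the claimed model is not the one the
 * physical IOMMU implements, or when the table pointer does not meet
 * the model's minimum alignment. min_align must be a power of two.
 */
static int check_bind(const struct bind_config_sketch *cfg,
		      unsigned int hw_model, unsigned long long min_align)
{
	assert(min_align && !(min_align & (min_align - 1)));

	if (cfg->model != hw_model)
		return -EINVAL;		/* TypeA table, TypeB IOMMU */
	if (cfg->base_ptr & (min_align - 1))
		return -EINVAL;		/* misaligned table pointer */
	return 0;
}
```

Under these assumptions a VT-d table offered to an SMMUv3 is rejected
before anything is installed, which addresses the debugging concern
without the host having to read the table contents.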

In the past I briefly tried using an ioctl-based interface through VFIO
only, but it seemed more complicated to extend than sysfs for this kind
of probing.

Note that the following is from my own prototype. I'm not sure how much
Yi Liu's implementation differs but I think this was roughly what we
agreed on last time. In sysfs an IOMMU device is described with:

* A model number, for example intel-vtd=1, arm-smmu-v3=2.
* Properties and features, describing in detail what the pIOMMU device
  and driver support.

  /sys/class/iommu///

For example an SMMUv3:

The model number is described as a property

  /sys/class/iommu/smmu.0x00000000e0600000/arm-smmu-v3/model = 2

A few feature bits and values:

  .../arm-smmu-v3/asid_bits    // max address space ID bits, %d
  .../arm-smmu-v3/ssid_bits    // max substream ID (PASID) bits, %d
  .../arm-smmu-v3/input_bits   // max input address size, %d
  .../arm-smmu-v3/output_bits  // max output address size, %d
  .../arm-smmu-v3/btm          // broadcast TLB maintenance, enabled/disabled
  .../arm-smmu-v3/httu         // Hardware table update, access+dirty/access/none
  .../arm-smmu-v3/stall        // transaction stalling, enabled/disabled/force

(Note that the base pointer alignment previously discussed could be
implied by the model number, or added explicitly here.)

Which page table formats are supported:

  .../arm-smmu-v3/pgtable_format/lpae-64
  .../arm-smmu-v3/pgtable_format/v7s

I'm not sure yet what values these will have, they might simply contain
arbitrary format numbers because fields available in the page tables can
be deduced from the above feature bits. (Out of laziness, in my
prototype I just describe a preferred format in a pgtable_format file.)

As you can imagine I'd rather not pass the fine details back to the
kernel in bind_pasid_table. The list of features is growing, and
describing them is a pain. It could be done for debugging purposes, but
all we'd be achieving is telling the kernel that userspace has read the
values, not that the guest intends to use them.
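
Going back to the sysfs layout above, userspace probing of the "%d"
attributes could look roughly like the sketch below. It assumes the
prototype layout described in this mail, which is not a merged ABI; the
helper names and the fallback argument (a value to use when a file is
absent or unparsable) are hypothetical.

```c
#include <stdio.h>

/*
 * Read a decimal ("%d") attribute such as ssid_bits from a sysfs-style
 * file. Returns fallback when the file is missing or cannot be parsed.
 */
static int read_int_attr(const char *path, int fallback)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return fallback;
	if (fscanf(f, "%d", &val) != 1)
		val = fallback;
	fclose(f);
	return val;
}

/* Hypothetical helper: build "<dir>/<name>" and read it as an int. */
static int read_model_attr(const char *dir, const char *name, int fallback)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

	if (n < 0 || (size_t)n >= sizeof(path))
		return fallback;	/* path would have been truncated */
	return read_int_attr(path, fallback);
}
```

A caller would pass the per-model directory (the arm-smmu-v3 directory
in the example above) as dir and an attribute name such as "ssid_bits"
as name. Attributes with string values (btm, httu, stall) would need a
separate string reader.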
The guest selects features by writing PASID table entries, which aren't
read by the host.

If the guest writes invalid values in the PASID table then yes, we have
to rely on the hardware to contain the fault and not bring the host down
with it. If the IOMMU cannot do that, then the driver really shouldn't
implement bind_pasid_table... Otherwise, a fault while reading the PASID
table can be injected into the guest as an unrecoverable fault
(IOMMU_FAULT_REASON_PASID_INVALID or IOMMU_FAULT_REASON_PGD_FETCH in
patch 10) or printed by the host when debugging.

However I think the model number should be added to pasid_table_config.
For one thing it gives us a simple sanity-check, but it also tells us
which other fields are valid in pasid_table_config. Arm-smmu-v3 needs at
least two additional 8-bit fields describing the PASID table format
(number of levels and PASID0 behaviour), which are written to device
context tables when installing the PASID table pointer.

Compatibility: new optional features are easy to add to a given model,
just add a new sysfs file. If in the future the host describes a new
feature that is mandatory, or implements a different PASID table format,
how does it ensure that the user understands it? Perhaps use a new model
number for this, e.g. "arm-smmu-v3-a=3", with similar features. I think
it would be the same if the host stops supporting a feature for a given
model, because they are ABI. But we can also define default values from
the start, for example "if ssid_bits file isn't present, default value
is 0 - PASID not supported".

Thanks,
Jean