Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp4830690imm; Wed, 30 May 2018 12:54:44 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJElNeow+VTyTVlm5QCBU5v9n+VK9FpYlcR1kX72UegfmnXARBDS3E03NCmtM2AW/tj28ge X-Received: by 2002:a17:902:3303:: with SMTP id a3-v6mr4109213plc.209.1527710084294; Wed, 30 May 2018 12:54:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527710084; cv=none; d=google.com; s=arc-20160816; b=i5mRMgXUQTdx8zqDyw2ik5C1vVuuEAQiM51S0ARzCWp6iOf55wQ/tu/AK2b76RREFn FU638A97cZxBf1ZZV3UEEzrq7GyjsrWVlD8e5Reik4iOkQWwkGt0ND2/8MTf1ejoN1q+ QFTPWlsVne5f1mNbGTRnoVNPDS0l8BKoqI7Blko7gpIhkApTvfekEOazu4XN1pNvrHoi YyH2hIULEd8Dye9S5CZRU5oh49xZ/2Ha2iqpw9leSmdjnAvFuV5HIIx+qCM/bMJHjotS SfAKYNpThFDrYjAyI4gkJ6v7SU3sgTbX9oK194D15mwG1plFl8T4tYD4EaJDCPzf/AWp t/mQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:arc-authentication-results; bh=N/I+oAfSkRwMtndDWlRJ0OS6mAZ+X17o4+Zgp0EQnI0=; b=tC7b2AGHJTIEj/MFZ5iRN4/jIKUmdxGnPVKuJQcI7R3MLTpfz2J9rPcRnZfUEcZL+2 A7By/8qdSuYz4BTojuB3ybreNPbFhTmfjAVyR/79RpnG62chAnEVJDhe4VFQACQUo831 GZMjIgxgJMQK/txISuoiaHMYqzuFI7f/z2cJJXmP7DTz/yZ1iaJnRiMgd5EapYAT5v6q EP9kTJFB3tpIXM0wsPFHvV9f9N4xJ635T4Y+2Qot4wl364F7ORIm4a8fNVn+70XBkW9P F5b/DO8hdqo7CghTEzLn2cfBV/Oh3dWhUaqVbjAL7um2qUOO26xiiaL6TnzESk5K1SJD ZQig== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f3-v6si34855002pld.513.2018.05.30.12.54.30; Wed, 30 May 2018 12:54:44 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932253AbeE3TwQ (ORCPT + 99 others); Wed, 30 May 2018 15:52:16 -0400 Received: from mga01.intel.com ([192.55.52.88]:53125 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932250AbeE3Ttw (ORCPT ); Wed, 30 May 2018 15:49:52 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 30 May 2018 12:49:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,461,1520924400"; d="scan'208";a="62981339" Received: from jacob-builder.jf.intel.com (HELO jacob-builder) ([10.7.199.155]) by orsmga002.jf.intel.com with ESMTP; 30 May 2018 12:49:44 -0700 Date: Wed, 30 May 2018 12:52:40 -0700 From: Jacob Pan To: Jean-Philippe Brucker Cc: "Tian, Kevin" , Alex Williamson , "iommu@lists.linux-foundation.org" , LKML , Joerg Roedel , David Woodhouse , Greg Kroah-Hartman , "Wysocki, Rafael J" , "Liu, Yi L" , "Raj, Ashok" , Christoph Hellwig , Lu Baolu , Yi L , Auger Eric , jacob.jun.pan@linux.intel.com Subject: Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function Message-ID: <20180530125240.34e0e80c@jacob-builder> In-Reply-To: References: <1523915351-54415-1-git-send-email-jacob.jun.pan@linux.intel.com> <1523915351-54415-5-git-send-email-jacob.jun.pan@linux.intel.com> <20180417131047.0a9c310f@w520.home> <20180420164251.5245f822@jacob-builder> <20180529140915.1f174689@w520.home> <20180529211746.74f1dd23@w520.home> Organization: OTC X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 30 May 2018 12:53:53 +0100 Jean-Philippe Brucker wrote: > On 30/05/18 04:45, Tian, Kevin wrote: > >>>>>> On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so > >>>>>> a > >>>> guest > >>>>>> under a vSMMU might pass a pointer that's not aligned on 4k. > >>>>>> > >>>>> PASID table pointer for VT-d is 4K aligned. > >>>>>> Maybe this information could be part of the data passed to > >> userspace > >>>>>> about IOMMU table formats and features? They're not part of > >>>>>> this series, but I think we wanted to communicate > >>>>>> IOMMU-specific > >> features > >>>>>> via sysfs. > >>>>>> > >>>>> Agreed, I believe Yi Liu is working on a sysfs interface such > >>>>> that QEMU can match IOMMU model and features. > >>>> > >>>> Digging this up again since v5 still has this issue. The IOMMU > >>>> API is a kernel internal abstraction of the IOMMU. sysfs is a > >>>> userspace interface. Are we suggesting that the /only/ way to > >>>> make use of the internal IOMMU API here is to have a user > >>>> provided opaque pasid table that we can't even do minimal > >>>> compatibility sanity testing on and we simply hope that hardware > >>>> covers all the fault conditions without taking the host down > >>>> with it? I guess we have to assume the latter since the user > >>>> has full control of the table, but I have a hard time getting > >>>> past lack of internal ability to use the interface and no > >>>> ability to provide even the slimmest sanity testing. Thanks, > >>> > >>> checking size, alignment, ... is OK, which I think is already > >>> considered by vendor IOMMU driver. However sanity testing table > >>> format might be difficult. The initial table provided by guest is > >>> likely just all ZEROs. whatever format violation may be caught > >>> only when a PASID entry is updated... > >> > >> There's sanity testing the actual contents of the table, which I > >> agree would be difficult and would likely require some sort of > >> shadowing at additional overhead, but what about even basic > >> consistency checking? For example, is it possible that due to > >> hardware variations a user might generate a table which works on > >> some systems but not others? What > >> if two table formats are sufficiently similar that the IOMMU driver > >> puts an incompatible table in place but it continuously generates > >> faults, how do we debug that? As an intermediary in this whole > >> process I'd really rather be able to identify that the user claims > >> to be providing a TypeA table but the IOMMU only supports TypeB, > >> so clearly this won't work. I don't see that we have that > >> capability. Thanks, > > > > I remember we ever discussed to define some vendor/model ID, > > which can be retrieved by user space and then passed back when > > doing table binding. Then above simple model matching check can > > be done accordingly. It is actually a basic requirement when using > > virtio-iommu, same driver expecting to work on all vendor IOMMUs. > > > > However I don't remember whether/where that logic is implemented > > in this series (especially when there are two tracks moving in > > parallel). I'll leave to Jacob/Jean to further comment. > > For Arm we do need some form of sanity checking. As each architecture > version brings a new set of features that may be supported and enabled > individually, we need to communicate fine-grained features to users. > They describes the general capability of the physical IOMMU, and also > which fields are available in the PASID table (entries are 512-bits > and leave some space for future extensions). > > In the past I briefly tried using a ioctl-based interface through VFIO > only, but it seemed more complicated to extend than sysfs for this > kind of probing. > > Note that the following is from my own prototype. I'm not sure how > much Yi Liu's implementation differs but I think this was roughly > what we agreed on last time. In sysfs an IOMMU device is described > with: > > * A model number, for example intel-vtd=1, arm-smmu-v3=2. > * Properties and features, describing in detail what the pIOMMU device > and driver support. > > /sys/class/iommu/// > > For example an SMMUv3: > > The model number is described as a property > /sys/class/iommu/smmu.0x00000000e0600000/arm-smmu-v3/model = 2 > > A few feature bits and values: > .../arm-smmu-v3/asid_bits // max address space ID bits, %d > .../arm-smmu-v3/ssid_bits // max substream ID (PASID) bits, %d > .../arm-smmu-v3/input_bits // max input address size, %d > .../arm-smmu-v3/output_bits // max output address size, %d > .../arm-smmu-v3/btm // broadcast TLB maintenance, > enabled/disabled .../arm-smmu-v3/httu // Hardware > table update, > access+dirty/access/none .../arm-smmu-v3/stall // > transaction stalling, enabled/disabled/force > > (Note that the base pointer alignment previously discussed could be > implied by the model number, or added explicitly here.) > > Which page table formats are supported: > .../arm-smmu-v3/pgtable_format/lpae-64 > .../arm-smmu-v3/pgtable_format/v7s > I'm not sure yet what values these will have, they might simply > contain arbitrary format numbers because fields available in the page > tables can be deduced from the above features bits. (Out of laziness, > in my prototype I just describe a preferred format in a > pgtable_format file) > > As you can imagine I'd rather not pass the fine details back to the > kernel in bind_pasid_table. The list of features is growing, and > describing them is a pain. It could be done for debugging purpose, but > all we'd be achieving is telling the kernel that userspace has read > the values, not that the guest intends to use them. The guest selects > features by writing PASID table entries, which aren't read by the > host. > > If the guest writes invalid values in the PASID table then yes, we > have to rely on the hardware to contain the fault and not bring the > host down with it. If the IOMMU cannot do that, then the driver > really shouldn't implement bind_pasid_table... Otherwise, a fault > while reading the PASID table can be injected into the guest as an > unrecoverable fault (IOMMU_FAULT_REASON_PASID_INVALID or > IOMMU_FAULT_REASON_PGD_FETCH in patch 10) or printed by the host when > debugging. > > However I think the model number should be added to > pasid_table_config. For one thing it gives us a simple sanity-check, > but it also tells which other fields are valid in pasid_table_config. > Arm-smmu-v3 needs at least two additional 8-bit fields describing the > PASID table format (number of levels and PASID0 behaviour), which are > written to device context tables when installing the PASID table > pointer. > We had model number field in v2 of this patchset. My thought was that since the config info is meant to be generic, we shouldn't include model info. But I also think a simple sanity check can be useful, would that be sufficient to address Alex's concern? Of course we still need sysfs for more specific IOMMU features. Would this work? enum pasid_table_model { PASID_TABLE_FORMAT_HOST, PASID_TABLE_FORMAT_ARM_1LVL, PASID_TABLE_FORMAT_ARM_2LVL, PASID_TABLE_FORMAT_AMD, PASID_TABLE_FORMAT_INTEL, }; /** * PASID table data used to bind guest PASID table to the host IOMMU. This will * enable guest managed first level page tables. * @version: for future extensions and identification of the data format * @bytes: size of this structure * @model: PASID table format for different IOMMU models * @base_ptr: PASID table pointer * @pasid_bits: number of bits supported in the guest PASID table, must be less * or equal than the host supported PASID size. */ struct pasid_table_config { __u32 version; #define PASID_TABLE_CFG_VERSION_1 1 __u32 bytes; enum pasid_table_model model; __u64 base_ptr; __u8 pasid_bits; }; > Compatibility: new optional features are easy to add to a given model, > just add a new sysfs file. If in the future, the host describes a new > feature that is mandatory, or implements a different PASID table > format, how does it ensure that user understands it? Perhaps use a > new model number for this, e.g. "arm-smmu-v3-a=3", with similar > features. I think it would be the same if the host stops supporting a > feature for a given model, because they are ABI. But we can also > define default values from the start, for example "if ssid_bits file > isn't present, default value is 0 - PASID not supported" > > Thanks, > Jean [Jacob Pan]