Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753893AbdLDMhB (ORCPT ); Mon, 4 Dec 2017 07:37:01 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:36984 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751864AbdLDMgz (ORCPT ); Mon, 4 Dec 2017 07:36:55 -0500 Subject: Re: [PATCH] vfio/iommu_type1: report the IOMMU aperture info To: Alex Williamson Cc: cohuck@redhat.com, borntraeger@de.ibm.com, zyimin@linux.vnet.ibm.com, pasic@linux.vnet.ibm.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org References: <1512041678-4563-1-git-send-email-pmorel@linux.vnet.ibm.com> <20171130070844.7d6c5053@t450s.home> <20171130113001.5657b037@t450s.home> <20171201092225.36bc10b4@t450s.home> From: Pierre Morel Date: Mon, 4 Dec 2017 13:36:48 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <20171201092225.36bc10b4@t450s.home> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 x-cbid: 17120412-0012-0000-0000-00000595276B X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17120412-0013-0000-0000-000019102114 Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-12-04_03:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1712040183 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6936 Lines: 180 On 01/12/2017 17:22, Alex Williamson wrote: > On Fri, 1 Dec 2017 10:38:07 +0100 > Pierre Morel wrote: > >> On 30/11/2017 19:30, Alex Williamson wrote: >>> On Thu, 30 Nov 2017 16:11:35 +0100 >>> Pierre Morel wrote: >>> >>>> On 30/11/2017 15:08, Alex Williamson wrote: >>>>> On Thu, 30 Nov 2017 12:34:38 +0100 >>>>> Pierre Morel wrote: >>>>> >>>>>> When userland VFIO defines a new IOMMU for a guest it may >>>>>> want to specify to the guest the physical limits of >>>>>> the underlying host IOMMU to avoid access to forbidden >>>>>> memory ranges. >>>>>> >>>>>> Currently, the vfio_iommu_type1 driver does not report this >>>>>> information to userland. >>>>>> >>>>>> Let's extend the vfio_iommu_type1_info structure reported >>>>>> by the ioctl VFIO_IOMMU_GET_INFO command to report the >>>>>> IOMMU limits as new uint64_t entries aperture_start and >>>>>> aperture_end. >>>>>> >>>>>> Let's also extend the flags bit map to add a flag specifying >>>>>> if this extension of the info structure is reported or not. >>>>>> >>>>>> Signed-off-by: Pierre Morel >>>>>> --- >>>>>> drivers/vfio/vfio_iommu_type1.c | 42 +++++++++++++++++++++++++++++++++++++++++ >>>>>> include/uapi/linux/vfio.h | 3 +++ >>>>>> 2 files changed, 45 insertions(+) >>>>>> >>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >>>>>> index 8549cb1..7da5fe0 100644 >>>>>> --- a/drivers/vfio/vfio_iommu_type1.c >>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c >>>>>> @@ -1526,6 +1526,40 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu) >>>>>> return ret; >>>>>> } >>>>>> >>>>>> +/** >>>>>> + * vfio_get_aperture - report minimal aperture of a vfio_iommu >>>>>> + * @iommu: the current vfio_iommu >>>>>> + * @start: a pointer to the aperture start >>>>>> + * @end : a pointer to the aperture end >>>>>> + * >>>>>> + * This function iterate on the domains using the given vfio_iommu >>>>>> + * and restrict the aperture to the minimal aperture common >>>>>> + * to all domains sharing this vfio_iommu. >>>>>> + */ >>>>>> +static void vfio_get_aperture(struct vfio_iommu *iommu, uint64_t *start, >>>>>> + uint64_t *end) >>>>>> +{ >>>>>> + struct iommu_domain_geometry geometry; >>>>>> + struct vfio_domain *domain; >>>>>> + >>>>>> + *start = 0; >>>>>> + *end = U64_MAX; >>>>>> + >>>>>> + mutex_lock(&iommu->lock); >>>>>> + /* loop on all domains using this vfio_iommu */ >>>>>> + list_for_each_entry(domain, &iommu->domain_list, next) { >>>>>> + iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, >>>>>> + &geometry); >>>>>> + if (geometry.force_aperture) { >>>>>> + if (geometry.aperture_start > *start) >>>>>> + *start = geometry.aperture_start; >>>>>> + if (geometry.aperture_end < *end) >>>>>> + *end = geometry.aperture_end; >>>>>> + } >>>>>> + } >>>>>> + mutex_unlock(&iommu->lock); >>>>>> +} >>>>>> + >>>>>> static long vfio_iommu_type1_ioctl(void *iommu_data, >>>>>> unsigned int cmd, unsigned long arg) >>>>>> { >>>>>> @@ -1560,6 +1594,14 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, >>>>>> >>>>>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu); >>>>>> >>>>>> + minsz = min_t(size_t, info.argsz, sizeof(info)); >>>>>> + if (minsz >= offsetofend(struct vfio_iommu_type1_info, >>>>>> + aperture_end)) { >>>>>> + info.flags |= VFIO_IOMMU_INFO_APERTURE; >>>>>> + vfio_get_aperture(iommu, &info.aperture_start, >>>>>> + &info.aperture_end); >>>>>> + } >>>>>> + >>>>>> return copy_to_user((void __user *)arg, &info, minsz) ? >>>>>> -EFAULT : 0; >>>>>> >>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >>>>>> index 0fb25fb..780d909 100644 >>>>>> --- a/include/uapi/linux/vfio.h >>>>>> +++ b/include/uapi/linux/vfio.h >>>>>> @@ -519,6 +519,9 @@ struct vfio_iommu_type1_info { >>>>>> __u32 flags; >>>>>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ >>>>>> __u64 iova_pgsizes; /* Bitmap of supported page sizes */ >>>>>> +#define VFIO_IOMMU_INFO_APERTURE (1 << 1) /* supported aperture info */ >>>>>> + __u64 aperture_start; /* start of DMA aperture */ >>>>>> + __u64 aperture_end; /* end of DMA aperture */ >>>>>> }; >>>>>> >>>>>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) >>>>> >>>>> This only supports the most simple topology, even x86 cannot claim to >>>>> have a single contiguous aperture, it's typically bisected by an MSI >>>>> window. I think we need an API that supports one or more apertures >>>>> out of the box. Also as Eric indicates, a capability is probably the >>>>> better option for creating a flexible structure. Thanks, >>>>> >>>>> Alex >>>>> >>>> >>>> >>>> Yes, I understand that a capability here is a must, I will follow this way. >>>> >>>> For having multiple aperture and MSI protection, I understood it was >>>> done using windows and reserved regions. >>>> Can you point me to my error? >>> >>> See the thread from Huawei, I don't think that's a solved problem: >>> >>> https://lists.gnu.org/archive/html/qemu-arm/2017-11/msg00237.html >>> >>> If you want sysfs to be consumed separately by the user and fed into >>> new QEMU command line options for creating a VM layout, perhaps that's >>> sufficient, but I think the vfio api for the iommu should encompass >>> describing available ranges of mappable iova space without cobbling >>> together arbitrary info from sysfs. Thanks, >>> >>> Alex >>> >> >> Hi Alex, >> >> I resume to see if I understood you well: >> >> We may have physical IOMMUs with a more complex access that can not be >> specified by only defining the start and end of a read/write region. >> >> Windows can be used to reserve regions for the VM but it is not what we >> want. What we want is to know what the host can offer which is a mix of >> aperture and windows. >> >> To report this we can use capabilities in a positive way, describing >> what the host offers not what it can not provide. >> >> To achieve this we have to use two interfaces: >> - VFIO user interface with VFIO_IOMMU_GET_INFO and capabilities >> - Physical IOMMU interface with both geometry and window iommu_ops >> callbacks. >> >> If it is sufficiently near from what you thought I will provide a new >> version in this direction. > > I believe so. VFIO would construct a set of mappable iova > regions/windows using information provided via the IOMMU API via > iommu_ops and expose this via a new capability supporting multiple such > regions via the VFIO_IOMMU_GET_INFO ioctl. This ioctl would be > extended to support capabilities in the same way we've done so for > other vfio ioctls. Thanks, > > Alex > Hi Alex, Thanks, I go this way. Pierre -- Pierre Morel Linux/KVM/QEMU in Böblingen - Germany