Date: Wed, 8 May 2019 07:27:40 -0400
From: Yan Zhao
To: Alex Williamson
Cc: "intel-gvt-dev@lists.freedesktop.org", "arei.gonglei@huawei.com",
 "aik@ozlabs.ru", "Zhengxiao.zx@alibaba-inc.com",
 "shuangtai.tst@alibaba-inc.com", "qemu-devel@nongnu.org",
 "eauger@redhat.com", "Liu, Yi L", "Yang, Ziye", "mlevitsk@redhat.com",
 "pasic@linux.ibm.com", "felipe@nutanix.com", "Liu, Changpeng",
 "Ken.Xue@amd.com", "jonathan.davies@nutanix.com", "He, Shaopeng",
 "kvm@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "libvir-list@redhat.com", "eskultet@redhat.com", "dgilbert@redhat.com",
 "cohuck@redhat.com", "Tian, Kevin", "zhenyuw@linux.intel.com",
 "Wang, Zhi A", "cjia@nvidia.com", "kwankhede@nvidia.com",
 "berrange@redhat.com", "dinechin@redhat.com"
Subject: Re: [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device
Message-ID: <20190508112740.GA24397@joy-OptiPlex-7040>
Reply-To: Yan Zhao
References: <20190506014514.3555-1-yan.y.zhao@intel.com>
 <20190506014904.3621-1-yan.y.zhao@intel.com>
 <20190507151826.502be009@x1.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20190507151826.502be009@x1.home>
User-Agent: Mutt/1.9.4 (2018-02-28)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 08, 2019 at 05:18:26AM +0800, Alex Williamson wrote:
> On Sun, 5 May 2019 21:49:04 -0400
> Yan Zhao wrote:
>
> > version attribute is used to check two mdev devices' compatibility.
> >
> > The key point of this version attribute is that it's rw.
> > User space has no need to understand internal of device version and no
> > need to compare versions by itself.
> > Compared to reading version strings from both two mdev devices being
> > checked, user space only reads from one mdev device's version attribute.
> > After getting its version string, user space writes this string into the
> > other mdev device's version attribute. Vendor driver of mdev device
> > whose version attribute being written will check device compatibility of
> > the two mdev devices for user space and return success for compatibility
> > or errno for incompatibility.
> > So two readings of version attributes + checking in user space are now
> > changed to one reading + one writing of version attributes + checking in
> > vendor driver.
> > Format and length of version strings are now private to vendor driver
> > who can define them freely.
> >
> >               __ user space
> >               /\           \
> >              /              \write
> >             / read           \
> >      ______/__             ___\|/___
> >     | version |           | version |-->check compatibility
> >     -----------           -----------
> >    mdev device A         mdev device B
> >
> > This version attribute is optional. If a mdev device does not provide
> > with a version attribute, this mdev device is incompatible to all other
> > mdev devices.
> >
> > Live migration is able to take advantage of this version attribute.
> > Before user space actually starts live migration, it can first check
> > whether two mdev devices are compatible.
> >
> > v2:
> > 1. added detailed intent and usage
> > 2. made definition of version string completely private to vendor driver
> >    (Alex Williamson)
> > 3. abandoned changes to sample mdev drivers (Alex Williamson)
> > 4. mandatory --> optional (Cornelia Huck)
> > 5. added description for errno (Cornelia Huck)
> >
> > Cc: Alex Williamson
> > Cc: Erik Skultety
> > Cc: "Dr. David Alan Gilbert"
> > Cc: Cornelia Huck
> > Cc: "Tian, Kevin"
> > Cc: Zhenyu Wang
> > Cc: "Wang, Zhi A"
> > Cc: Neo Jia
> > Cc: Kirti Wankhede
> > Cc: Daniel P. Berrangé
> > Cc: Christophe de Dinechin
> >
> > Signed-off-by: Yan Zhao
> > ---
> >  Documentation/vfio-mediated-device.txt | 140 +++++++++++++++++++++++++
> >  1 file changed, 140 insertions(+)
> >
> > diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> > index c3f69bcaf96e..013a764968eb 100644
> > --- a/Documentation/vfio-mediated-device.txt
> > +++ b/Documentation/vfio-mediated-device.txt
> > @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |     |   |--- available_instances
> >    |     |   |--- device_api
> >    |     |   |--- description
> > +  |     |   |--- version
> >    |     |   |--- [devices]
> >    |     |--- []
> >    |     |   |--- create
> > @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |     |   |--- available_instances
> >    |     |   |--- device_api
> >    |     |   |--- description
> > +  |     |   |--- version
> >    |     |   |--- [devices]
> >    |     |--- []
> >    |          |--- create
> > @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |          |--- available_instances
> >    |          |--- device_api
> >    |          |--- description
> > +  |          |--- version
> >    |          |--- [devices]
>
> I thought there was a request to make this more specific to migration
> by renaming it to something like migration_version. Also, as an

so this attribute may not only include a mdev device's parent device info and
mdev type, but also include numeric software version of vendor specific
migration code, right?
This actually makes sense.
So, do I need to add a disclaimer in this doc like:
vendor driver should be responsible by itself for a mdev device's migration
compatibility. During migration setup phase, general migration code in user
space VFIO only checks this version of VFIO migration region, and will not
check software version of vendor specific migration code. It is suggested to
incorporate at least parent device info and software version of vendor
specific migration code into this migration_version attribute.

> optional attribute, it seems the example should perhaps not add it to
> all types to illustrate that it is not required.
ok. got it.

> >
> >  * [mdev_supported_types]
> > @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
> >    This attribute should show the number of devices of type that can be
> >    created.
> >
> > +* version
> > +
> > +  This attribute is rw, and is optional.
> > +  It is used to check device compatibility between two mdev devices and is
>
> between two mdev devices of the same type.
>
ok. got it. But I have a question about aggregation proposed earlier.
Do we also have to assume the two mdev devices are of the same aggregation
count?
However, aggregation count is not available before a mdev device is created. :(

> > +  accessed in pairs between the two mdev devices being checked.
>
> "in pairs"?
I meant, user space needs to access version attributes from two mdev device.
but seems that it's needless to mention that... I'll remove it :)

> > +  The intent of this attribute is to make an mdev device's version opaque to
> > +  user space, so instead of reading two mdev devices' version strings and
>
> perhaps "...instead of reading the version string of two mdev devices
> and comparing them in userspace..."
yes, better, thanks:)

> > +  comparing in userspace, user space should only read one mdev device's version
> > +  attribute, and writes this version string into the other mdev device's version
> > +  attribute. Then vendor driver of mdev device whose version attribute being
> > +  written would check the incoming version string and tell user space whether
> > +  the two mdev devices are compatible via return value. That's why this
> > +  attribute is writable.
> > +
> > +  when reading this attribute, it should show device version string of
> > +  the device of type .
> > +
> > +  This string is private to vendor driver itself. Vendor driver is able to
> > +  freely define format and length of device version string.
> > +  e.g.
> > +  It can use a combination of pciid of parent device + mdev type.
>
> Can the user assume the data contents of the string is ascii
> characters? It's good that the vendor driver defines the format and
> length, but the user probably needs some expectation bounding that
> length. Should we define it as no larger than PATH_MAX (4096), or maybe
> NAME_MAX (255) might be more reasonable?
I think so. I'll add those restrictions in next revision.

> > +
> > +  When writing a string to this attribute, vendor driver should analyze this
> > +  string and check whether the mdev device being identified by this string is
> > +  compatible with the mdev device for this attribute. vendor driver should then
>
> Compatible for what purpose? I think this is where specifically
> calling this a migration_version potentially has value.
yes. if it also covers version of vendor specific migration code, calling it
migration_version is more appropriate.

> > +  return written string's length if it regards the two mdev devices are
> > +  compatible; vendor driver should return negative errno if it regards the two
> > +  mdev devices are not compatible.
>
> IOW, the write(2) will succeed if the version is determined to be
> compatible and otherwise fail with vendor specific errno.
>
thanks:)

> > +
> > +  User space should treat ANY of below conditions as two mdev devices not
> > +  compatible:
>
> (0) The mdev devices are not of the same type.
>
the same as above. do we also need to take aggregation count into consideration?

> > +  (1) any one of the two mdev devices does not have a version attribute
> > +  (2) error when read from one mdev device's version attribute
>
> Is this intended to support that the vendor driver can supply a version
> attribute but not support migration? TBH, this sounds like a vendor
> driver bug, but maybe it's necessary if the vendor driver could have
> some types that support migration and others that do not?
> IOW, we're supplying the same attribute groups to all devices from a
> vendor, in which case my comment above regarding an example type without
> a version attribute might be invalid.
hmm, this is to make life easier for vendor driver to have some types that
support migration and others that do not. while we can get rid of returning
errno by providing different attribute groups to different devices, the way
of returning errno gives a simpler choice to vendors.

>
> > +  (3) error when write one mdev device's version string to the other mdev
> > +  device's version attribute
> > +
> > +  User space should regard two mdev devices compatible when ALL of below
> > +  conditions are met:
>
> (0) The mdev devices are of the same type
>
> > +  (1) success when read from one mdev device's version attribute.
> > +  (2) success when write one mdev device's version string to the other mdev
> > +  device's version attribute
> > +
> > +  Errno:
> > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > +  devices, it should not register version attribute for this mdev device. But if
> > +  a vendor driver has already registered version attribute and it wants to claim
> > +  a mdev device incompatible to all other mdev devices, it needs to return
> > +  -ENODEV on access to this mdev device's version attribute.
> > +  If a mdev device is only incompatible to certain mdev devices, write of
> > +  incompatible mdev devices's version strings to its version attribute should
> > +  return -EINVAL;
>
> I think it's best not to define the specific errno returned for a
> specific situation, let the vendor driver decide, userspace simply
> needs to know that an errno on read indicates the device does not
> support migration version comparison and that an errno on write
> indicates the devices are incompatible or the target doesn't support
> migration versions.
>
yes, user space only gets 0 or 1 as return code, not those errno.
maybe I only need to describe errno in patch 2/2.

> > +
> > +  This attribute can be taken advantage of by live migration.
> > +  If user space detects two mdev devices are compatible through version
> > +  attribute, it can start migration between the two mdev devices, otherwise it
> > +  should abort its migration attempts between the two mdev devices.
> > +
> > +  Example Usage:
> > +  case 1:
> > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > +
> > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > +  ../mdev_supported_types/i915-GVTg_V5_4
> > +
> > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > +  ../mdev_supported_types/i915-GVTg_V5_4
> > +
> > +  (1) read source side mdev device's version.
> > +  #cat \
> > +  /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > +  mdev_type/version
> > +  8086-193b-i915-GVTg_V5_4
>
> Is this really the version information exposed in 2/2? This is opaque,
> so of course you can add things later, but it seems short sighted not
> to even append a version 0 tag to account for software compatibility
> differences since the above only represents a parent and mdev type
> based version.
>
yes, currently in 2/2, the version only includes + + .
but you are right, it's better to include software migration version number.
so vendor drivers have below 3 ways to designate a mdev device has no migration
capability.
1. not registering migration_version attribute
2. on reading migration_version, returning errno
3. on reading migration_version, returning string indicating non-migratable.
The reason of not giving up way 2 is that maybe it can accelerate user space
getting information of device incompatible. if we only keep way 3, it would
not know this info until writing this string to target attribute.
do you agree?

> > +  (2) write source side mdev device's version string into target side mdev
> > +  device's version attribute.
> > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > +  /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > +  mdev_type/version
> > +  # echo $?
> > +  0
>
> TBH, there's a lot of superfluous information in this example that can
> be stripped out. For example:
>
> "
> (1) Compare mdev types:
>
> The mdev type of an instantiated device can be read from the mdev_type
> link within the device instance in sysfs, for example:
>
>   # basename $(readlink -f /sys/bus/mdev/devices/$MDEV_UUID/mdev_type/)
>
> The mdev types available on a given host system can also be found
> through /sys/class/mdev_bus, for example:
>
>   # ls /sys/class/mdev_bus/*/mdev_supported_types/
>
> Migration is only possible between devices of the same mdev type.
>
> (2) Retrieve the mdev source version:
>
> The migration version information can either be read from the mdev_type
> link on an instantiated device:
>
>   # cat /sys/bus/mdev/devices/$UUID1/mdev_type/version
>
> Or it can be read from the mdev type definition, for example:
>
>   # cat /sys/class/mdev_bus/*/mdev_supported_types/$MDEV_TYPE/version
>
> If reading the source version generates an error, migration is not
> possible. NB, there might be several parent devices for a given mdev
> type on a host system, each may support or expose different versions.
> Matching the specific mdev type to a parent may become important in
> such configurations.
>
> (3) Test source version at target:
>
> Given a version as outlined above, its compatibility to an instantiated
> device of the same mdev type can be tested as:
>
>   # echo $VERSION > /sys/bus/mdev/devices/$UUID2/mdev_type/version
>
> If this write fails, the source and target versions are not compatible
> or the target does not support migration.
>
> Compatibility can also be tested prior to target device creation using
> the mdev type definition for a parent device with a previously found
> matching mdev type, for example:
>
>   # echo $VERSION > /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE/version
>
> Again, an error writing the version indicates that an instance of this
> mdev type would not support a migration from the provided version.
> "
>
> In particular from the provided example, the specific UUIDs, mdev
> types, parent information, and contents of the version attribute do not
> contribute to illustrating the protocol. In fact, displaying the
> contents of the version attribute may tempt users to do comparison on
> their own, especially given how easy it is to decide the GVT-g version
> string.
>
got it! great thanks! I'll update it to the next revision.

>
> > +
> > +  in this case, user space's write to target side mdev device's version
> > +  attribute returns success to indicate the two mdev devices are compatible.
> > +
> > +  case 2:
> > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-191b.
> > +
> > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > +  ../mdev_supported_types/i915-GVTg_V5_4
> > +
> > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > +  ../mdev_supported_types/i915-GVTg_V5_4
> > +
> > +  (1) read source side mdev device's version.
> > +  #cat \
> > +  /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > +  mdev_type/version
> > +  8086-193b-i915-GVTg_V5_4
> > +
> > +  (2) write source side mdev device's version string into target side mdev
> > +  device's version attribute.
> > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > +  /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > +  mdev_type/version
> > +  -bash: echo: write error: Invalid argument
> > +
> > +  in this case, user space's write to target side mdev device's version
> > +  attribute returns error to indicate the two mdev devices are incompatible.
> > +  (incompatible because pci ids of the two mdev devices' parent devices are
> > +  different).
> > +
> > +  case 3:
> > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > +  But vendor driver does not provide version attribute for this device.
> > +
> > +  (1) read source side mdev device's version.
> > +  #cat \
> > +  /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > +  mdev_type/version
> > +  cat: '/sys/bus/pci/devices/0000:00:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > +  mdev_type/version': No such file or directory
> > +
> > +  in this case, user space reads source side mdev device's version attribute
> > +  which does not exist however. user space regards the two mdev devices as not
> > +  compatible and will not start migration between the two mdev devices.
> > +
> > +
>
> This is far too long for description and examples, it's not this
> complicated. Thanks,
>
got it. I'll follow your above example :)

thanks
Yan

> >  * [device]
> >
> >    This directory contains links to the devices of type that have been
>
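
For reference, the three steps from the example above (compare mdev types, read the source version, write it to the target) could be wrapped in a small shell helper along these lines. This is only a sketch under assumptions: the function name mdev_compatible and its arguments are made up for illustration, and the attribute name defaults to "version" as in this v2 patch, which may well become migration_version in a later revision.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch.
# Returns 0 if the target accepts the source's version string, non-zero
# otherwise.
#   $1, $2: mdev_type directories (or links) of source and target, e.g.
#           /sys/bus/mdev/devices/$UUID/mdev_type
#   $3:     attribute name; defaults to "version" (name used in this v2 patch)
mdev_compatible() {
    src=$1
    dst=$2
    attr=${3:-version}

    # (1) Both devices must be of the same mdev type.
    src_type=$(basename "$(readlink -f "$src")")
    dst_type=$(basename "$(readlink -f "$dst")")
    [ -n "$src_type" ] && [ "$src_type" = "$dst_type" ] || return 1

    # (2) Read the source version. A missing attribute or a read error
    #     means migration is not possible.
    version=$(cat "$src/$attr" 2>/dev/null) || return 1

    # (3) Write the string to the target unchanged. It is treated as opaque
    #     here; only the vendor driver decides whether it is compatible.
    { printf '%s\n' "$version" > "$dst/$attr"; } 2>/dev/null || return 1
}
```

It could then be called as, e.g.:

  mdev_compatible /sys/bus/mdev/devices/$UUID1/mdev_type \
                  /sys/bus/mdev/devices/$UUID2/mdev_type && echo compatible

Note that user space never parses or compares the string itself, it only looks at whether the read and the write succeed.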