Date: Wed, 15 May 2019 21:00:46 -0400
From: Yan Zhao
To: Alex Williamson
Cc: Erik Skultety, "cjia@nvidia.com", "kvm@vger.kernel.org",
 "aik@ozlabs.ru", "Zhengxiao.zx@alibaba-inc.com",
 "shuangtai.tst@alibaba-inc.com", "qemu-devel@nongnu.org",
 "kwankhede@nvidia.com", "eauger@redhat.com", "Liu, Yi L",
 "Yang, Ziye", "mlevitsk@redhat.com", "pasic@linux.ibm.com",
 "libvir-list@redhat.com", "arei.gonglei@huawei.com",
 "felipe@nutanix.com", "Ken.Xue@amd.com", "Tian, Kevin",
 "Dr. David Alan Gilbert", "zhenyuw@linux.intel.com",
 "jonathan.davies@nutanix.com", "intel-gvt-dev@lists.freedesktop.org",
 "Liu, Changpeng", "berrange@redhat.com", Cornelia Huck,
 "linux-kernel@vger.kernel.org", "Wang, Zhi A", "dinechin@redhat.com",
 "He, Shaopeng"
Subject: Re: [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device
Message-ID: <20190516010046.GA5535@joy-OptiPlex-7040>
In-Reply-To: <20190514090142.441a8a8c@x1.home>
References: <20190509164825.GG2868@work-vm>
 <20190510110838.2df4c4d0.cohuck@redhat.com>
 <20190510093608.GD2854@work-vm>
 <20190510114838.7e16c3d6.cohuck@redhat.com>
 <20190513132804.GD11139@beluga.usersys.redhat.com>
 <20190514061235.GC20407@joy-OptiPlex-7040>
 <20190514072039.GA2089@beluga.usersys.redhat.com>
 <20190514073219.GD20407@joy-OptiPlex-7040>
 <20190514074344.GB2089@beluga.usersys.redhat.com>
 <20190514090142.441a8a8c@x1.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 14, 2019 at 11:01:42PM +0800, Alex Williamson wrote:
> On Tue, 14 May 2019 09:43:44 +0200
> Erik Skultety wrote:
>
> > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
> > > > On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:
> > > > > On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:
> > > > > > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> > > > > > > On Fri, 10 May 2019 10:36:09 +0100
> > > > > > > "Dr. David Alan Gilbert" wrote:
> > > > > > >
> > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > > > > > "Dr. David Alan Gilbert" wrote:
> > > > > > > > >
> > > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > > > > > "Dr. David Alan Gilbert" wrote:
> > > > > > > > > > >
> > > > > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > > > > > Alex Williamson wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sun, 5 May 2019 21:49:04 -0400
> > > > > > > > > > > > > > Yan Zhao wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > + Errno:
> > > > > > > > > > > > > > > + If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > > > > > + devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > > > > > + a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > > > > > + a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > > > > > + -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > > > > > + If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > > > > > + incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > > > > > + return -EINVAL;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > > > > > migration versions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > > > > > things (e.g. trying with different device pairs).
> > > > > > > > > > > > >
> > > > > > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > > > > > get much information that way.
> > > > > > > > > > > >
> > > > > > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > > > > > are compatible?
> > > > > > > > > > >
> > > > > > > > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > > > > > > > > >
> > > > > > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > > > > > the write or what.
> > > > > > > > > >
> > > > > > > > > Hm, so I basically see two ways of doing that:
> > > > > > > > > - standardize on some error codes... problem: error codes can be hard
> > > > > > > > > to fit to reasons
> > > > > > > > > - make the error available in some attribute that can be read
> > > > > > > > >
> > > > > > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > > > > > though (this looks inherently racy).
> > > > > > > > >
> > > > > > > > > How important is detailed error reporting here?
> > > > > > > > >
> > > > > > > > I think we need something, otherwise we're just going to get vague
> > > > > > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > > > > > good enough to point most users to something they can understand
> > > > > > > > (e.g. wrong card family/too old a driver etc).
> > > > > > > >
> > > > > > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > > > > > how to achieve that, though... we could also log a more verbose error
> > > > > > > message to the kernel log, but that's not necessarily where a user will
> > > > > > > look first.
> > > > > > >
> > > > > > In case of libvirt checking the compatibility, it won't matter how good the
> > > > > > error message in the kernel log is and regardless of how many error states you
> > > > > > want to handle, libvirt's only limited to errno here, since we're going to do
> > > > > > plain read/write, so our internal error message returned to the user is only
> > > > > > going to contain what the errno says - okay, of course we can (and we DO)
> > > > > > provide libvirt specific string, further specifying the error but like I
> > > > > > mentioned, depending on how many error cases we want to distinguish this may be
> > > > > > hard for anyone to figure out solely on the error code, as apps will most
> > > > > > probably not parse the logs.
> > > > > >
> > > > > > Regards,
> > > > > > Erik
> > > > > hi Erik
> > > > > do you mean you are agreeing on defining common errors and only returning errno?
> > > > >
> > > > In a sense, yes. While it is highly desirable to have logs with descriptive
> > > > messages which will help in troubleshooting tremendously, I wanted to point out
> > > > that spending time with error logs may not be that worthwhile especially since
> > > > most apps (like libvirt) will solely rely on using read(3)/write(3) to sysfs.
> > > > That means that we're limited by the errnos available, so apart from
> > > > reporting the generic system message we can't any more magic in terms of the
> > > > error messages, so the driver needs to assure that a proper message is
> > > > propagated to the journal and at best libvirt can direct the user (consumer) to
> > > > look through the system logs for more info. I also agree with the point
> > > > mentioned above that defining a specific errno is IMO not the way to go, as
> > > > these would be just too specific for the read(3)/write(3) use case.
> > > >
> > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > in one of the threads):
> > > > a) read error indicating that an mdev type doesn't support migration
> > > > - I assume if one type doesn't support migration, none of the other
> > > > types exposed on the parent device do, is that a fair assumption?
>
> I'd prefer not to make this assumption. Let's leave open the
> possibility that (for whatever reason) a vendor may choose to support
> migration on some types, but not others.
>
> > > > b) write error indicating that the mdev types are incompatible for
> > > > migration
> > > >
> > > > Regards,
> > > > Erik
> > > Thanks for this explanation.
> > > so, can we arrive at below agreements?
> > >
> > > 1. "not to define the specific errno returned for a specific situation,
> > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > read indicates the device does not support migration version comparison and
> > > that an errno on write indicates the devices are incompatible or the target
> > > doesn't support migration versions. "
> > > 2. vendor driver should log detailed error reasons in kernel log.
> >
> > That would be my take on this, yes, but I open to hear any other suggestions and
> > ideas I couldn't think of as well.
>
> Kernel logging tends to be rather ineffective, it's surprisingly
> difficult to get users to look in dmesg and it's not really a good
> choice for scraping diagnostic information either. I'd probably leave
> this to vendor driver's discretion at this point. Thanks,
>
> Alex

Got it. Thank you all! I'll follow it to prepare the next revision.

Thanks
Yan
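
For illustration only, the userspace side of the agreement reached in this thread might look roughly like the sketch below: read the version attribute of the source device, write that string to the version attribute of the target, and treat any errno on read as "no migration version support" and any errno on write as "incompatible, or the target doesn't support migration versions". The function name, the status strings, and the idea of passing the two attribute paths as arguments are assumptions made for this sketch; they come neither from the patch nor from libvirt. The specific errno is only kept as a human-readable hint, since its value is left to the vendor driver.

```python
import os


def check_migration_compat(src_version_path, dst_version_path):
    """Hypothetical helper: return (status, detail) for a pair of mdev devices.

    Both arguments are paths to the devices' version attributes; the exact
    sysfs location is not assumed here and must be supplied by the caller.
    """
    # Step 1: read the source device's version string.
    # Any errno on read is taken to mean "no migration version support".
    try:
        fd = os.open(src_version_path, os.O_RDONLY)
        try:
            version = os.read(fd, 4096)
        finally:
            os.close(fd)
    except OSError as err:
        return ("source-unsupported", os.strerror(err.errno))

    # Step 2: write that string, unchanged, to the target's attribute.
    # Any errno on write is taken to mean "incompatible, or the target
    # does not support migration versions"; the errno value itself is
    # vendor-defined, so it is reported but not interpreted.
    try:
        fd = os.open(dst_version_path, os.O_WRONLY)
        try:
            os.write(fd, version)
        finally:
            os.close(fd)
    except OSError as err:
        return ("incompatible-or-target-unsupported", os.strerror(err.errno))

    return ("compatible", "")
```

A caller would pass the two version attribute paths and branch only on the returned status. The detail string is whatever the errno maps to, which, as discussed above, may not say much about the actual reason.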