Received: by 2002:a05:6a10:f3d0:0:0:0:0 with SMTP id a16csp94831pxv; Wed, 7 Jul 2021 21:21:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyzmRnkU93wzXtzgltiG21uzSMLF0c/A9HpNujoTIGNm6OTwuLLU9owJBEc85eODQcNtZut X-Received: by 2002:a17:906:57d8:: with SMTP id u24mr12950265ejr.252.1625718086335; Wed, 07 Jul 2021 21:21:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1625718086; cv=none; d=google.com; s=arc-20160816; b=GjLUn3oJccd7PLwLHDzH6z7e+/7vOaZZFJh7nx/QwzVzXssUCFXG0dE1YxdpQ38UNd u+AhT6D2Oy4SFy53Ze7xk7tB6dLcE6ML4kRudRcYRE0buqG0WLuE9gfwX1Q2uF09710J jhw3ltHO+9AeBNnFUGdXvrHxprhEKRxfOPe+1JlSjVW5/odbAnpY4rA5divGHuahJF35 ULjkYXq0oWdPhXDEMJDgi8UQMJH01qLVbkDT77iQYg4GGn5lFvJhQ6cJjMG62FS1OoWc skwSPvVdEqu2pnNBqVAYlK6JcGazLp8zVbNlMU3aDZaxIqEYOoVA6/xHNVvfJDGgD8nX llcA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=DNu1zHrcXhgpVSSnQcgsiGwDGO6YEQsjo75QjE9guwc=; b=A7FiSV+Edv4Ql+ahDJX8W1W/9PCahq1Vw6u3MDkW/plyevbZhwSb1FMeEOyu6t8o0o SVUN8QSr7N9K1Baq55cam8rP1nnbooz12wVG23D23SsbmQbT6Gb/JTA/By/qOoLOzi8c EvrW7M4cXTttjRtDNk9tWS3ctzWbsaawkJjaUo4AGGC0zmIt40Vra1kdxzvv42GEJ1zP k2tLX8vZ55CP4F9O7pJWbc8AEKiIBIBNpmizKe6DSpRe1BSaXJbqFtvUdLgrdE7AXPQE pKIGwC6LrK2qC9G7nN9xB3UTTWNoa/9iWF1w/R25KJr1P6w8SqiDzupK/iQzSq9Hc1tT 2FKA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="fYyH6w/e"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id jg29si1562327ejc.508.2021.07.07.21.21.03; Wed, 07 Jul 2021 21:21:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="fYyH6w/e"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229619AbhGHEUy (ORCPT + 99 others); Thu, 8 Jul 2021 00:20:54 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:24137 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229541AbhGHEUw (ORCPT ); Thu, 8 Jul 2021 00:20:52 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625717890; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DNu1zHrcXhgpVSSnQcgsiGwDGO6YEQsjo75QjE9guwc=; b=fYyH6w/eIBwqjX1SKGqb/LccQle0nQ76DZy2PYdekPwT38XXK39moNSFiBrVBOOVbE+J+H K83AH5gqsHE2DqTPPtw/Xv8Qi0PZnp/1dZ5SgzdL4KL/grJPW5z8BaKU77FAKrLPozCPUy hsfK+R5SflQNIJuUo+/QNpozqj9h2sg= Received: from mail-pj1-f71.google.com (mail-pj1-f71.google.com [209.85.216.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-489-8KeP5ZDlN5meU7vAyhNQhw-1; Thu, 08 Jul 2021 00:18:09 -0400 X-MC-Unique: 8KeP5ZDlN5meU7vAyhNQhw-1 Received: by mail-pj1-f71.google.com with SMTP id ls11-20020a17090b350bb0290172c224979cso3869442pjb.0 for ; Wed, 07 Jul 2021 21:18:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=DNu1zHrcXhgpVSSnQcgsiGwDGO6YEQsjo75QjE9guwc=; b=Pg/JEO2VGzdlbwzAMmiWOq8hVgxoHn9MBrDaK5qNUpePDBcVS1GHRMkAa7sSKS2wYs rVi8ovOJWh6clliLP0wkV4dxvjY+vP/tOocD1Jbsni5xCMpZqy387v/QmWOFHKFlaQRC me4PC5NW/NGm7SWc/R1EsnI1Ioy823wSG6yvS4T8cg1jLcUjEXfKJEsZPZwUbVJ9Pfpa 9TsRVbkGf2o4tE42zMc0Poaex8r0BRPTknIqLSoGmkTGSIr8AnBHfRBGEEHC8hDzki8p QIrbK0i4n+LR6t6jp0ks86Nwh00IcO+50E6yjPdivJ0SVIL5v1zBXjUZCDV+znt5SSTq e4pw== X-Gm-Message-State: AOAM532wZswoqxacJIEVUDC4vGxaI9jSmHtY/zLYMVFNFg4H/wcSFq/S 0nHx1fBqMoLRL0GCis1hJ7aYMspeUuY5MfBMJOPmGZWbE7RqTnan5e2D8KqZiJzVNX+bRtrypnQ AFiI8ibVYl/VP7Za6feTbYA/lVRpfBKvqHmKa1mBwUjMG5SvCd9x9Ppgrrz6sivmCb8ORm9+CV6 5H X-Received: by 2002:a17:902:aa86:b029:116:3e3a:2051 with SMTP id d6-20020a170902aa86b02901163e3a2051mr24239015plr.38.1625717888648; Wed, 07 Jul 2021 21:18:08 -0700 (PDT) X-Received: by 2002:a17:902:aa86:b029:116:3e3a:2051 with SMTP id d6-20020a170902aa86b02901163e3a2051mr24238978plr.38.1625717888336; Wed, 07 Jul 2021 21:18:08 -0700 (PDT) Received: from wangxiaodeMacBook-Air.local ([209.132.188.80]) by smtp.gmail.com with ESMTPSA id c64sm792630pfb.166.2021.07.07.21.18.01 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 07 Jul 2021 21:18:07 -0700 (PDT) Subject: Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE To: Stefan Hajnoczi Cc: Xie Yongji , "Michael S. Tsirkin" , Stefano Garzarella , Parav Pandit , Christoph Hellwig , Christian Brauner , Randy Dunlap , Matthew Wilcox , Al Viro , Jens Axboe , "bcrl@kvack.org" , Jonathan Corbet , =?UTF-8?Q?Mika_Penttil=c3=a4?= , Dan Carpenter , joro@8bytes.org, gregkh@linuxfoundation.org, "songmuchun@bytedance.com" , virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, "iommu@lists.linux-foundation.org" , linux-kernel References: <8320d26d-6637-85c6-8773-49553dfa502d@redhat.com> <5b5107fa-3b32-8a3b-720d-eee6b2a84ace@redhat.com> <100e6788-7fdf-1505-d69c-bc28a8bc7a78@redhat.com> From: Jason Wang Message-ID: Date: Thu, 8 Jul 2021 12:17:56 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2021/7/7 下午11:54, Stefan Hajnoczi 写道: > On Wed, Jul 07, 2021 at 05:24:08PM +0800, Jason Wang wrote: >> 在 2021/7/7 下午4:55, Stefan Hajnoczi 写道: >>> On Wed, Jul 07, 2021 at 11:43:28AM +0800, Jason Wang wrote: >>>> 在 2021/7/7 上午1:11, Stefan Hajnoczi 写道: >>>>> On Tue, Jul 06, 2021 at 09:08:26PM +0800, Jason Wang wrote: >>>>>> On Tue, Jul 6, 2021 at 6:15 PM Stefan Hajnoczi wrote: >>>>>>> On Tue, Jul 06, 2021 at 10:34:33AM +0800, Jason Wang wrote: >>>>>>>> 在 2021/7/5 下午8:49, Stefan Hajnoczi 写道: >>>>>>>>> On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote: >>>>>>>>>> 在 2021/7/4 下午5:49, Yongji Xie 写道: >>>>>>>>>>>>> OK, I get you now. Since the VIRTIO specification says "Device >>>>>>>>>>>>> configuration space is generally used for rarely-changing or >>>>>>>>>>>>> initialization-time parameters". I assume the VDUSE_DEV_SET_CONFIG >>>>>>>>>>>>> ioctl should not be called frequently. >>>>>>>>>>>> The spec uses MUST and other terms to define the precise requirements. >>>>>>>>>>>> Here the language (especially the word "generally") is weaker and means >>>>>>>>>>>> there may be exceptions. >>>>>>>>>>>> >>>>>>>>>>>> Another type of access that doesn't work with the VDUSE_DEV_SET_CONFIG >>>>>>>>>>>> approach is reads that have side-effects. For example, imagine a field >>>>>>>>>>>> containing an error code if the device encounters a problem unrelated to >>>>>>>>>>>> a specific virtqueue request. Reading from this field resets the error >>>>>>>>>>>> code to 0, saving the driver an extra configuration space write access >>>>>>>>>>>> and possibly race conditions. It isn't possible to implement those >>>>>>>>>>>> semantics suing VDUSE_DEV_SET_CONFIG. It's another corner case, but it >>>>>>>>>>>> makes me think that the interface does not allow full VIRTIO semantics. >>>>>>>>>> Note that though you're correct, my understanding is that config space is >>>>>>>>>> not suitable for this kind of error propagating. And it would be very hard >>>>>>>>>> to implement such kind of semantic in some transports. Virtqueue should be >>>>>>>>>> much better. As Yong Ji quoted, the config space is used for >>>>>>>>>> "rarely-changing or intialization-time parameters". >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to >>>>>>>>>>> handle the message failure, I'm going to add a return value to >>>>>>>>>>> virtio_config_ops.get() and virtio_cread_* API so that the error can >>>>>>>>>>> be propagated to the virtio device driver. Then the virtio-blk device >>>>>>>>>>> driver can be modified to handle that. >>>>>>>>>>> >>>>>>>>>>> Jason and Stefan, what do you think of this way? >>>>>>>>> Why does VDUSE_DEV_GET_CONFIG need to support an error return value? >>>>>>>>> >>>>>>>>> The VIRTIO spec provides no way for the device to report errors from >>>>>>>>> config space accesses. >>>>>>>>> >>>>>>>>> The QEMU virtio-pci implementation returns -1 from invalid >>>>>>>>> virtio_config_read*() and silently discards virtio_config_write*() >>>>>>>>> accesses. >>>>>>>>> >>>>>>>>> VDUSE can take the same approach with >>>>>>>>> VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG. >>>>>>>>> >>>>>>>>>> I'd like to stick to the current assumption thich get_config won't fail. >>>>>>>>>> That is to say, >>>>>>>>>> >>>>>>>>>> 1) maintain a config in the kernel, make sure the config space read can >>>>>>>>>> always succeed >>>>>>>>>> 2) introduce an ioctl for the vduse usersapce to update the config space. >>>>>>>>>> 3) we can synchronize with the vduse userspace during set_config >>>>>>>>>> >>>>>>>>>> Does this work? >>>>>>>>> I noticed that caching is also allowed by the vhost-user protocol >>>>>>>>> messages (QEMU's docs/interop/vhost-user.rst), but the device doesn't >>>>>>>>> know whether or not caching is in effect. The interface you outlined >>>>>>>>> above requires caching. >>>>>>>>> >>>>>>>>> Is there a reason why the host kernel vDPA code needs to cache the >>>>>>>>> configuration space? >>>>>>>> Because: >>>>>>>> >>>>>>>> 1) Kernel can not wait forever in get_config(), this is the major difference >>>>>>>> with vhost-user. >>>>>>> virtio_cread() can sleep: >>>>>>> >>>>>>> #define virtio_cread(vdev, structname, member, ptr) \ >>>>>>> do { \ >>>>>>> typeof(((structname*)0)->member) virtio_cread_v; \ >>>>>>> \ >>>>>>> might_sleep(); \ >>>>>>> ^^^^^^^^^^^^^^ >>>>>>> >>>>>>> Which code path cannot sleep? >>>>>> Well, it can sleep but it can't sleep forever. For VDUSE, a >>>>>> buggy/malicious userspace may refuse to respond to the get_config. >>>>>> >>>>>> It looks to me the ideal case, with the current virtio spec, for VDUSE is to >>>>>> >>>>>> 1) maintain the device and its state in the kernel, userspace may sync >>>>>> with the kernel device via ioctls >>>>>> 2) offload the datapath (virtqueue) to the userspace >>>>>> >>>>>> This seems more robust and safe than simply relaying everything to >>>>>> userspace and waiting for its response. >>>>>> >>>>>> And we know for sure this model can work, an example is TUN/TAP: >>>>>> netdevice is abstracted in the kernel and datapath is done via >>>>>> sendmsg()/recvmsg(). >>>>>> >>>>>> Maintaining the config in the kernel follows this model and it can >>>>>> simplify the device generation implementation. >>>>>> >>>>>> For config space write, it requires more thought but fortunately it's >>>>>> not commonly used. So VDUSE can choose to filter out the >>>>>> device/features that depends on the config write. >>>>> This is the problem. There are other messages like SET_FEATURES where I >>>>> guess we'll face the same challenge. >>>> Probably not, userspace device can tell the kernel about the device_features >>>> and mandated_features during creation, and the feature negotiation could be >>>> done purely in the kernel without bothering the userspace. >> >> (For some reason I drop the list accidentally, adding them back, sorry) >> >> >>> Sorry, I confused the messages. I meant SET_STATUS. It's a synchronous >>> interface where the driver waits for the device. >> >> It depends on how we define "synchronous" here. If I understand correctly, >> the spec doesn't expect there will be any kind of failure for the operation >> of set_status itself. >> >> Instead, anytime it want any synchronization, it should be done via >> get_status(): >> >> 1) re-read device status to make sure FEATURES_OK is set during feature >> negotiation >> 2) re-read device status to be 0 to make sure the device has finish the >> reset >> >> >>> VDUSE currently doesn't wait for the device emulation process to handle >>> this message (no reply is needed) but I think this is a mistake because >>> VDUSE is not following the VIRTIO device model. >> >> With the trick that is done for FEATURES_OK above, I think we don't need to >> wait for the reply. >> >> If userspace takes too long to respond, it can be detected since >> get_status() doesn't return the expected value for long time. >> >> And for the case that needs a timeout, we probably can use NEEDS_RESET. > I think you're right. get_status is the synchronization point, not > set_status. > > Currently there is no VDUSE GET_STATUS message. The > VDUSE_START/STOP_DATAPLANE messages could be changed to SET_STATUS so > that the device emulation program can participate in emulating the > Device Status field. I'm not sure I get this, but it is what has been done? +static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status) +{ +    struct vduse_dev *dev = vdpa_to_vduse(vdpa); +    bool started = !!(status & VIRTIO_CONFIG_S_DRIVER_OK); + +    dev->status = status; + +    if (dev->started == started) +        return; + +    dev->started = started; +    if (dev->started) { +        vduse_dev_start_dataplane(dev); +    } else { +        vduse_dev_reset(dev); +        vduse_dev_stop_dataplane(dev); +    } +} But the looks not correct: 1) !DRIVER_OK doesn't means a reset? 2) Need to deal with FEATURES_OK Thanks > This change could affect VDUSE's VIRTIO feature > interface since the device emulation program can reject features by not > setting FEATURES_OK. > > Stefan