2021-02-19 11:58:37

by Si-Wei Liu

Subject: [PATCH] vdpa/mlx5: set_features should allow reset to zero

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set. Rejecting reset to 0
prematurely causes correct MTU and link status unable to load
for the very first config space access, rendering issues like
guest showing inaccurate MTU value, or failure to reject
out-of-range MTU.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Si-Wei Liu <[email protected]>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 7c1f789..540dd67 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
 	return mvdev->mlx_features;
 }

-static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
-{
-	if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
-		return -EOPNOTSUPP;
-
-	return 0;
-}
-
 static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
 {
 	int err;
@@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
 {
 	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
 	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
-	int err;

 	print_features(mvdev, features, true);

-	err = verify_min_features(mvdev, features);
-	if (err)
-		return err;
-
 	ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
 	ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
 	ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
-	return err;
+	return 0;
 }

 static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
--
1.8.3.1


2021-02-21 14:47:34

by Eli Cohen

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Fri, Feb 19, 2021 at 06:54:58AM -0500, Si-Wei Liu wrote:
> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> for legacy") made an exception for legacy guests to reset
> features to 0, when config space is accessed before features
> are set. We should relieve the verify_min_features() check
> and allow features reset to 0 for this case.
>
> It's worth noting that not just legacy guests could access
> config space before features are set. For instance, when
> feature VIRTIO_NET_F_MTU is advertised some modern driver
> will try to access and validate the MTU present in the config
> space before virtio features are set. Rejecting reset to 0
> prematurely causes correct MTU and link status unable to load
> for the very first config space access, rendering issues like
> guest showing inaccurate MTU value, or failure to reject
> out-of-range MTU.
>
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Si-Wei Liu <[email protected]>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> 1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 7c1f789..540dd67 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> return mvdev->mlx_features;
> }
>
> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> -{
> - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> - return -EOPNOTSUPP;
> -
> - return 0;
> -}
> -

But what if VIRTIO_F_ACCESS_PLATFORM is not offered? This does not
support such cases.

Maybe we should call verify_min_features() from mlx5_vdpa_set_status()
just before attempting to call setup_driver().
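
A minimal sketch of that idea, written as a standalone model rather than
the actual driver code: set_features accepts any value, including 0, and
the minimum-feature check moves to the point where DRIVER_OK is set. The
names model_dev, model_set_features and model_set_status are hypothetical
stand-ins, not mlx5_vdpa functions; only the bit numbers come from the
virtio spec.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_F_ACCESS_PLATFORM   33    /* feature bit number (virtio spec) */
#define VIRTIO_CONFIG_S_DRIVER_OK  4     /* device status bit */
#define VIRTIO_CONFIG_S_FAILED     0x80  /* device status bit */

/* Hypothetical, simplified device state; not the real struct mlx5_vdpa_dev. */
struct model_dev {
	uint64_t device_features;  /* features the device offers */
	uint64_t actual_features;  /* features the driver negotiated */
	uint8_t status;
};

static int verify_min_features(uint64_t features)
{
	if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
		return -EOPNOTSUPP;
	return 0;
}

/* Always succeeds, so that a reset of the features to 0 is accepted. */
static int model_set_features(struct model_dev *dev, uint64_t features)
{
	assert(dev != NULL);
	dev->actual_features = features & dev->device_features;
	return 0;
}

/* The minimum-feature check runs only when the driver tries to go live. */
static void model_set_status(struct model_dev *dev, uint8_t status)
{
	assert(dev != NULL);
	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
	    !(dev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
	    verify_min_features(dev->actual_features)) {
		/* Assumption: signal the failure through the FAILED status bit. */
		dev->status |= VIRTIO_CONFIG_S_FAILED;
		return;
	}
	dev->status = status;
}
```

With this shape, an early config-space access can reset features to 0
without error, while a driver that never negotiates
VIRTIO_F_ACCESS_PLATFORM is still refused at DRIVER_OK time.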

> static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> {
> int err;
> @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
> {
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> - int err;
>
> print_features(mvdev, features, true);
>
> - err = verify_min_features(mvdev, features);
> - if (err)
> - return err;
> -
> ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
> - return err;
> + return 0;
> }
>
> static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
> --
> 1.8.3.1
>

2021-02-21 21:56:44

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Sun, Feb 21, 2021 at 04:44:37PM +0200, Eli Cohen wrote:
> On Fri, Feb 19, 2021 at 06:54:58AM -0500, Si-Wei Liu wrote:
> > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > for legacy") made an exception for legacy guests to reset
> > features to 0, when config space is accessed before features
> > are set. We should relieve the verify_min_features() check
> > and allow features reset to 0 for this case.
> >
> > It's worth noting that not just legacy guests could access
> > config space before features are set. For instance, when
> > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > will try to access and validate the MTU present in the config
> > space before virtio features are set. Rejecting reset to 0
> > prematurely causes correct MTU and link status unable to load
> > for the very first config space access, rendering issues like
> > guest showing inaccurate MTU value, or failure to reject
> > out-of-range MTU.
> >
> > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > Signed-off-by: Si-Wei Liu <[email protected]>
> > ---
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > 1 file changed, 1 insertion(+), 14 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 7c1f789..540dd67 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > return mvdev->mlx_features;
> > }
> >
> > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> > -{
> > - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > - return -EOPNOTSUPP;
> > -
> > - return 0;
> > -}
> > -
>
> But what if VIRTIO_F_ACCESS_PLATFORM is not offered? This does not
> support such cases.

Did you mean "catch such cases" rather than "support"?


> Maybe we should call verify_min_features() from mlx5_vdpa_set_status()
> just before attempting to call setup_driver().
>
> > static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > {
> > int err;
> > @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
> > {
> > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > - int err;
> >
> > print_features(mvdev, features, true);
> >
> > - err = verify_min_features(mvdev, features);
> > - if (err)
> > - return err;
> > -
> > ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> > ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
> > - return err;
> > + return 0;
> > }
> >
> > static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
> > --
> > 1.8.3.1
> >

2021-02-22 04:18:54

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> for legacy") made an exception for legacy guests to reset
> features to 0, when config space is accessed before features
> are set. We should relieve the verify_min_features() check
> and allow features reset to 0 for this case.
>
> It's worth noting that not just legacy guests could access
> config space before features are set. For instance, when
> feature VIRTIO_NET_F_MTU is advertised some modern driver
> will try to access and validate the MTU present in the config
> space before virtio features are set.


This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
driver to use.
"

Do we really want to workaround this?

Thanks


> Rejecting reset to 0
> prematurely causes correct MTU and link status unable to load
> for the very first config space access, rendering issues like
> guest showing inaccurate MTU value, or failure to reject
> out-of-range MTU.
>
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Si-Wei Liu <[email protected]>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> 1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 7c1f789..540dd67 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> return mvdev->mlx_features;
> }
>
> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> -{
> - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> - return -EOPNOTSUPP;
> -
> - return 0;
> -}
> -
> static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> {
> int err;
> @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
> {
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> - int err;
>
> print_features(mvdev, features, true);
>
> - err = verify_min_features(mvdev, features);
> - if (err)
> - return err;
> -
> ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
> - return err;
> + return 0;
> }
>
> static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)

2021-02-22 06:08:30

by Eli Cohen

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Sun, Feb 21, 2021 at 04:52:05PM -0500, Michael S. Tsirkin wrote:
> On Sun, Feb 21, 2021 at 04:44:37PM +0200, Eli Cohen wrote:
> > On Fri, Feb 19, 2021 at 06:54:58AM -0500, Si-Wei Liu wrote:
> > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > for legacy") made an exception for legacy guests to reset
> > > features to 0, when config space is accessed before features
> > > are set. We should relieve the verify_min_features() check
> > > and allow features reset to 0 for this case.
> > >
> > > It's worth noting that not just legacy guests could access
> > > config space before features are set. For instance, when
> > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > will try to access and validate the MTU present in the config
> > > space before virtio features are set. Rejecting reset to 0
> > > prematurely causes correct MTU and link status unable to load
> > > for the very first config space access, rendering issues like
> > > guest showing inaccurate MTU value, or failure to reject
> > > out-of-range MTU.
> > >
> > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > ---
> > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 7c1f789..540dd67 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > return mvdev->mlx_features;
> > > }
> > >
> > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> > > -{
> > > - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > - return -EOPNOTSUPP;
> > > -
> > > - return 0;
> > > -}
> > > -
> >
> > But what if VIRTIO_F_ACCESS_PLATFORM is not offered? This does not
> > support such cases.
>
> Did you mean "catch such cases" rather than "support"?
>

Actually I meant this driver/device does not support such cases.

>
> > Maybe we should call verify_min_features() from mlx5_vdpa_set_status()
> > just before attempting to call setup_driver().
> >
> > > static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > {
> > > int err;
> > > @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
> > > {
> > > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > - int err;
> > >
> > > print_features(mvdev, features, true);
> > >
> > > - err = verify_min_features(mvdev, features);
> > > - if (err)
> > > - return err;
> > > -
> > > ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> > > ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
> > > - return err;
> > > + return 0;
> > > }
> > >
> > > static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
> > > --
> > > 1.8.3.1
> > >
>

2021-02-22 07:38:15

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>
> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
> > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > for legacy") made an exception for legacy guests to reset
> > features to 0, when config space is accessed before features
> > are set. We should relieve the verify_min_features() check
> > and allow features reset to 0 for this case.
> >
> > It's worth noting that not just legacy guests could access
> > config space before features are set. For instance, when
> > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > will try to access and validate the MTU present in the config
> > space before virtio features are set.
>
>
> This looks like a spec violation:
>
> "
>
> The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is
> set.
> This field specifies the maximum MTU for the driver to use.
> "
>
> Do we really want to workaround this?
>
> Thanks

And also:

The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the device.
4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


so accessing config space before FEATURES_OK is a spec violation, right?
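
Read strictly, the sequence above ties feature-dependent config fields
to negotiation having finished. A minimal sketch of that reading as a
standalone check follows; the helper names are hypothetical, while the
status bit values and the VIRTIO_NET_F_MTU bit number (3) are the ones
from the virtio spec. Step 4 does let the driver read device-specific
config during negotiation, so this models only the stricter
interpretation argued here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE  1
#define VIRTIO_CONFIG_S_DRIVER       2
#define VIRTIO_CONFIG_S_DRIVER_OK    4
#define VIRTIO_CONFIG_S_FEATURES_OK  8

#define VIRTIO_NET_F_MTU             3  /* feature bit number */

/* Hypothetical helper: has feature negotiation completed? */
static bool features_negotiated(uint8_t status)
{
	return (status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;
}

/*
 * Hypothetical helper: a config field gated on a feature bit (such as
 * mtu) is treated as valid only once FEATURES_OK is set and that
 * feature was actually negotiated.
 */
static bool feature_gated_field_valid(uint8_t status, uint64_t negotiated,
				      unsigned int feature_bit)
{
	assert(feature_bit < 64);
	return features_negotiated(status) &&
	       (negotiated & (1ULL << feature_bit)) != 0;
}
```

Under this model, a read of mtu while the status is only
ACKNOWLEDGE | DRIVER is rejected even if VIRTIO_NET_F_MTU is offered.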


>
> > Rejecting reset to 0
> > prematurely causes correct MTU and link status unable to load
> > for the very first config space access, rendering issues like
> > guest showing inaccurate MTU value, or failure to reject
> > out-of-range MTU.
> >
> > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > Signed-off-by: Si-Wei Liu <[email protected]>
> > ---
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > 1 file changed, 1 insertion(+), 14 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 7c1f789..540dd67 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > return mvdev->mlx_features;
> > }
> > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> > -{
> > - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > - return -EOPNOTSUPP;
> > -
> > - return 0;
> > -}
> > -
> > static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > {
> > int err;
> > @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
> > {
> > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > - int err;
> > print_features(mvdev, features, true);
> > - err = verify_min_features(mvdev, features);
> > - if (err)
> > - return err;
> > -
> > ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> > ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
> > - return err;
> > + return 0;
> > }
> > static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)

2021-02-22 17:13:39

by Si-Wei Liu

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero



On 2/21/2021 8:14 PM, Jason Wang wrote:
>
> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>> for legacy") made an exception for legacy guests to reset
>> features to 0, when config space is accessed before features
>> are set. We should relieve the verify_min_features() check
>> and allow features reset to 0 for this case.
>>
>> It's worth noting that not just legacy guests could access
>> config space before features are set. For instance, when
>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>> will try to access and validate the MTU present in the config
>> space before virtio features are set.
>
>
> This looks like a spec violation:
>
> "
>
> The following driver-read-only field, mtu only exists if
> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
> driver to use.
> "
>
> Do we really want to workaround this?

Isn't commit 452639a64ad8 itself a workaround for legacy guests?

I think the point is, since there are legacy guests we have to support,
this host-side workaround is unavoidable. That said, I agree the violating
driver should be fixed (yes, it is in today's upstream kernel and has
been there for a while now).

-Siwei

>
> Thanks
>
>
>> Rejecting reset to 0
>> prematurely causes correct MTU and link status unable to load
>> for the very first config space access, rendering issues like
>> guest showing inaccurate MTU value, or failure to reject
>> out-of-range MTU.
>>
>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5
>> devices")
>> Signed-off-by: Si-Wei Liu <[email protected]>
>> ---
>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>   1 file changed, 1 insertion(+), 14 deletions(-)
>>
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 7c1f789..540dd67 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct
>> vdpa_device *vdev)
>>       return mvdev->mlx_features;
>>   }
>>   -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64
>> features)
>> -{
>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>> -        return -EOPNOTSUPP;
>> -
>> -    return 0;
>> -}
>> -
>>   static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>   {
>>       int err;
>> @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct
>> vdpa_device *vdev, u64 features)
>>   {
>>       struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>       struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>> -    int err;
>>         print_features(mvdev, features, true);
>>   -    err = verify_min_features(mvdev, features);
>> -    if (err)
>> -        return err;
>> -
>>       ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
>>       ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>       ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>> VIRTIO_NET_S_LINK_UP);
>> -    return err;
>> +    return 0;
>>   }
>>     static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev,
>> struct vdpa_callback *cb)
>

2021-02-23 01:17:31

by Si-Wei Liu

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero



On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>> for legacy") made an exception for legacy guests to reset
>>> features to 0, when config space is accessed before features
>>> are set. We should relieve the verify_min_features() check
>>> and allow features reset to 0 for this case.
>>>
>>> It's worth noting that not just legacy guests could access
>>> config space before features are set. For instance, when
>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>> will try to access and validate the MTU present in the config
>>> space before virtio features are set.
>>
>> This looks like a spec violation:
>>
>> "
>>
>> The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is
>> set.
>> This field specifies the maximum MTU for the driver to use.
>> "
>>
>> Do we really want to workaround this?
>>
>> Thanks
> And also:
>
> The driver MUST follow this sequence to initialize a device:
> 1. Reset the device.
> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
> device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
> fields to check that it can support the device before accepting it.
> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
> 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
> support our subset of features and the device is unusable.
> 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
> reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>
>
> so accessing config space before FEATURES_OK is a spec violation, right?
It is, but it's not relevant to what this commit tries to address. I
thought legacy guests still need to be supported.

Having said that, a separate patch has to be posted to fix the guest
driver issue, where this discrepancy was introduced into virtnet_validate()
(since commit fe36cbe067). But it's not technically related to this patch.

-Siwei

>
>
>>> Rejecting reset to 0
>>> prematurely causes correct MTU and link status unable to load
>>> for the very first config space access, rendering issues like
>>> guest showing inaccurate MTU value, or failure to reject
>>> out-of-range MTU.
>>>
>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>> ---
>>> drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>> 1 file changed, 1 insertion(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> index 7c1f789..540dd67 100644
>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>> return mvdev->mlx_features;
>>> }
>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
>>> -{
>>> - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>> - return -EOPNOTSUPP;
>>> -
>>> - return 0;
>>> -}
>>> -
>>> static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>> {
>>> int err;
>>> @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
>>> {
>>> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>> - int err;
>>> print_features(mvdev, features, true);
>>> - err = verify_min_features(mvdev, features);
>>> - if (err)
>>> - return err;
>>> -
>>> ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
>>> ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>> ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
>>> - return err;
>>> + return 0;
>>> }
>>> static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)

2021-02-23 02:37:23

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/23 1:09 上午, Si-Wei Liu wrote:
>
>
> On 2/21/2021 8:14 PM, Jason Wang wrote:
>>
>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>> for legacy") made an exception for legacy guests to reset
>>> features to 0, when config space is accessed before features
>>> are set. We should relieve the verify_min_features() check
>>> and allow features reset to 0 for this case.
>>>
>>> It's worth noting that not just legacy guests could access
>>> config space before features are set. For instance, when
>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>> will try to access and validate the MTU present in the config
>>> space before virtio features are set.
>>
>>
>> This looks like a spec violation:
>>
>> "
>>
>> The following driver-read-only field, mtu only exists if
>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
>> driver to use.
>> "
>>
>> Do we really want to workaround this?
>
> Isn't commit 452639a64ad8 itself a workaround for legacy guests?


Yes, but the problem is we can't detect whether or not it's a legacy
guest (e.g. when features are not set).


>
> I think the point is, since there's legacy guest we'd have to support,
> this host side workaround is unavoidable.


Since from the vhost-vDPA point of view the driver is QEMU, it means we
need to make the QEMU vhost-vDPA driver spec compliant.

E.g. how about:

1) Revert 452639a64ad8 and fix QEMU? In QEMU, during vhost-vDPA
initialization, do a minimal config sync by negotiating minimal features
(e.g. just VIRTIO_F_ACCESS_PLATFORM). When FEATURES_OK has not been set by
the guest, we only allow it to access the config space that is emulated
by QEMU?

Then


> Although I agree the violating driver should be fixed (yes, it's in
> today's upstream kernel which exists for a while now).


2) Fix the virtio driver bug.

Or, as a quick workaround, set VIRTIO_F_ACCESS_PLATFORM (instead of 0)
in this case.
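
A rough sketch of what that quick workaround could look like, as a
standalone helper: when config space is accessed before the driver has
written any features, program the minimum set rather than 0. The helper
name early_config_access_features is hypothetical and is not part of
vhost-vDPA; masking against the offered features is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_F_ACCESS_PLATFORM 33  /* feature bit number (virtio spec) */

/*
 * Hypothetical helper: the feature set to program on an early config
 * space access. Returns VIRTIO_F_ACCESS_PLATFORM if the device offers
 * it, and 0 otherwise.
 */
static uint64_t early_config_access_features(uint64_t device_features)
{
	uint64_t min = 1ULL << VIRTIO_F_ACCESS_PLATFORM;
	uint64_t features = device_features & min;

	/* Never request a bit the device did not offer. */
	assert((features & ~device_features) == 0);
	return features;
}
```

A parent driver that insists on VIRTIO_F_ACCESS_PLATFORM would then see
that bit set in the early set_features call instead of an empty set.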

Thanks


>
> -Siwei
>
>>
>> Thanks
>>
>>
>>> Rejecting reset to 0
>>> prematurely causes correct MTU and link status unable to load
>>> for the very first config space access, rendering issues like
>>> guest showing inaccurate MTU value, or failure to reject
>>> out-of-range MTU.
>>>
>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5
>>> devices")
>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>> ---
>>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>   1 file changed, 1 insertion(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> index 7c1f789..540dd67 100644
>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct
>>> vdpa_device *vdev)
>>>       return mvdev->mlx_features;
>>>   }
>>>   -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64
>>> features)
>>> -{
>>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>> -        return -EOPNOTSUPP;
>>> -
>>> -    return 0;
>>> -}
>>> -
>>>   static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>>   {
>>>       int err;
>>> @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct
>>> vdpa_device *vdev, u64 features)
>>>   {
>>>       struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>       struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>> -    int err;
>>>         print_features(mvdev, features, true);
>>>   -    err = verify_min_features(mvdev, features);
>>> -    if (err)
>>> -        return err;
>>> -
>>>       ndev->mvdev.actual_features = features &
>>> ndev->mvdev.mlx_features;
>>>       ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>       ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>>> VIRTIO_NET_S_LINK_UP);
>>> -    return err;
>>> +    return 0;
>>>   }
>>>     static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev,
>>> struct vdpa_callback *cb)
>>
>

2021-02-23 03:02:50

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/23 9:12 上午, Si-Wei Liu wrote:
>
>
> On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
>> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>> for legacy") made an exception for legacy guests to reset
>>>> features to 0, when config space is accessed before features
>>>> are set. We should relieve the verify_min_features() check
>>>> and allow features reset to 0 for this case.
>>>>
>>>> It's worth noting that not just legacy guests could access
>>>> config space before features are set. For instance, when
>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>> will try to access and validate the MTU present in the config
>>>> space before virtio features are set.
>>>
>>> This looks like a spec violation:
>>>
>>> "
>>>
>>> The following driver-read-only field, mtu only exists if
>>> VIRTIO_NET_F_MTU is
>>> set.
>>> This field specifies the maximum MTU for the driver to use.
>>> "
>>>
>>> Do we really want to workaround this?
>>>
>>> Thanks
>> And also:
>>
>> The driver MUST follow this sequence to initialize a device:
>> 1. Reset the device.
>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
>> 3. Set the DRIVER status bit: the guest OS knows how to drive the
>> device.
>> 4. Read device feature bits, and write the subset of feature bits
>> understood by the OS and driver to the
>> device. During this step the driver MAY read (but MUST NOT write) the
>> device-specific configuration
>> fields to check that it can support the device before accepting it.
>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>> feature bits after this step.
>> 6. Re-read device status to ensure the FEATURES_OK bit is still set:
>> otherwise, the device does not
>> support our subset of features and the device is unusable.
>> 7. Perform device-specific setup, including discovery of virtqueues
>> for the device, optional per-bus setup,
>> reading and possibly writing the device’s virtio configuration space,
>> and population of virtqueues.
>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>
>>
>> so accessing config space before FEATURES_OK is a spec violation, right?
> It is, but it's not relevant to what this commit tries to address. I
> thought the legacy guest still needs to be supported.
>
> Having said, a separate patch has to be posted to fix the guest driver
> issue where this discrepancy is introduced to virtnet_validate()
> (since commit fe36cbe067). But it's not technically related to this
> patch.
>
> -Siwei


I think it's a bug to read config space in validate, we should move it
to virtnet_probe().

Thanks


>
>>
>>
>>>> Rejecting reset to 0
>>>> prematurely causes correct MTU and link status unable to load
>>>> for the very first config space access, rendering issues like
>>>> guest showing inaccurate MTU value, or failure to reject
>>>> out-of-range MTU.
>>>>
>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5
>>>> devices")
>>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>>> ---
>>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>>    1 file changed, 1 insertion(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> index 7c1f789..540dd67 100644
>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct
>>>> vdpa_device *vdev)
>>>>        return mvdev->mlx_features;
>>>>    }
>>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64
>>>> features)
>>>> -{
>>>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>> -        return -EOPNOTSUPP;
>>>> -
>>>> -    return 0;
>>>> -}
>>>> -
>>>>    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>>>    {
>>>>        int err;
>>>> @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct
>>>> vdpa_device *vdev, u64 features)
>>>>    {
>>>>        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>>> -    int err;
>>>>        print_features(mvdev, features, true);
>>>> -    err = verify_min_features(mvdev, features);
>>>> -    if (err)
>>>> -        return err;
>>>> -
>>>>        ndev->mvdev.actual_features = features &
>>>> ndev->mvdev.mlx_features;
>>>>        ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>>        ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>>>> VIRTIO_NET_S_LINK_UP);
>>>> -    return err;
>>>> +    return 0;
>>>>    }
>>>>    static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev,
>>>> struct vdpa_callback *cb)
>

2021-02-23 09:34:56

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:
>
>
> On 2/21/2021 8:14 PM, Jason Wang wrote:
> >
> > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > for legacy") made an exception for legacy guests to reset
> > > features to 0, when config space is accessed before features
> > > are set. We should relieve the verify_min_features() check
> > > and allow features reset to 0 for this case.
> > >
> > > It's worth noting that not just legacy guests could access
> > > config space before features are set. For instance, when
> > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > will try to access and validate the MTU present in the config
> > > space before virtio features are set.
> >
> >
> > This looks like a spec violation:
> >
> > "
> >
> > The following driver-read-only field, mtu only exists if
> > VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
> > driver to use.
> > "
> >
> > Do we really want to workaround this?
>
> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
>
> I think the point is, since there's legacy guest we'd have to support, this
> host side workaround is unavoidable. Although I agree the violating driver
> should be fixed (yes, it's in today's upstream kernel which exists for a
> while now).

Oh you are right:


static int virtnet_validate(struct virtio_device *vdev)
{
	if (!vdev->config->get) {
		dev_err(&vdev->dev, "%s failure: config access disabled\n",
			__func__);
		return -EINVAL;
	}

	if (!virtnet_validate_features(vdev))
		return -EINVAL;

	if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
		int mtu = virtio_cread16(vdev,
					 offsetof(struct virtio_net_config,
						  mtu));
		if (mtu < MIN_MTU)
			__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
	}

	return 0;
}

And the spec says:


The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the device.
4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


Item 4 on the list explicitly allows reading config space before
FEATURES_OK.

I conclude that "VIRTIO_NET_F_MTU is set" means "set in device features".

Generally it is worth going over feature dependent config fields
and checking whether they should be present when device feature is set
or when feature bit has been negotiated, and making this clear.
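
As a rough illustration of steps 1-5 above, here is a minimal standalone
sketch (not kernel code). The toy_* names and struct toy_dev are
hypothetical; only the status bit values and the VIRTIO_NET_F_MTU bit
number are taken from the virtio spec. It shows a driver reading the mtu
config field during step 4, i.e. before FEATURES_OK, and using the value
to decide whether to accept the feature:

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bits and feature bit number as defined by the virtio spec. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1u
#define VIRTIO_CONFIG_S_DRIVER      2u
#define VIRTIO_CONFIG_S_FEATURES_OK 8u
#define VIRTIO_NET_F_MTU            3

/* Hypothetical stand-in for a device; not a kernel structure. */
struct toy_dev {
	uint8_t  status;
	uint64_t device_features;  /* what the device offers */
	uint64_t driver_features;  /* what the driver wrote back */
	uint16_t mtu;              /* device-specific config field */
	bool     cfg_read_before_features_ok;
};

/* Config read that records whether it happened before FEATURES_OK. */
static uint16_t toy_read_mtu(struct toy_dev *d)
{
	if (!(d->status & VIRTIO_CONFIG_S_FEATURES_OK))
		d->cfg_read_before_features_ok = true;
	return d->mtu;
}

/* Steps 1-5 of the sequence; min_mtu is the driver's own lower bound. */
static void toy_driver_init(struct toy_dev *d, uint16_t min_mtu)
{
	uint64_t want;

	d->status = 0;                             /* 1. reset */
	d->driver_features = 0;
	d->cfg_read_before_features_ok = false;
	d->status |= VIRTIO_CONFIG_S_ACKNOWLEDGE;  /* 2. */
	d->status |= VIRTIO_CONFIG_S_DRIVER;       /* 3. */

	want = d->device_features;                 /* 4. read device features */
	if ((want & (1ULL << VIRTIO_NET_F_MTU)) && toy_read_mtu(d) < min_mtu)
		want &= ~(1ULL << VIRTIO_NET_F_MTU);   /* decline the feature */
	d->driver_features = want;                 /* 4. write the subset */

	d->status |= VIRTIO_CONFIG_S_FEATURES_OK;  /* 5. */
}
```

In this model the config read always lands between DRIVER and
FEATURES_OK, which is the window item 4 describes.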

--
MST

2021-02-23 09:38:28

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Mon, Feb 22, 2021 at 08:05:26AM +0200, Eli Cohen wrote:
> On Sun, Feb 21, 2021 at 04:52:05PM -0500, Michael S. Tsirkin wrote:
> > On Sun, Feb 21, 2021 at 04:44:37PM +0200, Eli Cohen wrote:
> > > On Fri, Feb 19, 2021 at 06:54:58AM -0500, Si-Wei Liu wrote:
> > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > for legacy") made an exception for legacy guests to reset
> > > > features to 0, when config space is accessed before features
> > > > are set. We should relieve the verify_min_features() check
> > > > and allow features reset to 0 for this case.
> > > >
> > > > It's worth noting that not just legacy guests could access
> > > > config space before features are set. For instance, when
> > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > will try to access and validate the MTU present in the config
> > > > space before virtio features are set. Rejecting reset to 0
> > > > prematurely causes correct MTU and link status unable to load
> > > > for the very first config space access, rendering issues like
> > > > guest showing inaccurate MTU value, or failure to reject
> > > > out-of-range MTU.
> > > >
> > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > > ---
> > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > > >
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 7c1f789..540dd67 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > return mvdev->mlx_features;
> > > > }
> > > >
> > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> > > > -{
> > > > - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > - return -EOPNOTSUPP;
> > > > -
> > > > - return 0;
> > > > -}
> > > > -
> > >
> > > But what if VIRTIO_F_ACCESS_PLATFORM is not offerred? This does not
> > > support such cases.
> >
> > Did you mean "catch such cases" rather than "support"?
> >
>
> Actually I meant this driver/device does not support such cases.

Well, the removed code merely failed without VIRTIO_F_ACCESS_PLATFORM;
it didn't actually try to support anything ...

> >
> > > Maybe we should call verify_min_features() from mlx5_vdpa_set_status()
> > > just before attempting to call setup_driver().
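
A minimal standalone sketch of what deferring the check could look like,
assuming the check moves from set_features to the point where the driver
sets DRIVER_OK. This is not the mlx5 code: struct toy_ndev and the toy_*
functions are hypothetical, and marking the device FAILED on error is an
assumption of this sketch. Only the status bit values and the
VIRTIO_F_ACCESS_PLATFORM bit number come from the virtio spec.

```c
#include <errno.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4u
#define VIRTIO_CONFIG_S_FAILED    0x80u
#define VIRTIO_F_ACCESS_PLATFORM  33

/* Hypothetical device state; not the mlx5_vdpa_net structure. */
struct toy_ndev {
	uint64_t actual_features;
	uint8_t  status;
};

/* Same test as the removed verify_min_features(). */
static int toy_verify_min_features(uint64_t features)
{
	if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
		return -EOPNOTSUPP;
	return 0;
}

/* set_features no longer rejects anything, so a reset to 0 goes through. */
static int toy_set_features(struct toy_ndev *n, uint64_t features)
{
	n->actual_features = features;
	return 0;
}

/* The minimum-feature check runs only when the driver goes live. */
static void toy_set_status(struct toy_ndev *n, uint8_t status)
{
	int going_live = (status & VIRTIO_CONFIG_S_DRIVER_OK) &&
			 !(n->status & VIRTIO_CONFIG_S_DRIVER_OK);

	if (going_live && toy_verify_min_features(n->actual_features)) {
		n->status |= VIRTIO_CONFIG_S_FAILED;
		return;
	}
	n->status = status;
}
```

With this ordering an early set_features(0) succeeds, while a device that
never negotiated VIRTIO_F_ACCESS_PLATFORM is still refused before it
becomes live.
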
> > >
> > > > static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > {
> > > > int err;
> > > > @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
> > > > {
> > > > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > - int err;
> > > >
> > > > print_features(mvdev, features, true);
> > > >
> > > > - err = verify_min_features(mvdev, features);
> > > > - if (err)
> > > > - return err;
> > > > -
> > > > ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> > > > ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > > ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
> > > > - return err;
> > > > + return 0;
> > > > }
> > > >
> > > > static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
> > > > --
> > > > 1.8.3.1
> > > >
> >

2021-02-23 10:07:56

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Tue, Feb 23, 2021 at 05:46:20PM +0800, Jason Wang wrote:
>
> On 2021/2/23 5:25 PM, Michael S. Tsirkin wrote:
> > On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:
> > >
> > > On 2/21/2021 8:14 PM, Jason Wang wrote:
> > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > for legacy") made an exception for legacy guests to reset
> > > > > features to 0, when config space is accessed before features
> > > > > are set. We should relieve the verify_min_features() check
> > > > > and allow features reset to 0 for this case.
> > > > >
> > > > > It's worth noting that not just legacy guests could access
> > > > > config space before features are set. For instance, when
> > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > will try to access and validate the MTU present in the config
> > > > > space before virtio features are set.
> > > >
> > > > This looks like a spec violation:
> > > >
> > > > "
> > > >
> > > > The following driver-read-only field, mtu only exists if
> > > > VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
> > > > driver to use.
> > > > "
> > > >
> > > > Do we really want to workaround this?
> > > Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
> > >
> > > I think the point is, since there's legacy guest we'd have to support, this
> > > host side workaround is unavoidable. Although I agree the violating driver
> > > should be fixed (yes, it's in today's upstream kernel which exists for a
> > > while now).
> > Oh you are right:
> >
> >
> > static int virtnet_validate(struct virtio_device *vdev)
> > {
> > if (!vdev->config->get) {
> > dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > __func__);
> > return -EINVAL;
> > }
> >
> > if (!virtnet_validate_features(vdev))
> > return -EINVAL;
> >
> > if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> > int mtu = virtio_cread16(vdev,
> > offsetof(struct virtio_net_config,
> > mtu));
> > if (mtu < MIN_MTU)
> > __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
>
>
> I wonder why not simply fail here?

Back in 2016 it went like this:

On Thu, Jun 02, 2016 at 05:10:59PM -0400, Aaron Conole wrote:
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> + dev->mtu = virtio_cread16(vdev,
> + offsetof(struct virtio_net_config,
> + mtu));
> + }
> +
> if (vi->any_header_sg)
> dev->needed_headroom = vi->hdr_len;
>

One comment though: I think we should validate the mtu.
If it's invalid, clear VIRTIO_NET_F_MTU and ignore.


Too late at this point :)

I guess it's a way to tell the device "I cannot live with this MTU";
the device can fail FEATURES_OK if it wants to. MIN_MTU
is an internal Linux thing, and at the time I felt it was better to
try to make progress.


>
> > }
> >
> > return 0;
> > }
> >
> > And the spec says:
> >
> >
> > The driver MUST follow this sequence to initialize a device:
> > 1. Reset the device.
> > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> > 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> > 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
> > device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
> > fields to check that it can support the device before accepting it.
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
> > 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
> > support our subset of features and the device is unusable.
> > 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
> > reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
> > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> >
> >
> > Item 4 on the list explicitly allows reading config space before
> > FEATURES_OK.
> >
> > I conclude that VIRTIO_NET_F_MTU is set means "set in device features".
>
>
> So this probably need some clarification. "is set" is used many times in the
> spec that has different implications.
>
> Thanks
>
>
> >
> > Generally it is worth going over feature dependent config fields
> > and checking whether they should be present when device feature is set
> > or when feature bit has been negotiated, and making this clear.
> >

2021-02-23 10:21:22

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/23 6:01 PM, Michael S. Tsirkin wrote:
> On Tue, Feb 23, 2021 at 05:46:20PM +0800, Jason Wang wrote:
>> On 2021/2/23 5:25 PM, Michael S. Tsirkin wrote:
>>> On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:
>>>> On 2/21/2021 8:14 PM, Jason Wang wrote:
>>>>> On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>> features to 0, when config space is accessed before features
>>>>>> are set. We should relieve the verify_min_features() check
>>>>>> and allow features reset to 0 for this case.
>>>>>>
>>>>>> It's worth noting that not just legacy guests could access
>>>>>> config space before features are set. For instance, when
>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>> will try to access and validate the MTU present in the config
>>>>>> space before virtio features are set.
>>>>> This looks like a spec violation:
>>>>>
>>>>> "
>>>>>
>>>>> The following driver-read-only field, mtu only exists if
>>>>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
>>>>> driver to use.
>>>>> "
>>>>>
>>>>> Do we really want to workaround this?
>>>> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
>>>>
>>>> I think the point is, since there's legacy guest we'd have to support, this
>>>> host side workaround is unavoidable. Although I agree the violating driver
>>>> should be fixed (yes, it's in today's upstream kernel which exists for a
>>>> while now).
>>> Oh you are right:
>>>
>>>
>>> static int virtnet_validate(struct virtio_device *vdev)
>>> {
>>> if (!vdev->config->get) {
>>> dev_err(&vdev->dev, "%s failure: config access disabled\n",
>>> __func__);
>>> return -EINVAL;
>>> }
>>>
>>> if (!virtnet_validate_features(vdev))
>>> return -EINVAL;
>>>
>>> if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>>> int mtu = virtio_cread16(vdev,
>>> offsetof(struct virtio_net_config,
>>> mtu));
>>> if (mtu < MIN_MTU)
>>> __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
>>
>> I wonder why not simply fail here?
> Back in 2016 it went like this:
>
> On Thu, Jun 02, 2016 at 05:10:59PM -0400, Aaron Conole wrote:
> > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> > + dev->mtu = virtio_cread16(vdev,
> > + offsetof(struct virtio_net_config,
> > + mtu));
> > + }
> > +
> > if (vi->any_header_sg)
> > dev->needed_headroom = vi->hdr_len;
> >
>
> One comment though: I think we should validate the mtu.
> If it's invalid, clear VIRTIO_NET_F_MTU and ignore.
>
>
> Too late at this point :)
>
> I guess it's a way to tell device "I can not live with this MTU",
> device can fail FEATURES_OK if it wants to. MIN_MTU
> is an internal linux thing and at the time I felt it's better to
> try to make progress.


What if the device advertises a large MTU, e.g. 64K here? In that
case, the driver cannot live with it either. Clearing the MTU feature
won't help here.
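
To make the two cases concrete, a small standalone sketch of a check with
both a lower and an upper bound. The enum, the function name and the
limits passed in are all hypothetical, and returning a hard failure for a
too-large MTU is only one possible policy, not something settled in this
discussion:

```c
#include <stdint.h>

enum mtu_verdict {
	MTU_ACCEPT,         /* keep VIRTIO_NET_F_MTU and use the value */
	MTU_CLEAR_FEATURE,  /* do not accept VIRTIO_NET_F_MTU, carry on */
	MTU_FAIL            /* refuse to drive the device */
};

/* drv_min and drv_max are the driver's own limits (hypothetical parameters). */
static enum mtu_verdict toy_check_mtu(uint32_t dev_mtu, uint32_t drv_min,
				      uint32_t drv_max)
{
	if (dev_mtu < drv_min)
		return MTU_CLEAR_FEATURE;  /* mirrors the current clear-the-bit path */
	if (dev_mtu > drv_max)
		return MTU_FAIL;           /* assumed policy for the large-MTU case */
	return MTU_ACCEPT;
}
```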

Thanks


>
>
>>> }
>>>
>>> return 0;
>>> }
>>>
>>> And the spec says:
>>>
>>>
>>> The driver MUST follow this sequence to initialize a device:
>>> 1. Reset the device.
>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
>>> 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
>>> device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
>>> fields to check that it can support the device before accepting it.
>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
>>> 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
>>> support our subset of features and the device is unusable.
>>> 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
>>> reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>
>>>
>>> Item 4 on the list explicitly allows reading config space before
>>> FEATURES_OK.
>>>
>>> I conclude that VIRTIO_NET_F_MTU is set means "set in device features".
>>
>> So this probably need some clarification. "is set" is used many times in the
>> spec that has different implications.
>>
>> Thanks
>>
>>
>>> Generally it is worth going over feature dependent config fields
>>> and checking whether they should be present when device feature is set
>>> or when feature bit has been negotiated, and making this clear.
>>>

2021-02-23 12:30:27

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Fri, Feb 19, 2021 at 06:54:58AM -0500, Si-Wei Liu wrote:
> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> for legacy") made an exception for legacy guests to reset
> features to 0, when config space is accessed before features
> are set. We should relieve the verify_min_features() check
> and allow features reset to 0 for this case.
>
> It's worth noting that not just legacy guests could access
> config space before features are set. For instance, when
> feature VIRTIO_NET_F_MTU is advertised some modern driver
> will try to access and validate the MTU present in the config
> space before virtio features are set. Rejecting reset to 0
> prematurely causes correct MTU and link status unable to load
> for the very first config space access, rendering issues like
> guest showing inaccurate MTU value, or failure to reject
> out-of-range MTU.
>
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")

isn't this more

vdpa: make sure set_features is invoked for legacy


> Signed-off-by: Si-Wei Liu <[email protected]>

I think we at least need to correct the comment in
include/linux/vdpa.h then

Instead of "we assume a legacy guest" we'd say something like
"call set features in case it's a legacy guest".

Generally it's unfortunate. Need to think about what to do here.
Any idea how else we can cleanly detect a legacy guest?
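
For reference, a minimal standalone model of the heuristic in question
and of how it interacts with a parent driver that rejects features == 0.
This is a sketch under assumptions, not the code in include/linux/vdpa.h
or mlx5_vnet.c: struct toy_vdpa, its fields and the toy_* functions are
hypothetical, and the core is assumed to ignore the set_features return
value on this path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_ACCESS_PLATFORM 33  /* feature bit number from the virtio spec */

/* Hypothetical model of a vdpa device; field names are illustrative. */
struct toy_vdpa {
	bool     features_valid;    /* has set_features been called yet? */
	bool     require_platform;  /* models the verify_min_features() check */
	uint64_t actual_features;
	uint16_t mtu;               /* value the parent driver knows */
	uint16_t config_mtu;        /* value exposed in config space */
};

/* Parent-driver side: config space is filled only if features are accepted. */
static int toy_set_features(struct toy_vdpa *v, uint64_t features)
{
	if (v->require_platform &&
	    !(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
		return -EOPNOTSUPP;
	v->actual_features = features;
	v->config_mtu = v->mtu;
	return 0;
}

/* Core side: a config read before set_features is treated as "legacy guest". */
static uint16_t toy_get_config_mtu(struct toy_vdpa *v)
{
	if (!v->features_valid) {
		v->features_valid = true;
		(void)toy_set_features(v, 0);  /* result ignored in this model */
	}
	return v->config_mtu;
}
```

With require_platform set, the first config read returns a stale mtu of
0; with the check dropped it returns the real value. A modern driver that
reads mtu before writing its features takes the same path as a legacy
one, which is why the heuristic cannot tell them apart.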

> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> 1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 7c1f789..540dd67 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> return mvdev->mlx_features;
> }
>
> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> -{
> - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> - return -EOPNOTSUPP;
> -
> - return 0;
> -}
> -
> static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> {
> int err;

Let's just set VIRTIO_F_ACCESS_PLATFORM in core?
Then we don't need to hack mlx5 ...


> @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
> {
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> - int err;
>
> print_features(mvdev, features, true);
>
> - err = verify_min_features(mvdev, features);
> - if (err)
> - return err;
> -
> ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
> - return err;
> + return 0;
> }
>
> static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
> --
> 1.8.3.1

2021-02-23 13:30:32

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
>
> On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> >
> >
> > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > for legacy") made an exception for legacy guests to reset
> > > > > features to 0, when config space is accessed before features
> > > > > are set. We should relieve the verify_min_features() check
> > > > > and allow features reset to 0 for this case.
> > > > >
> > > > > It's worth noting that not just legacy guests could access
> > > > > config space before features are set. For instance, when
> > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > will try to access and validate the MTU present in the config
> > > > > space before virtio features are set.
> > > >
> > > > This looks like a spec violation:
> > > >
> > > > "
> > > >
> > > > The following driver-read-only field, mtu only exists if
> > > > VIRTIO_NET_F_MTU is
> > > > set.
> > > > This field specifies the maximum MTU for the driver to use.
> > > > "
> > > >
> > > > Do we really want to workaround this?
> > > >
> > > > Thanks
> > > And also:
> > >
> > > The driver MUST follow this sequence to initialize a device:
> > > 1. Reset the device.
> > > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > device.
> > > 4. Read device feature bits, and write the subset of feature bits
> > > understood by the OS and driver to the
> > > device. During this step the driver MAY read (but MUST NOT write)
> > > the device-specific configuration
> > > fields to check that it can support the device before accepting it.
> > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > feature bits after this step.
> > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > otherwise, the device does not
> > > support our subset of features and the device is unusable.
> > > 7. Perform device-specific setup, including discovery of virtqueues
> > > for the device, optional per-bus setup,
> > > reading and possibly writing the device’s virtio configuration
> > > space, and population of virtqueues.
> > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > >
> > >
> > > so accessing config space before FEATURES_OK is a spec violation, right?
> > It is, but it's not relevant to what this commit tries to address. I
> > thought the legacy guest still needs to be supported.
> >
> > Having said, a separate patch has to be posted to fix the guest driver
> > issue where this discrepancy is introduced to virtnet_validate() (since
> > commit fe36cbe067). But it's not technically related to this patch.
> >
> > -Siwei
>
>
> I think it's a bug to read config space in validate, we should move it to
> virtnet_probe().
>
> Thanks

I take it back: reading (but not writing) seems to be explicitly allowed by the spec.
So our way of detecting a legacy guest is bogus; we need to think about what
the best way to handle this is.

>
> >
> > >
> > >
> > > > > Rejecting reset to 0
> > > > > prematurely causes correct MTU and link status unable to load
> > > > > for the very first config space access, rendering issues like
> > > > > guest showing inaccurate MTU value, or failure to reject
> > > > > out-of-range MTU.
> > > > >
> > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
> > > > > supported mlx5 devices")
> > > > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > > > ---
> > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > >    1 file changed, 1 insertion(+), 14 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 7c1f789..540dd67 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -1490,14 +1490,6 @@ static u64
> > > > > mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > >        return mvdev->mlx_features;
> > > > >    }
> > > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
> > > > > u64 features)
> > > > > -{
> > > > > -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > > -        return -EOPNOTSUPP;
> > > > > -
> > > > > -    return 0;
> > > > > -}
> > > > > -
> > > > >    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > >    {
> > > > >        int err;
> > > > > @@ -1558,18 +1550,13 @@ static int
> > > > > mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
> > > > > features)
> > > > >    {
> > > > >        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > >        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > -    int err;
> > > > >        print_features(mvdev, features, true);
> > > > > -    err = verify_min_features(mvdev, features);
> > > > > -    if (err)
> > > > > -        return err;
> > > > > -
> > > > >        ndev->mvdev.actual_features = features &
> > > > > ndev->mvdev.mlx_features;
> > > > >        ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > > >        ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
> > > > > VIRTIO_NET_S_LINK_UP);
> > > > > -    return err;
> > > > > +    return 0;
> > > > >    }
> > > > >    static void mlx5_vdpa_set_config_cb(struct vdpa_device
> > > > > *vdev, struct vdpa_callback *cb)
> >

2021-02-23 13:56:19

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/23 5:25 PM, Michael S. Tsirkin wrote:
> On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:
>>
>> On 2/21/2021 8:14 PM, Jason Wang wrote:
>>> On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>> for legacy") made an exception for legacy guests to reset
>>>> features to 0, when config space is accessed before features
>>>> are set. We should relieve the verify_min_features() check
>>>> and allow features reset to 0 for this case.
>>>>
>>>> It's worth noting that not just legacy guests could access
>>>> config space before features are set. For instance, when
>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>> will try to access and validate the MTU present in the config
>>>> space before virtio features are set.
>>>
>>> This looks like a spec violation:
>>>
>>> "
>>>
>>> The following driver-read-only field, mtu only exists if
>>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
>>> driver to use.
>>> "
>>>
>>> Do we really want to workaround this?
>> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
>>
>> I think the point is, since there's legacy guest we'd have to support, this
>> host side workaround is unavoidable. Although I agree the violating driver
>> should be fixed (yes, it's in today's upstream kernel which exists for a
>> while now).
> Oh you are right:
>
>
> static int virtnet_validate(struct virtio_device *vdev)
> {
> if (!vdev->config->get) {
> dev_err(&vdev->dev, "%s failure: config access disabled\n",
> __func__);
> return -EINVAL;
> }
>
> if (!virtnet_validate_features(vdev))
> return -EINVAL;
>
> if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> int mtu = virtio_cread16(vdev,
> offsetof(struct virtio_net_config,
> mtu));
> if (mtu < MIN_MTU)
> __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);


I wonder why not simply fail here?


> }
>
> return 0;
> }
>
> And the spec says:
>
>
> The driver MUST follow this sequence to initialize a device:
> 1. Reset the device.
> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
> device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
> fields to check that it can support the device before accepting it.
> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
> 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
> support our subset of features and the device is unusable.
> 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
> reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>
>
> Item 4 on the list explicitly allows reading config space before
> FEATURES_OK.
>
> I conclude that VIRTIO_NET_F_MTU is set means "set in device features".


So this probably needs some clarification. "is set" is used many times in
the spec, with different implications.

Thanks


>
> Generally it is worth going over feature dependent config fields
> and checking whether they should be present when device feature is set
> or when feature bit has been negotiated, and making this clear.
>

2021-02-23 13:56:55

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/23 5:26 PM, Michael S. Tsirkin wrote:
> On Mon, Feb 22, 2021 at 08:05:26AM +0200, Eli Cohen wrote:
>> On Sun, Feb 21, 2021 at 04:52:05PM -0500, Michael S. Tsirkin wrote:
>>> On Sun, Feb 21, 2021 at 04:44:37PM +0200, Eli Cohen wrote:
>>>> On Fri, Feb 19, 2021 at 06:54:58AM -0500, Si-Wei Liu wrote:
>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>> for legacy") made an exception for legacy guests to reset
>>>>> features to 0, when config space is accessed before features
>>>>> are set. We should relieve the verify_min_features() check
>>>>> and allow features reset to 0 for this case.
>>>>>
>>>>> It's worth noting that not just legacy guests could access
>>>>> config space before features are set. For instance, when
>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>> will try to access and validate the MTU present in the config
>>>>> space before virtio features are set. Rejecting reset to 0
>>>>> prematurely causes correct MTU and link status unable to load
>>>>> for the very first config space access, rendering issues like
>>>>> guest showing inaccurate MTU value, or failure to reject
>>>>> out-of-range MTU.
>>>>>
>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
>>>>> Signed-off-by: Si-Wei Liu<[email protected]>
>>>>> ---
>>>>> drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>>> 1 file changed, 1 insertion(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>> index 7c1f789..540dd67 100644
>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>> @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>>>> return mvdev->mlx_features;
>>>>> }
>>>>>
>>>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
>>>>> -{
>>>>> - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>>> - return -EOPNOTSUPP;
>>>>> -
>>>>> - return 0;
>>>>> -}
>>>>> -
>>>> But what if VIRTIO_F_ACCESS_PLATFORM is not offered? This does not
>>>> support such cases.
>>> Did you mean "catch such cases" rather than "support"?
>>>
>> Actually I meant this driver/device does not support such cases.
> Well the removed code merely failed without VIRTIO_F_ACCESS_PLATFORM
> it didn't actually try to support anything ...


I think it's used to catch the driver that doesn't support ACCESS_PLATFORM?

Thanks


>

2021-02-23 14:01:12

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Tue, Feb 23, 2021 at 05:48:10PM +0800, Jason Wang wrote:
>
> On 2021/2/23 5:26 PM, Michael S. Tsirkin wrote:
> > On Mon, Feb 22, 2021 at 08:05:26AM +0200, Eli Cohen wrote:
> > > On Sun, Feb 21, 2021 at 04:52:05PM -0500, Michael S. Tsirkin wrote:
> > > > On Sun, Feb 21, 2021 at 04:44:37PM +0200, Eli Cohen wrote:
> > > > > On Fri, Feb 19, 2021 at 06:54:58AM -0500, Si-Wei Liu wrote:
> > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > features to 0, when config space is accessed before features
> > > > > > are set. We should relieve the verify_min_features() check
> > > > > > and allow features reset to 0 for this case.
> > > > > >
> > > > > > It's worth noting that not just legacy guests could access
> > > > > > config space before features are set. For instance, when
> > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > will try to access and validate the MTU present in the config
> > > > > > space before virtio features are set. Rejecting reset to 0
> > > > > > prematurely causes correct MTU and link status unable to load
> > > > > > for the very first config space access, rendering issues like
> > > > > > guest showing inaccurate MTU value, or failure to reject
> > > > > > out-of-range MTU.
> > > > > >
> > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > > > > > Signed-off-by: Si-Wei Liu<[email protected]>
> > > > > > ---
> > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > > > 1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 7c1f789..540dd67 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > > > return mvdev->mlx_features;
> > > > > > }
> > > > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> > > > > > -{
> > > > > > - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > > > - return -EOPNOTSUPP;
> > > > > > -
> > > > > > - return 0;
> > > > > > -}
> > > > > > -
> > > > > > But what if VIRTIO_F_ACCESS_PLATFORM is not offered? This does not
> > > > > support such cases.
> > > > Did you mean "catch such cases" rather than "support"?
> > > >
> > > Actually I meant this driver/device does not support such cases.
> > Well the removed code merely failed without VIRTIO_F_ACCESS_PLATFORM
> > it didn't actually try to support anything ...
>
>
> I think it's used to catch the driver that doesn't support ACCESS_PLATFORM?
>
> Thanks
>

That is why I asked whether Eli meant catch.

--
MST
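
To make the two readings concrete ("catch" a driver that never acks ACCESS_PLATFORM versus allowing a reset to zero), here is a minimal sketch of a check that does both. It is a hypothetical middle ground and not what the posted patch does: the patch removes verify_min_features() entirely. The name verify_min_features_relaxed() is invented for illustration; VIRTIO_F_ACCESS_PLATFORM is bit 33 per the virtio spec.

```c
#include <errno.h>
#include <stdint.h>

#define VIRTIO_F_ACCESS_PLATFORM 33

/*
 * Hypothetical variant: accept features == 0 (the reset case from the
 * commit message), but still reject a non-zero feature set that lacks
 * VIRTIO_F_ACCESS_PLATFORM.
 */
static int verify_min_features_relaxed(uint64_t features)
{
	if (features == 0)
		return 0;
	if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
		return -EOPNOTSUPP;
	return 0;
}
```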

2021-02-23 14:03:03

by Cornelia Huck

Subject: Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Tue, 23 Feb 2021 17:46:20 +0800
Jason Wang <[email protected]> wrote:

> On 2021/2/23 5:25 PM, Michael S. Tsirkin wrote:
> > On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:
> >>
> >> On 2/21/2021 8:14 PM, Jason Wang wrote:
> >>> On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> >>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> >>>> for legacy") made an exception for legacy guests to reset
> >>>> features to 0, when config space is accessed before features
> >>>> are set. We should relieve the verify_min_features() check
> >>>> and allow features reset to 0 for this case.
> >>>>
> >>>> It's worth noting that not just legacy guests could access
> >>>> config space before features are set. For instance, when
> >>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
> >>>> will try to access and validate the MTU present in the config
> >>>> space before virtio features are set.
> >>>
> >>> This looks like a spec violation:
> >>>
> >>> "
> >>>
> >>> The following driver-read-only field, mtu only exists if
> >>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
> >>> driver to use.
> >>> "
> >>>
> >>> Do we really want to workaround this?
> >> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
> >>
> >> I think the point is, since there's legacy guest we'd have to support, this
> >> host side workaround is unavoidable. Although I agree the violating driver
> >> should be fixed (yes, it's in today's upstream kernel which exists for a
> >> while now).
> > Oh you are right:
> >
> >
> > static int virtnet_validate(struct virtio_device *vdev)
> > {
> > if (!vdev->config->get) {
> > dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > __func__);
> > return -EINVAL;
> > }
> >
> > if (!virtnet_validate_features(vdev))
> > return -EINVAL;
> >
> > if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> > int mtu = virtio_cread16(vdev,
> > offsetof(struct virtio_net_config,
> > mtu));
> > if (mtu < MIN_MTU)
> > __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
>
>
> I wonder why not simply fail here?

I think both failing and not accepting the feature can be argued to make
sense: "the device presented us with an MTU size that does not make
sense" would point to failing, "we cannot work with the MTU size that
the device presented us" would point to not negotiating the feature.

>
>
> > }
> >
> > return 0;
> > }
> >
> > And the spec says:
> >
> >
> > The driver MUST follow this sequence to initialize a device:
> > 1. Reset the device.
> > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> > 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> > 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
> > device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
> > fields to check that it can support the device before accepting it.
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
> > 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
> > support our subset of features and the device is unusable.
> > 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
> > reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
> > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> >
> >
> > Item 4 on the list explicitly allows reading config space before
> > FEATURES_OK.
> >
> > I conclude that VIRTIO_NET_F_MTU is set means "set in device features".
>
>
> So this probably needs some clarification. "is set" is used many times in
> the spec with different implications.

Before FEATURES_OK is set by the driver, I guess it means "the device
has offered the feature"; during normal usage, it means "the feature
has been negotiated". (This is a bit fuzzy for legacy mode.)

Should we add a wording clarification to the spec?
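
A minimal sketch of that reading, assuming the "offered before FEATURES_OK, negotiated afterwards" interpretation is the intended one (the spec wording does not say so explicitly, and legacy mode is not covered). The helper name mtu_feature_is_set() is hypothetical; the bit values are the ones from the virtio spec.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_FEATURES_OK 8
#define VIRTIO_NET_F_MTU            3

/* Hypothetical helper: does "VIRTIO_NET_F_MTU is set" hold right now? */
static bool mtu_feature_is_set(uint8_t status, uint64_t device_features,
			       uint64_t driver_features)
{
	uint64_t bit = 1ULL << VIRTIO_NET_F_MTU;

	/* Before FEATURES_OK: "set" means the device has offered it. */
	if (!(status & VIRTIO_CONFIG_S_FEATURES_OK))
		return (device_features & bit) != 0;

	/* After FEATURES_OK: "set" means it has been negotiated. */
	return (device_features & driver_features & bit) != 0;
}
```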

2021-02-23 14:04:47

by Jason Wang

Subject: Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/23 6:04 PM, Cornelia Huck wrote:
> On Tue, 23 Feb 2021 17:46:20 +0800
> Jason Wang <[email protected]> wrote:
>
>> On 2021/2/23 5:25 PM, Michael S. Tsirkin wrote:
>>> On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:
>>>> On 2/21/2021 8:14 PM, Jason Wang wrote:
>>>>> On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>> features to 0, when config space is accessed before features
>>>>>> are set. We should relieve the verify_min_features() check
>>>>>> and allow features reset to 0 for this case.
>>>>>>
>>>>>> It's worth noting that not just legacy guests could access
>>>>>> config space before features are set. For instance, when
>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>> will try to access and validate the MTU present in the config
>>>>>> space before virtio features are set.
>>>>> This looks like a spec violation:
>>>>>
>>>>> "
>>>>>
>>>>> The following driver-read-only field, mtu only exists if
>>>>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
>>>>> driver to use.
>>>>> "
>>>>>
>>>>> Do we really want to workaround this?
>>>> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
>>>>
>>>> I think the point is, since there's legacy guest we'd have to support, this
>>>> host side workaround is unavoidable. Although I agree the violating driver
>>>> should be fixed (yes, it's in today's upstream kernel which exists for a
>>>> while now).
>>> Oh you are right:
>>>
>>>
>>> static int virtnet_validate(struct virtio_device *vdev)
>>> {
>>> if (!vdev->config->get) {
>>> dev_err(&vdev->dev, "%s failure: config access disabled\n",
>>> __func__);
>>> return -EINVAL;
>>> }
>>>
>>> if (!virtnet_validate_features(vdev))
>>> return -EINVAL;
>>>
>>> if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>>> int mtu = virtio_cread16(vdev,
>>> offsetof(struct virtio_net_config,
>>> mtu));
>>> if (mtu < MIN_MTU)
>>> __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
>>
>> I wonder why not simply fail here?
> I think both failing and not accepting the feature can be argued to make
> sense: "the device presented us with an MTU size that does not make
> sense" would point to failing, "we cannot work with the MTU size that
> the device presented us" would point to not negotiating the feature.
>
>>
>>> }
>>>
>>> return 0;
>>> }
>>>
>>> And the spec says:
>>>
>>>
>>> The driver MUST follow this sequence to initialize a device:
>>> 1. Reset the device.
>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
>>> 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
>>> device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
>>> fields to check that it can support the device before accepting it.
>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
>>> 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
>>> support our subset of features and the device is unusable.
>>> 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
>>> reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>
>>>
>>> Item 4 on the list explicitly allows reading config space before
>>> FEATURES_OK.
>>>
>>> I conclude that VIRTIO_NET_F_MTU is set means "set in device features".
>>
>> So this probably needs some clarification. "is set" is used many times in
>> the spec with different implications.
> Before FEATURES_OK is set by the driver, I guess it means "the device
> has offered the feature";


For me this part is ok since it clarifies that it's the driver that sets
the bit.



> during normal usage, it means "the feature
> has been negotiated".

/?

It looks to me the feature negotiation is done only after device set
FEATURES_OK, or FEATURES_OK could be read from device status?


> (This is a bit fuzzy for legacy mode.)


The problem is the MTU description for example:

"The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is set."

It looks to me that we need to use "if VIRTIO_NET_F_MTU is set by device".
Otherwise readers (at least me) may think the MTU is only valid if
the driver sets the bit.


>
> Should we add a wording clarification to the spec?


I think so.

Thanks

>
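
The two options discussed above for an out-of-range MTU in virtnet_validate() (fail the probe, or keep going without negotiating the feature) can be put side by side as follows. This is a sketch only: validate_mtu() and enum mtu_policy are hypothetical names, and MIN_MTU is assumed to be 68, which is what ETH_MIN_MTU is in Linux.

```c
#include <errno.h>
#include <stdint.h>

#define VIRTIO_NET_F_MTU 3
#define MIN_MTU          68   /* assumption: ETH_MIN_MTU in Linux */

enum mtu_policy {
	MTU_POLICY_FAIL,          /* "the device's MTU makes no sense" */
	MTU_POLICY_DROP_FEATURE,  /* "we cannot work with this MTU" */
};

/*
 * Hypothetical helper. On a too-small MTU it either fails with -EINVAL
 * or clears VIRTIO_NET_F_MTU from the set the driver will accept, which
 * is what the quoted virtnet_validate() does.
 */
static int validate_mtu(uint64_t *features, uint16_t mtu, enum mtu_policy p)
{
	uint64_t bit = 1ULL << VIRTIO_NET_F_MTU;

	if (!(*features & bit) || mtu >= MIN_MTU)
		return 0;               /* nothing to validate, or MTU is fine */
	if (p == MTU_POLICY_FAIL)
		return -EINVAL;
	*features &= ~bit;          /* do not negotiate the feature */
	return 0;
}
```

Either way the driver has to read the mtu field before it writes its feature set, so the host side sees a config read ahead of set_features.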

2021-02-23 23:47:00

by Si-Wei Liu

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero



On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
>> On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
>>>
>>> On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
>>>> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>>>>> On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>> features to 0, when config space is accessed before features
>>>>>> are set. We should relieve the verify_min_features() check
>>>>>> and allow features reset to 0 for this case.
>>>>>>
>>>>>> It's worth noting that not just legacy guests could access
>>>>>> config space before features are set. For instance, when
>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>> will try to access and validate the MTU present in the config
>>>>>> space before virtio features are set.
>>>>> This looks like a spec violation:
>>>>>
>>>>> "
>>>>>
>>>>> The following driver-read-only field, mtu only exists if
>>>>> VIRTIO_NET_F_MTU is
>>>>> set.
>>>>> This field specifies the maximum MTU for the driver to use.
>>>>> "
>>>>>
>>>>> Do we really want to workaround this?
>>>>>
>>>>> Thanks
>>>> And also:
>>>>
>>>> The driver MUST follow this sequence to initialize a device:
>>>> 1. Reset the device.
>>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
>>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the
>>>> device.
>>>> 4. Read device feature bits, and write the subset of feature bits
>>>> understood by the OS and driver to the
>>>> device. During this step the driver MAY read (but MUST NOT write)
>>>> the device-specific configuration
>>>> fields to check that it can support the device before accepting it.
>>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>>>> feature bits after this step.
>>>> 6. Re-read device status to ensure the FEATURES_OK bit is still set:
>>>> otherwise, the device does not
>>>> support our subset of features and the device is unusable.
>>>> 7. Perform device-specific setup, including discovery of virtqueues
>>>> for the device, optional per-bus setup,
>>>> reading and possibly writing the device’s virtio configuration
>>>> space, and population of virtqueues.
>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>
>>>>
>>>> so accessing config space before FEATURES_OK is a spec violation, right?
>>> It is, but it's not relevant to what this commit tries to address. I
>>> thought the legacy guest still needs to be supported.
>>>
>>> Having said, a separate patch has to be posted to fix the guest driver
>>> issue where this discrepancy is introduced to virtnet_validate() (since
>>> commit fe36cbe067). But it's not technically related to this patch.
>>>
>>> -Siwei
>>
>> I think it's a bug to read config space in validate, we should move it to
>> virtnet_probe().
>>
>> Thanks
> I take it back, reading but not writing seems to be explicitly allowed by spec.
> So our way to detect a legacy guest is bogus, need to think what is
> the best way to handle this.
Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
legacy guest? Supposedly only config space write access needs to be
guarded before setting FEATURES_OK.

-Siwei

>>>>
>>>>>> Rejecting reset to 0
>>>>>> prematurely causes correct MTU and link status unable to load
>>>>>> for the very first config space access, rendering issues like
>>>>>> guest showing inaccurate MTU value, or failure to reject
>>>>>> out-of-range MTU.
>>>>>>
>>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
>>>>>> supported mlx5 devices")
>>>>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>>>>> ---
>>>>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>>>>    1 file changed, 1 insertion(+), 14 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>> index 7c1f789..540dd67 100644
>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>> @@ -1490,14 +1490,6 @@ static u64
>>>>>> mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>>>>>        return mvdev->mlx_features;
>>>>>>    }
>>>>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
>>>>>> u64 features)
>>>>>> -{
>>>>>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>>>> -        return -EOPNOTSUPP;
>>>>>> -
>>>>>> -    return 0;
>>>>>> -}
>>>>>> -
>>>>>>    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>>>>>    {
>>>>>>        int err;
>>>>>> @@ -1558,18 +1550,13 @@ static int
>>>>>> mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
>>>>>> features)
>>>>>>    {
>>>>>>        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>> -    int err;
>>>>>>        print_features(mvdev, features, true);
>>>>>> -    err = verify_min_features(mvdev, features);
>>>>>> -    if (err)
>>>>>> -        return err;
>>>>>> -
>>>>>>        ndev->mvdev.actual_features = features &
>>>>>> ndev->mvdev.mlx_features;
>>>>>>        ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>>>>        ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>>>>>> VIRTIO_NET_S_LINK_UP);
>>>>>> -    return err;
>>>>>> +    return 0;
>>>>>>    }
>>>>>>    static void mlx5_vdpa_set_config_cb(struct vdpa_device
>>>>>> *vdev, struct vdpa_callback *cb)
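
The suggestion above, that only config space writes need to be guarded before FEATURES_OK, could look roughly like this. It is a sketch under that assumption, not code from QEMU or the kernel; config_access_allowed() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_FEATURES_OK 8

/*
 * Hypothetical guard: reads are always allowed (step 4 of the init
 * sequence permits them), writes only once FEATURES_OK is set.
 */
static bool config_access_allowed(uint8_t status, bool is_write)
{
	if (!is_write)
		return true;
	return (status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;
}
```

With a guard of this shape, an early config read no longer says anything about whether the guest is legacy, so legacy detection would have to be done some other way.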

2021-02-24 07:16:03

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
>
>
> On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
>> On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
>>> On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
>>>>
>>>> On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
>>>>> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>>>>>> On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
>>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>>> features to 0, when config space is accessed before features
>>>>>>> are set. We should relieve the verify_min_features() check
>>>>>>> and allow features reset to 0 for this case.
>>>>>>>
>>>>>>> It's worth noting that not just legacy guests could access
>>>>>>> config space before features are set. For instance, when
>>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>>> will try to access and validate the MTU present in the config
>>>>>>> space before virtio features are set.
>>>>>> This looks like a spec violation:
>>>>>>
>>>>>> "
>>>>>>
>>>>>> The following driver-read-only field, mtu only exists if
>>>>>> VIRTIO_NET_F_MTU is
>>>>>> set.
>>>>>> This field specifies the maximum MTU for the driver to use.
>>>>>> "
>>>>>>
>>>>>> Do we really want to workaround this?
>>>>>>
>>>>>> Thanks
>>>>> And also:
>>>>>
>>>>> The driver MUST follow this sequence to initialize a device:
>>>>> 1. Reset the device.
>>>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the
>>>>> device.
>>>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the
>>>>> device.
>>>>> 4. Read device feature bits, and write the subset of feature bits
>>>>> understood by the OS and driver to the
>>>>> device. During this step the driver MAY read (but MUST NOT write)
>>>>> the device-specific configuration
>>>>> fields to check that it can support the device before accepting it.
>>>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>>>>> feature bits after this step.
>>>>> 6. Re-read device status to ensure the FEATURES_OK bit is still set:
>>>>> otherwise, the device does not
>>>>> support our subset of features and the device is unusable.
>>>>> 7. Perform device-specific setup, including discovery of virtqueues
>>>>> for the device, optional per-bus setup,
>>>>> reading and possibly writing the device’s virtio configuration
>>>>> space, and population of virtqueues.
>>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>>
>>>>>
>>>>> so accessing config space before FEATURES_OK is a spec violation,
>>>>> right?
>>>> It is, but it's not relevant to what this commit tries to address. I
>>>> thought the legacy guest still needs to be supported.
>>>>
>>>> Having said, a separate patch has to be posted to fix the guest driver
>>>> issue where this discrepancy is introduced to virtnet_validate()
>>>> (since
>>>> commit fe36cbe067). But it's not technically related to this patch.
>>>>
>>>> -Siwei
>>>
>>> I think it's a bug to read config space in validate, we should move
>>> it to
>>> virtnet_probe().
>>>
>>> Thanks
>> I take it back, reading but not writing seems to be explicitly
>> allowed by spec.
>> So our way to detect a legacy guest is bogus, need to think what is
>> the best way to handle this.
> Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
> legacy guest? Supposedly only config space write access needs to be
> guarded before setting FEATURES_OK.


I agree. My understanding is that all vDPA devices must be modern devices (since
VIRTIO_F_ACCESS_PLATFORM is mandated) instead of transitional devices.

Thanks


>
> -Siwei
>
>>>>>
>>>>>>> Rejecting reset to 0
>>>>>>> prematurely causes correct MTU and link status unable to load
>>>>>>> for the very first config space access, rendering issues like
>>>>>>> guest showing inaccurate MTU value, or failure to reject
>>>>>>> out-of-range MTU.
>>>>>>>
>>>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
>>>>>>> supported mlx5 devices")
>>>>>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>>>>>> ---
>>>>>>>     drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>>>>>     1 file changed, 1 insertion(+), 14 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>> index 7c1f789..540dd67 100644
>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>> @@ -1490,14 +1490,6 @@ static u64
>>>>>>> mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>>>>>>         return mvdev->mlx_features;
>>>>>>>     }
>>>>>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
>>>>>>> u64 features)
>>>>>>> -{
>>>>>>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>>>>> -        return -EOPNOTSUPP;
>>>>>>> -
>>>>>>> -    return 0;
>>>>>>> -}
>>>>>>> -
>>>>>>>     static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>>>>>>     {
>>>>>>>         int err;
>>>>>>> @@ -1558,18 +1550,13 @@ static int
>>>>>>> mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
>>>>>>> features)
>>>>>>>     {
>>>>>>>         struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>> -    int err;
>>>>>>>         print_features(mvdev, features, true);
>>>>>>> -    err = verify_min_features(mvdev, features);
>>>>>>> -    if (err)
>>>>>>> -        return err;
>>>>>>> -
>>>>>>>         ndev->mvdev.actual_features = features &
>>>>>>> ndev->mvdev.mlx_features;
>>>>>>>         ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>>>>>         ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>>>>>>> VIRTIO_NET_S_LINK_UP);
>>>>>>> -    return err;
>>>>>>> +    return 0;
>>>>>>>     }
>>>>>>>     static void mlx5_vdpa_set_config_cb(struct vdpa_device
>>>>>>> *vdev, struct vdpa_callback *cb)
>

2021-02-24 07:50:30

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
>
> On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
> >
> >
> > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > > >
> > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > and allow features reset to 0 for this case.
> > > > > > > >
> > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > config space before features are set. For instance, when
> > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > space before virtio features are set.
> > > > > > > This looks like a spec violation:
> > > > > > >
> > > > > > > "
> > > > > > >
> > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > set.
> > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > "
> > > > > > >
> > > > > > > Do we really want to workaround this?
> > > > > > >
> > > > > > > Thanks
> > > > > > And also:
> > > > > >
> > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > 1. Reset the device.
> > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has
> > > > > > noticed the device.
> > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > device.
> > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > understood by the OS and driver to the
> > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > the device-specific configuration
> > > > > > fields to check that it can support the device before accepting it.
> > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > feature bits after this step.
> > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > > otherwise, the device does not
> > > > > > support our subset of features and the device is unusable.
> > > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > > for the device, optional per-bus setup,
> > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > space, and population of virtqueues.
> > > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > > >
> > > > > >
> > > > > > so accessing config space before FEATURES_OK is a spec
> > > > > > violation, right?
> > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > thought the legacy guest still needs to be supported.
> > > > >
> > > > > Having said, a separate patch has to be posted to fix the guest driver
> > > > > issue where this discrepancy is introduced to
> > > > > virtnet_validate() (since
> > > > > commit fe36cbe067). But it's not technically related to this patch.
> > > > >
> > > > > -Siwei
> > > >
> > > > I think it's a bug to read config space in validate, we should
> > > > move it to
> > > > virtnet_probe().
> > > >
> > > > Thanks
> > > I take it back, reading but not writing seems to be explicitly
> > > allowed by spec.
> > > So our way to detect a legacy guest is bogus, need to think what is
> > > the best way to handle this.
> > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
> > legacy guest? Supposedly only config space write access needs to be
> > guarded before setting FEATURES_OK.
>
>
> I agree. My understanding is that all vDPA devices must be modern devices (since
> VIRTIO_F_ACCESS_PLATFORM is mandated) instead of transitional devices.
>
> Thanks

Well mlx5 has some code to handle legacy guests ...
Eli, could you comment? Is that support unused right now?


>
> >
> > -Siwei
> >
> > > > > >
> > > > > > > > Rejecting reset to 0
> > > > > > > > prematurely causes correct MTU and link status unable to load
> > > > > > > > for the very first config space access, rendering issues like
> > > > > > > > guest showing inaccurate MTU value, or failure to reject
> > > > > > > > out-of-range MTU.
> > > > > > > >
> > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
> > > > > > > > supported mlx5 devices")
> > > > > > > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > > > > > > ---
> > > > > > > >     drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > > > > >     1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > index 7c1f789..540dd67 100644
> > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > @@ -1490,14 +1490,6 @@ static u64
> > > > > > > > mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > > > > >         return mvdev->mlx_features;
> > > > > > > >     }
> > > > > > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
> > > > > > > > u64 features)
> > > > > > > > -{
> > > > > > > > -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > > > > > -        return -EOPNOTSUPP;
> > > > > > > > -
> > > > > > > > -    return 0;
> > > > > > > > -}
> > > > > > > > -
> > > > > > > >     static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > > > > >     {
> > > > > > > >         int err;
> > > > > > > > @@ -1558,18 +1550,13 @@ static int
> > > > > > > > mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
> > > > > > > > features)
> > > > > > > >     {
> > > > > > > >         struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > > >         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > > -    int err;
> > > > > > > >         print_features(mvdev, features, true);
> > > > > > > > -    err = verify_min_features(mvdev, features);
> > > > > > > > -    if (err)
> > > > > > > > -        return err;
> > > > > > > > -
> > > > > > > >         ndev->mvdev.actual_features = features &
> > > > > > > > ndev->mvdev.mlx_features;
> > > > > > > >         ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > > > > > >         ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
> > > > > > > > VIRTIO_NET_S_LINK_UP);
> > > > > > > > -    return err;
> > > > > > > > +    return 0;
> > > > > > > >     }
> > > > > > > >     static void mlx5_vdpa_set_config_cb(struct vdpa_device
> > > > > > > > *vdev, struct vdpa_callback *cb)
> >

2021-02-24 08:05:25

by Eli Cohen

[permalink] [raw]
Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 12:17:58AM -0500, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
> >
> > On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
> > >
> > >
> > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > > > >
> > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > >
> > > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > > space before virtio features are set.
> > > > > > > > This looks like a spec violation:
> > > > > > > >
> > > > > > > > "
> > > > > > > >
> > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > set.
> > > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > > "
> > > > > > > >
> > > > > > > > Do we really want to workaround this?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > And also:
> > > > > > >
> > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > 1. Reset the device.
> > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has
> > > > > > > noticed the device.
> > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > > device.
> > > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > > understood by the OS and driver to the
> > > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > > the device-specific configuration
> > > > > > > fields to check that it can support the device before accepting it.
> > > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > > feature bits after this step.
> > > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > > > otherwise, the device does not
> > > > > > > support our subset of features and the device is unusable.
> > > > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > > > for the device, optional per-bus setup,
> > > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > > space, and population of virtqueues.
> > > > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > > > >
> > > > > > >
> > > > > > > so accessing config space before FEATURES_OK is a spec
> > > > > > > violation, right?
> > > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > > thought the legacy guest still needs to be supported.
> > > > > >
> > > > > > > Having said that, a separate patch has to be posted to fix the guest driver
> > > > > > issue where this discrepancy is introduced to
> > > > > > virtnet_validate() (since
> > > > > > commit fe36cbe067). But it's not technically related to this patch.
> > > > > >
> > > > > > -Siwei
> > > > >
> > > > > I think it's a bug to read config space in validate, we should
> > > > > move it to
> > > > > virtnet_probe().
> > > > >
> > > > > Thanks
> > > > I take it back, reading but not writing seems to be explicitly
> > > > allowed by spec.
> > > > So our way to detect a legacy guest is bogus, need to think what is
> > > > the best way to handle this.
> > > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
> > > legacy guest? Supposedly only config space write access needs to be
> > > guarded before setting FEATURES_OK.
> >
> >
> > I agree. My understanding is that all vDPA must be modern device (since
> > VIRTIO_F_ACCESS_PLATFORM is mandated) instead of transitional device.
> >
> > Thanks
>
> Well mlx5 has some code to handle legacy guests ...
> Eli, could you comment? Is that support unused right now?
>

If you mean support for version 1.0, well the knob is there but it's not
set in the firmware I use. Not sure if we will support this.


2021-02-24 08:06:32

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 08:45:20AM +0200, Eli Cohen wrote:
> On Wed, Feb 24, 2021 at 12:17:58AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
> > >
> > > On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
> > > >
> > > >
> > > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > > > > >
> > > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > > >
> > > > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > > > space before virtio features are set.
> > > > > > > > > This looks like a spec violation:
> > > > > > > > >
> > > > > > > > > "
> > > > > > > > >
> > > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > > set.
> > > > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > > > "
> > > > > > > > >
> > > > > > > > > Do we really want to workaround this?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > And also:
> > > > > > > >
> > > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > > 1. Reset the device.
> > > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has
> > > > > > > > noticed the device.
> > > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > > > device.
> > > > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > > > understood by the OS and driver to the
> > > > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > > > the device-specific configuration
> > > > > > > > fields to check that it can support the device before accepting it.
> > > > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > > > feature bits after this step.
> > > > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > > > > otherwise, the device does not
> > > > > > > > support our subset of features and the device is unusable.
> > > > > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > > > > for the device, optional per-bus setup,
> > > > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > > > space, and population of virtqueues.
> > > > > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > > > > >
> > > > > > > >
> > > > > > > > so accessing config space before FEATURES_OK is a spec
> > > > > > > > violation, right?
> > > > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > > > thought the legacy guest still needs to be supported.
> > > > > > >
> > > > > > > > Having said that, a separate patch has to be posted to fix the guest driver
> > > > > > > issue where this discrepancy is introduced to
> > > > > > > virtnet_validate() (since
> > > > > > > commit fe36cbe067). But it's not technically related to this patch.
> > > > > > >
> > > > > > > -Siwei
> > > > > >
> > > > > > I think it's a bug to read config space in validate, we should
> > > > > > move it to
> > > > > > virtnet_probe().
> > > > > >
> > > > > > Thanks
> > > > > I take it back, reading but not writing seems to be explicitly
> > > > > allowed by spec.
> > > > > So our way to detect a legacy guest is bogus, need to think what is
> > > > > the best way to handle this.
> > > > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
> > > > legacy guest? Supposedly only config space write access needs to be
> > > > guarded before setting FEATURES_OK.
> > >
> > >
> > > I agree. My understanding is that all vDPA must be modern device (since
> > > VIRTIO_F_ACCESS_PLATFORM is mandated) instead of transitional device.
> > >
> > > Thanks
> >
> > Well mlx5 has some code to handle legacy guests ...
> > Eli, could you comment? Is that support unused right now?
> >
>
> If you mean support for version 1.0, well the knob is there but it's not
> set in the firmware I use. Not sure if we will support this.

Hmm you mean it's legacy only right now?
Well at some point you will want advanced goodies like RSS
and all that is gated on 1.0 ;)
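
To spell out the gating (a rough sketch only, not mlx5 or virtio-net code):
the legacy interface exposes just a 32-bit feature word, so any feature bit
at or above 32 cannot be negotiated by a legacy driver. VIRTIO_F_VERSION_1 is
bit 32 and VIRTIO_NET_F_RSS is bit 60. The helper names below are made up
for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as defined by the virtio spec. */
#define VIRTIO_F_VERSION_1   32
#define VIRTIO_NET_F_RSS     60

#define FEATURE_BIT(n) (1ULL << (n))

/*
 * Hypothetical helper: the subset of device features a driver can
 * negotiate. A legacy driver only sees the low 32 bits.
 */
static uint64_t negotiable_features(uint64_t device_features, bool legacy_driver)
{
	if (legacy_driver)
		return device_features & 0xffffffffULL;
	return device_features;
}

/* True if RSS is reachable for this kind of driver. */
static bool can_negotiate_rss(uint64_t device_features, bool legacy_driver)
{
	uint64_t f = negotiable_features(device_features, legacy_driver);

	return (f & FEATURE_BIT(VIRTIO_NET_F_RSS)) != 0;
}
```

So a device that offers RSS but is only ever driven through the legacy
interface would never get to use it.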


2021-02-24 08:17:06

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 02:53:08PM +0800, Jason Wang wrote:
>
> On 2021/2/24 2:46 PM, Michael S. Tsirkin wrote:
> > On Wed, Feb 24, 2021 at 02:04:36PM +0800, Jason Wang wrote:
> > > On 2021/2/24 1:04 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:
> > > > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > > > >
> > > > > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > > > > space before virtio features are set.
> > > > > > > > > > This looks like a spec violation:
> > > > > > > > > >
> > > > > > > > > > "
> > > > > > > > > >
> > > > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > > > set.
> > > > > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > > > > "
> > > > > > > > > >
> > > > > > > > > > Do we really want to workaround this?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > And also:
> > > > > > > > >
> > > > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > > > 1. Reset the device.
> > > > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> > > > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > > > > device.
> > > > > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > > > > understood by the OS and driver to the
> > > > > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > > > > the device-specific configuration
> > > > > > > > > fields to check that it can support the device before accepting it.
> > > > > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > > > > feature bits after this step.
> > > > > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > > > > > otherwise, the device does not
> > > > > > > > > support our subset of features and the device is unusable.
> > > > > > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > > > > > for the device, optional per-bus setup,
> > > > > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > > > > space, and population of virtqueues.
> > > > > > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > so accessing config space before FEATURES_OK is a spec violation, right?
> > > > > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > > > > thought the legacy guest still needs to be supported.
> > > > > > > >
> > > > > > > > Having said that, a separate patch has to be posted to fix the guest driver
> > > > > > > > issue where this discrepancy is introduced to virtnet_validate() (since
> > > > > > > > commit fe36cbe067). But it's not technically related to this patch.
> > > > > > > >
> > > > > > > > -Siwei
> > > > > > > I think it's a bug to read config space in validate, we should move it to
> > > > > > > virtnet_probe().
> > > > > > >
> > > > > > > Thanks
> > > > > > I take it back, reading but not writing seems to be explicitly allowed by spec.
> > > > > > So our way to detect a legacy guest is bogus, need to think what is
> > > > > > the best way to handle this.
> > > > > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
> > > > > guest? Supposedly only config space write access needs to be guarded before
> > > > > setting FEATURES_OK.
> > > > >
> > > > > -Siwei
> > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > the kernel that it's a legacy guest. Ugh :(
> > >
> > > I'm not sure I get this, how can we know if there's a legacy driver before
> > > set_features()?
> > qemu knows for sure. It does not communicate this information to the
> > kernel right now unfortunately.
>
>
> I may miss something, but I still don't get how the new ioctl is supposed to
> work.
>
> Thanks



Basically, on the first guest access QEMU would tell the kernel whether
the guest is using the legacy or the modern interface.
E.g. virtio_pci_config_read/virtio_pci_config_write will call ioctl(ENABLE_LEGACY, 1)
while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0)

Or maybe we just add GET_CONFIG_MODERN and GET_CONFIG_LEGACY and
call the correct ioctl ... there are many ways to build this API.
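
A rough model of the first variant, just to show the flow. Nothing here is a
real API: ENABLE_LEGACY does not exist, and the structs and function names
are stand-ins invented for this sketch. Only the QEMU handler names in the
comments come from the text above.

```c
#include <stdbool.h>

/* Which guest-visible region was touched (enum invented for this sketch). */
enum guest_access {
	ACCESS_LEGACY_CONFIG,   /* virtio_pci_config_read/virtio_pci_config_write */
	ACCESS_MODERN_COMMON,   /* virtio_pci_common_read */
};

enum iface_mode { IFACE_UNKNOWN, IFACE_LEGACY, IFACE_MODERN };

/* Stand-in for the kernel-side device state. */
struct fake_vdpa_dev {
	enum iface_mode mode;
};

/* Stand-in for the userspace (QEMU) side. */
struct fake_vmm {
	bool reported;              /* has the interface been reported yet? */
	struct fake_vdpa_dev *dev;
};

/* Stand-in for the hypothetical ioctl(ENABLE_LEGACY, val). */
static void fake_enable_legacy(struct fake_vdpa_dev *dev, int val)
{
	dev->mode = val ? IFACE_LEGACY : IFACE_MODERN;
}

/* Report the interface to the "kernel" on the first guest access only. */
static void on_guest_access(struct fake_vmm *vmm, enum guest_access a)
{
	if (vmm->reported)
		return;
	fake_enable_legacy(vmm->dev, a == ACCESS_LEGACY_CONFIG);
	vmm->reported = true;
}
```

With something like this the kernel would no longer need to guess from the
order of config space access and set_features.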

>
> >
> > > And I wonder what will happen if we just revert the set_features(0)?
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > > > > > > > > Rejecting reset to 0
> > > > > > > > > > > prematurely causes correct MTU and link status unable to load
> > > > > > > > > > > for the very first config space access, rendering issues like
> > > > > > > > > > > guest showing inaccurate MTU value, or failure to reject
> > > > > > > > > > > out-of-range MTU.
> > > > > > > > > > >
> > > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
> > > > > > > > > > > supported mlx5 devices")
> > > > > > > > > > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > > > > > > > > > ---
> > > > > > > > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > > > > > > > >    1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > index 7c1f789..540dd67 100644
> > > > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > @@ -1490,14 +1490,6 @@ static u64
> > > > > > > > > > > mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > > > > > > > >        return mvdev->mlx_features;
> > > > > > > > > > >    }
> > > > > > > > > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
> > > > > > > > > > > u64 features)
> > > > > > > > > > > -{
> > > > > > > > > > > -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > > > > > > > > -        return -EOPNOTSUPP;
> > > > > > > > > > > -
> > > > > > > > > > > -    return 0;
> > > > > > > > > > > -}
> > > > > > > > > > > -
> > > > > > > > > > >    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > > > > > > > >    {
> > > > > > > > > > >        int err;
> > > > > > > > > > > @@ -1558,18 +1550,13 @@ static int
> > > > > > > > > > > mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
> > > > > > > > > > > features)
> > > > > > > > > > >    {
> > > > > > > > > > >        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > > > > > >        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > > > > > -    int err;
> > > > > > > > > > >        print_features(mvdev, features, true);
> > > > > > > > > > > -    err = verify_min_features(mvdev, features);
> > > > > > > > > > > -    if (err)
> > > > > > > > > > > -        return err;
> > > > > > > > > > > -
> > > > > > > > > > >        ndev->mvdev.actual_features = features &
> > > > > > > > > > > ndev->mvdev.mlx_features;
> > > > > > > > > > >        ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > > > > > > > > >        ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
> > > > > > > > > > > VIRTIO_NET_S_LINK_UP);
> > > > > > > > > > > -    return err;
> > > > > > > > > > > +    return 0;
> > > > > > > > > > >    }
> > > > > > > > > > >    static void mlx5_vdpa_set_config_cb(struct vdpa_device
> > > > > > > > > > > *vdev, struct vdpa_callback *cb)

2021-02-24 08:48:59

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 04:26:43PM +0800, Jason Wang wrote:
> Basically on first guest access QEMU would tell kernel whether
> guest is using the legacy or the modern interface.
> E.g. virtio_pci_config_read/virtio_pci_config_write will call ioctl(ENABLE_LEGACY, 1)
> while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0)
>
>
> But this trick works only for PCI I think?
>
> Thanks

ccw has a revision it can check. mmio does not have transitional devices
at all.

--
MST

2021-02-24 09:47:24

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/23 6:17 PM, Jason Wang wrote:
>
> On 2021/2/23 6:01 PM, Michael S. Tsirkin wrote:
>> On Tue, Feb 23, 2021 at 05:46:20PM +0800, Jason Wang wrote:
>>> On 2021/2/23 5:25 PM, Michael S. Tsirkin wrote:
>>>> On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:
>>>>> On 2/21/2021 8:14 PM, Jason Wang wrote:
>>>>>> On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
>>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>>> features to 0, when config space is accessed before features
>>>>>>> are set. We should relieve the verify_min_features() check
>>>>>>> and allow features reset to 0 for this case.
>>>>>>>
>>>>>>> It's worth noting that not just legacy guests could access
>>>>>>> config space before features are set. For instance, when
>>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>>> will try to access and validate the MTU present in the config
>>>>>>> space before virtio features are set.
>>>>>> This looks like a spec violation:
>>>>>>
>>>>>> "
>>>>>>
>>>>>> The following driver-read-only field, mtu only exists if
>>>>>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for
>>>>>> the
>>>>>> driver to use.
>>>>>> "
>>>>>>
>>>>>> Do we really want to workaround this?
>>>>> Isn't commit 452639a64ad8 itself a workaround for legacy
>>>>> guest?
>>>>>
>>>>> I think the point is, since there's legacy guest we'd have to
>>>>> support, this
>>>>> host side workaround is unavoidable. Although I agree the
>>>>> violating driver
>>>>> should be fixed (yes, it's in today's upstream kernel which exists
>>>>> for a
>>>>> while now).
>>>> Oh, you are right:
>>>>
>>>>
>>>> static int virtnet_validate(struct virtio_device *vdev)
>>>> {
>>>>           if (!vdev->config->get) {
>>>>                   dev_err(&vdev->dev, "%s failure: config access
>>>> disabled\n",
>>>>                           __func__);
>>>>                   return -EINVAL;
>>>>           }
>>>>
>>>>           if (!virtnet_validate_features(vdev))
>>>>                   return -EINVAL;
>>>>
>>>>           if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>>>>                   int mtu = virtio_cread16(vdev,
>>>>                                            offsetof(struct
>>>> virtio_net_config,
>>>>                                                     mtu));
>>>>                   if (mtu < MIN_MTU)
>>>>                           __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
>>>
>>> I wonder why not simply fail here?
>> Back in 2016 it went like this:
>>
>>     On Thu, Jun 02, 2016 at 05:10:59PM -0400, Aaron Conole wrote:
>>     > +     if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>>     > +             dev->mtu = virtio_cread16(vdev,
>>     > +                                       offsetof(struct
>> virtio_net_config,
>>     > +                                                mtu));
>>     > +     }
>>     > +
>>     >       if (vi->any_header_sg)
>>     >               dev->needed_headroom = vi->hdr_len;
>>     >
>>
>>     One comment though: I think we should validate the mtu.
>>     If it's invalid, clear VIRTIO_NET_F_MTU and ignore.
>>
>>
>> Too late at this point :)
>>
>> I guess it's a way to tell device "I can not live with this MTU",
>> device can fail FEATURES_OK if it wants to. MIN_MTU
>> is an internal linux thing and at the time I felt it's better to
>> try to make progress.
>
>
> What if the device advertises a large MTU, e.g. 64K here?


OK, consider that we use add_recvbuf_small() when neither GSO nor
mrg_rxbuf is negotiated. This means we should fail probing if the MTU is
greater than 1500 in that case.

Thanks
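
Something along these lines, as a sketch of the decision only and not a
patch against virtio_net.c. The enum, the function name and the constants
are invented here; the lower bound of 68 is an assumed stand-in for the
driver's MIN_MTU.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIN_MTU        68    /* assumed stand-in for MIN_MTU */
#define SKETCH_SMALL_BUF_MTU  1500  /* limit when only small buffers are used */

enum mtu_verdict {
	MTU_OK,              /* accept the device MTU */
	MTU_IGNORE_FEATURE,  /* clear VIRTIO_NET_F_MTU and carry on */
	MTU_FAIL_PROBE,      /* driver cannot live with this MTU */
};

static enum mtu_verdict check_mtu(uint16_t mtu, bool has_gso, bool has_mrg_rxbuf)
{
	/* Existing behaviour in virtnet_validate(): too small, drop the feature. */
	if (mtu < SKETCH_MIN_MTU)
		return MTU_IGNORE_FEATURE;

	/* Small receive buffers cannot hold frames for an MTU above 1500. */
	if (!has_gso && !has_mrg_rxbuf && mtu > SKETCH_SMALL_BUF_MTU)
		return MTU_FAIL_PROBE;

	return MTU_OK;
}
```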


> In that case, the driver can not live either. Clearing MTU won't help
> here.
>
> Thanks
>
>
>>
>>
>>>>           }
>>>>
>>>>           return 0;
>>>> }
>>>>
>>>> And the spec says:
>>>>
>>>>
>>>> The driver MUST follow this sequence to initialize a device:
>>>> 1. Reset the device.
>>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the
>>>> device.
>>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the
>>>> device.
>>>> 4. Read device feature bits, and write the subset of feature bits
>>>> understood by the OS and driver to the
>>>> device. During this step the driver MAY read (but MUST NOT write)
>>>> the device-specific configuration
>>>> fields to check that it can support the device before accepting it.
>>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>>>> feature bits after this step.
>>>> 6. Re-read device status to ensure the FEATURES_OK bit is still
>>>> set: otherwise, the device does not
>>>> support our subset of features and the device is unusable.
>>>> 7. Perform device-specific setup, including discovery of virtqueues
>>>> for the device, optional per-bus setup,
>>>> reading and possibly writing the device’s virtio configuration
>>>> space, and population of virtqueues.
>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>
>>>>
>>>> Item 4 on the list explicitly allows reading config space before
>>>> FEATURES_OK.
>>>>
>>>> I conclude that VIRTIO_NET_F_MTU is set means "set in device
>>>> features".
>>>
>>> So this probably need some clarification. "is set" is used many
>>> times in the
>>> spec that has different implications.
>>>
>>> Thanks
>>>
>>>
>>>> Generally it is worth going over feature dependent config fields
>>>> and checking whether they should be present when device feature is set
>>>> or when feature bit has been negotiated, and making this clear.
>>>>
>

2021-02-24 12:47:19

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:
>
>
> On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > >
> > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
> > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > features to 0, when config space is accessed before features
> > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > and allow features reset to 0 for this case.
> > > > > > >
> > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > config space before features are set. For instance, when
> > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > will try to access and validate the MTU present in the config
> > > > > > > space before virtio features are set.
> > > > > > This looks like a spec violation:
> > > > > >
> > > > > > "
> > > > > >
> > > > > > The following driver-read-only field, mtu only exists if
> > > > > > VIRTIO_NET_F_MTU is
> > > > > > set.
> > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > "
> > > > > >
> > > > > > Do we really want to workaround this?
> > > > > >
> > > > > > Thanks
> > > > > And also:
> > > > >
> > > > > The driver MUST follow this sequence to initialize a device:
> > > > > 1. Reset the device.
> > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > device.
> > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > understood by the OS and driver to the
> > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > the device-specific configuration
> > > > > fields to check that it can support the device before accepting it.
> > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > feature bits after this step.
> > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > otherwise, the device does not
> > > > > support our subset of features and the device is unusable.
> > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > for the device, optional per-bus setup,
> > > > > reading and possibly writing the device’s virtio configuration
> > > > > space, and population of virtqueues.
> > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > >
> > > > >
> > > > > so accessing config space before FEATURES_OK is a spec violation, right?
> > > > It is, but it's not relevant to what this commit tries to address. I
> > > > thought the legacy guest still needs to be supported.
> > > >
> > > > Having said, a separate patch has to be posted to fix the guest driver
> > > > issue where this discrepancy is introduced to virtnet_validate() (since
> > > > commit fe36cbe067). But it's not technically related to this patch.
> > > >
> > > > -Siwei
> > >
> > > I think it's a bug to read config space in validate, we should move it to
> > > virtnet_probe().
> > >
> > > Thanks
> > I take it back, reading but not writing seems to be explicitly allowed by spec.
> > So our way to detect a legacy guest is bogus, need to think what is
> > the best way to handle this.
> Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
> guest? Supposedly only config space write access needs to be guarded before
> setting FEATURES_OK.
>
> -Siwei

Detecting it isn't enough though, we will need a new ioctl to notify
the kernel that it's a legacy guest. Ugh :(
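
As a rough illustration of why the current detection misfires, here is a
toy model of the heuristic described in the commit message of
452639a64ad8: a config space access that arrives before features are
written is taken to mean "legacy guest", and features are reset to 0.
Everything prefixed toy_/TOY_ is hypothetical and only stands in for the
real vdpa and mlx5 code; the check_min_features flag models the
verify_min_features() check that the patch removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not kernel code. */
#define TOY_F_ACCESS_PLATFORM (1ULL << 33) /* bit 33, as VIRTIO_F_ACCESS_PLATFORM */
#define TOY_EOPNOTSUPP 95                  /* stand-in for EOPNOTSUPP */

struct toy_vdpa {
    bool features_written;   /* has the driver written its features yet? */
    bool check_min_features; /* models the verify_min_features() check */
    uint64_t actual_features;
};

/* Models the parent driver's set_features callback. */
static int toy_set_features(struct toy_vdpa *d, uint64_t features)
{
    assert(d != NULL);
    if (d->check_min_features && !(features & TOY_F_ACCESS_PLATFORM))
        return -TOY_EOPNOTSUPP;
    d->actual_features = features;
    return 0;
}

/*
 * Models the heuristic: a config read before features are written is
 * assumed to come from a legacy guest, so features are reset to 0.
 * A modern driver that reads the MTU early takes the same path.
 */
static int toy_config_read(struct toy_vdpa *d)
{
    assert(d != NULL);
    if (!d->features_written)
        return toy_set_features(d, 0);
    return 0;
}

/* Models the driver writing its accepted feature bits (step 4). */
static int toy_driver_write_features(struct toy_vdpa *d, uint64_t features)
{
    assert(d != NULL);
    d->features_written = true;
    return toy_set_features(d, features);
}
```

With check_min_features set, the early read fails because 0 lacks
ACCESS_PLATFORM; with it cleared, the read succeeds and a later feature
write still takes effect. The model cannot tell a legacy guest from a
modern one that reads early, which is the gap discussed here.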


> > > > >
> > > > > > > Rejecting reset to 0
> > > > > > > prematurely causes correct MTU and link status unable to load
> > > > > > > for the very first config space access, rendering issues like
> > > > > > > guest showing inaccurate MTU value, or failure to reject
> > > > > > > out-of-range MTU.
> > > > > > >
> > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
> > > > > > > supported mlx5 devices")
> > > > > > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > > > > > ---
> > > > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > > > >    1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 7c1f789..540dd67 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -1490,14 +1490,6 @@ static u64
> > > > > > > mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > > > >        return mvdev->mlx_features;
> > > > > > >    }
> > > > > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
> > > > > > > u64 features)
> > > > > > > -{
> > > > > > > -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > > > > -        return -EOPNOTSUPP;
> > > > > > > -
> > > > > > > -    return 0;
> > > > > > > -}
> > > > > > > -
> > > > > > >    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > > > >    {
> > > > > > >        int err;
> > > > > > > @@ -1558,18 +1550,13 @@ static int
> > > > > > > mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
> > > > > > > features)
> > > > > > >    {
> > > > > > >        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > >        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > -    int err;
> > > > > > >        print_features(mvdev, features, true);
> > > > > > > -    err = verify_min_features(mvdev, features);
> > > > > > > -    if (err)
> > > > > > > -        return err;
> > > > > > > -
> > > > > > >        ndev->mvdev.actual_features = features &
> > > > > > > ndev->mvdev.mlx_features;
> > > > > > >        ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > > > > >        ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
> > > > > > > VIRTIO_NET_S_LINK_UP);
> > > > > > > -    return err;
> > > > > > > +    return 0;
> > > > > > >    }
> > > > > > >    static void mlx5_vdpa_set_config_cb(struct vdpa_device
> > > > > > > *vdev, struct vdpa_callback *cb)

2021-02-24 12:53:13

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/24 1:17 下午, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
>> On 2021/2/24 3:35 上午, Si-Wei Liu wrote:
>>>
>>> On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
>>>> On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
>>>>> On 2021/2/23 9:12 上午, Si-Wei Liu wrote:
>>>>>> On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>>>>>>>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
>>>>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>>>>> features to 0, when config space is accessed before features
>>>>>>>>> are set. We should relieve the verify_min_features() check
>>>>>>>>> and allow features reset to 0 for this case.
>>>>>>>>>
>>>>>>>>> It's worth noting that not just legacy guests could access
>>>>>>>>> config space before features are set. For instance, when
>>>>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>>>>> will try to access and validate the MTU present in the config
>>>>>>>>> space before virtio features are set.
>>>>>>>> This looks like a spec violation:
>>>>>>>>
>>>>>>>> "
>>>>>>>>
>>>>>>>> The following driver-read-only field, mtu only exists if
>>>>>>>> VIRTIO_NET_F_MTU is
>>>>>>>> set.
>>>>>>>> This field specifies the maximum MTU for the driver to use.
>>>>>>>> "
>>>>>>>>
>>>>>>>> Do we really want to workaround this?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> And also:
>>>>>>>
>>>>>>> The driver MUST follow this sequence to initialize a device:
>>>>>>> 1. Reset the device.
>>>>>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has
>>>>>>> noticed the device.
>>>>>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the
>>>>>>> device.
>>>>>>> 4. Read device feature bits, and write the subset of feature bits
>>>>>>> understood by the OS and driver to the
>>>>>>> device. During this step the driver MAY read (but MUST NOT write)
>>>>>>> the device-specific configuration
>>>>>>> fields to check that it can support the device before accepting it.
>>>>>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>>>>>>> feature bits after this step.
>>>>>>> 6. Re-read device status to ensure the FEATURES_OK bit is still set:
>>>>>>> otherwise, the device does not
>>>>>>> support our subset of features and the device is unusable.
>>>>>>> 7. Perform device-specific setup, including discovery of virtqueues
>>>>>>> for the device, optional per-bus setup,
>>>>>>> reading and possibly writing the device’s virtio configuration
>>>>>>> space, and population of virtqueues.
>>>>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>>>>
>>>>>>>
>>>>>>> so accessing config space before FEATURES_OK is a spec
>>>>>>> violation, right?
>>>>>> It is, but it's not relevant to what this commit tries to address. I
>>>>>> thought the legacy guest still needs to be supported.
>>>>>>
>>>>>> Having said, a separate patch has to be posted to fix the guest driver
>>>>>> issue where this discrepancy is introduced to
>>>>>> virtnet_validate() (since
>>>>>> commit fe36cbe067). But it's not technically related to this patch.
>>>>>>
>>>>>> -Siwei
>>>>> I think it's a bug to read config space in validate, we should
>>>>> move it to
>>>>> virtnet_probe().
>>>>>
>>>>> Thanks
>>>> I take it back, reading but not writing seems to be explicitly
>>>> allowed by spec.
>>>> So our way to detect a legacy guest is bogus, need to think what is
>>>> the best way to handle this.
>>> Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
>>> legacy guest? Supposedly only config space write access needs to be
>>> guarded before setting FEATURES_OK.
>>
>> I agree. My understanding is that all vDPA devices must be modern devices
>> (since VIRTIO_F_ACCESS_PLATFORM is mandated) rather than transitional devices.
>>
>> Thanks
> Well mlx5 has some code to handle legacy guests ...


My understanding is that, even if mlx5 is a modern device, it can still
support legacy guests, since the device seen by the guest is emulated by
QEMU. QEMU can just present a transitional device to the guest, but
negotiate VIRTIO_F_ACCESS_PLATFORM with the vDPA device. (Actually this
is what is done now.)
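
A minimal sketch of that idea, with hypothetical names (this is not QEMU
code, and the exact set of bits QEMU forces is an assumption): the guest
side sees a transitional device and may ack only the low 32 feature
bits, while the feature set passed down to the vDPA device always
carries ACCESS_PLATFORM when the device offers it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy constants mirroring the virtio bit positions. */
#define TOY_NET_F_MTU         (1ULL << 3)
#define TOY_F_VERSION_1       (1ULL << 32)
#define TOY_F_ACCESS_PLATFORM (1ULL << 33)

/* A legacy guest can only see and ack the low 32 feature bits. */
static uint64_t toy_legacy_guest_ack(uint64_t offered)
{
    return offered & 0xffffffffULL;
}

/*
 * Features handed to the vDPA device: what the guest acked, limited to
 * what the device offers, plus ACCESS_PLATFORM forced on by the emulator
 * whenever the device offers it.
 */
static uint64_t toy_backend_features(uint64_t guest_acked,
                                     uint64_t device_features)
{
    uint64_t features = guest_acked & device_features;

    if (device_features & TOY_F_ACCESS_PLATFORM)
        features |= TOY_F_ACCESS_PLATFORM;
    return features;
}
```

In this model the device never sees a feature set without
ACCESS_PLATFORM once real negotiation happens, even though the guest
driver itself is legacy.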

Thanks


> Eli, could you comment? Is that support unused right now?
>
>
>>> -Siwei
>>>
>>>>>>>>> Rejecting reset to 0
>>>>>>>>> prematurely causes correct MTU and link status unable to load
>>>>>>>>> for the very first config space access, rendering issues like
>>>>>>>>> guest showing inaccurate MTU value, or failure to reject
>>>>>>>>> out-of-range MTU.
>>>>>>>>>
>>>>>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
>>>>>>>>> supported mlx5 devices")
>>>>>>>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>>>>>>>> ---
>>>>>>>>>     drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>>>>>>>     1 file changed, 1 insertion(+), 14 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>> index 7c1f789..540dd67 100644
>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>> @@ -1490,14 +1490,6 @@ static u64
>>>>>>>>> mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>>>>>>>>         return mvdev->mlx_features;
>>>>>>>>>     }
>>>>>>>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
>>>>>>>>> u64 features)
>>>>>>>>> -{
>>>>>>>>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>>>>>>> -        return -EOPNOTSUPP;
>>>>>>>>> -
>>>>>>>>> -    return 0;
>>>>>>>>> -}
>>>>>>>>> -
>>>>>>>>>     static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>>>>>>>>     {
>>>>>>>>>         int err;
>>>>>>>>> @@ -1558,18 +1550,13 @@ static int
>>>>>>>>> mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
>>>>>>>>> features)
>>>>>>>>>     {
>>>>>>>>>         struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>>         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>>> -    int err;
>>>>>>>>>         print_features(mvdev, features, true);
>>>>>>>>> -    err = verify_min_features(mvdev, features);
>>>>>>>>> -    if (err)
>>>>>>>>> -        return err;
>>>>>>>>> -
>>>>>>>>>         ndev->mvdev.actual_features = features &
>>>>>>>>> ndev->mvdev.mlx_features;
>>>>>>>>>         ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>>>>>>>         ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>>>>>>>>> VIRTIO_NET_S_LINK_UP);
>>>>>>>>> -    return err;
>>>>>>>>> +    return 0;
>>>>>>>>>     }
>>>>>>>>>     static void mlx5_vdpa_set_config_cb(struct vdpa_device
>>>>>>>>> *vdev, struct vdpa_callback *cb)

2021-02-24 12:54:10

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/24 1:04 下午, Michael S. Tsirkin wrote:
> On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:
>>
>> On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
>>> On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
>>>> On 2021/2/23 9:12 上午, Si-Wei Liu wrote:
>>>>> On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
>>>>>> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>>>>>>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
>>>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>>>> features to 0, when config space is accessed before features
>>>>>>>> are set. We should relieve the verify_min_features() check
>>>>>>>> and allow features reset to 0 for this case.
>>>>>>>>
>>>>>>>> It's worth noting that not just legacy guests could access
>>>>>>>> config space before features are set. For instance, when
>>>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>>>> will try to access and validate the MTU present in the config
>>>>>>>> space before virtio features are set.
>>>>>>> This looks like a spec violation:
>>>>>>>
>>>>>>> "
>>>>>>>
>>>>>>> The following driver-read-only field, mtu only exists if
>>>>>>> VIRTIO_NET_F_MTU is
>>>>>>> set.
>>>>>>> This field specifies the maximum MTU for the driver to use.
>>>>>>> "
>>>>>>>
>>>>>>> Do we really want to workaround this?
>>>>>>>
>>>>>>> Thanks
>>>>>> And also:
>>>>>>
>>>>>> The driver MUST follow this sequence to initialize a device:
>>>>>> 1. Reset the device.
>>>>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
>>>>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the
>>>>>> device.
>>>>>> 4. Read device feature bits, and write the subset of feature bits
>>>>>> understood by the OS and driver to the
>>>>>> device. During this step the driver MAY read (but MUST NOT write)
>>>>>> the device-specific configuration
>>>>>> fields to check that it can support the device before accepting it.
>>>>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>>>>>> feature bits after this step.
>>>>>> 6. Re-read device status to ensure the FEATURES_OK bit is still set:
>>>>>> otherwise, the device does not
>>>>>> support our subset of features and the device is unusable.
>>>>>> 7. Perform device-specific setup, including discovery of virtqueues
>>>>>> for the device, optional per-bus setup,
>>>>>> reading and possibly writing the device’s virtio configuration
>>>>>> space, and population of virtqueues.
>>>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>>>
>>>>>>
>>>>>> so accessing config space before FEATURES_OK is a spec violation, right?
>>>>> It is, but it's not relevant to what this commit tries to address. I
>>>>> thought the legacy guest still needs to be supported.
>>>>>
>>>>> Having said, a separate patch has to be posted to fix the guest driver
>>>>> issue where this discrepancy is introduced to virtnet_validate() (since
>>>>> commit fe36cbe067). But it's not technically related to this patch.
>>>>>
>>>>> -Siwei
>>>> I think it's a bug to read config space in validate, we should move it to
>>>> virtnet_probe().
>>>>
>>>> Thanks
>>> I take it back, reading but not writing seems to be explicitly allowed by spec.
>>> So our way to detect a legacy guest is bogus, need to think what is
>>> the best way to handle this.
>> Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
>> guest? Supposedly only config space write access needs to be guarded before
>> setting FEATURES_OK.
>>
>> -Siwei
> Detecting it isn't enough though, we will need a new ioctl to notify
> the kernel that it's a legacy guest. Ugh :(


I'm not sure I get this, how can we know if there's a legacy driver
before set_features()?

And I wonder what will happen if we just revert the set_features(0)?

Thanks


>
>
>>>>>>>> Rejecting reset to 0
>>>>>>>> prematurely causes correct MTU and link status unable to load
>>>>>>>> for the very first config space access, rendering issues like
>>>>>>>> guest showing inaccurate MTU value, or failure to reject
>>>>>>>> out-of-range MTU.
>>>>>>>>
>>>>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
>>>>>>>> supported mlx5 devices")
>>>>>>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>>>>>>> ---
>>>>>>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>>>>>>    1 file changed, 1 insertion(+), 14 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> index 7c1f789..540dd67 100644
>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> @@ -1490,14 +1490,6 @@ static u64
>>>>>>>> mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>>>>>>>        return mvdev->mlx_features;
>>>>>>>>    }
>>>>>>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
>>>>>>>> u64 features)
>>>>>>>> -{
>>>>>>>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>>>>>> -        return -EOPNOTSUPP;
>>>>>>>> -
>>>>>>>> -    return 0;
>>>>>>>> -}
>>>>>>>> -
>>>>>>>>    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>>>>>>>    {
>>>>>>>>        int err;
>>>>>>>> @@ -1558,18 +1550,13 @@ static int
>>>>>>>> mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
>>>>>>>> features)
>>>>>>>>    {
>>>>>>>>        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>> -    int err;
>>>>>>>>        print_features(mvdev, features, true);
>>>>>>>> -    err = verify_min_features(mvdev, features);
>>>>>>>> -    if (err)
>>>>>>>> -        return err;
>>>>>>>> -
>>>>>>>>        ndev->mvdev.actual_features = features &
>>>>>>>> ndev->mvdev.mlx_features;
>>>>>>>>        ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>>>>>>        ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>>>>>>>> VIRTIO_NET_S_LINK_UP);
>>>>>>>> -    return err;
>>>>>>>> +    return 0;
>>>>>>>>    }
>>>>>>>>    static void mlx5_vdpa_set_config_cb(struct vdpa_device
>>>>>>>> *vdev, struct vdpa_callback *cb)

2021-02-24 12:54:57

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/24 2:46 下午, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 02:04:36PM +0800, Jason Wang wrote:
>> On 2021/2/24 1:04 下午, Michael S. Tsirkin wrote:
>>> On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:
>>>> On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
>>>>> On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
>>>>>> On 2021/2/23 9:12 上午, Si-Wei Liu wrote:
>>>>>>> On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>>>>>>>>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
>>>>>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>>>>>> features to 0, when config space is accessed before features
>>>>>>>>>> are set. We should relieve the verify_min_features() check
>>>>>>>>>> and allow features reset to 0 for this case.
>>>>>>>>>>
>>>>>>>>>> It's worth noting that not just legacy guests could access
>>>>>>>>>> config space before features are set. For instance, when
>>>>>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>>>>>> will try to access and validate the MTU present in the config
>>>>>>>>>> space before virtio features are set.
>>>>>>>>> This looks like a spec violation:
>>>>>>>>>
>>>>>>>>> "
>>>>>>>>>
>>>>>>>>> The following driver-read-only field, mtu only exists if
>>>>>>>>> VIRTIO_NET_F_MTU is
>>>>>>>>> set.
>>>>>>>>> This field specifies the maximum MTU for the driver to use.
>>>>>>>>> "
>>>>>>>>>
>>>>>>>>> Do we really want to workaround this?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>> And also:
>>>>>>>>
>>>>>>>> The driver MUST follow this sequence to initialize a device:
>>>>>>>> 1. Reset the device.
>>>>>>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
>>>>>>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the
>>>>>>>> device.
>>>>>>>> 4. Read device feature bits, and write the subset of feature bits
>>>>>>>> understood by the OS and driver to the
>>>>>>>> device. During this step the driver MAY read (but MUST NOT write)
>>>>>>>> the device-specific configuration
>>>>>>>> fields to check that it can support the device before accepting it.
>>>>>>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>>>>>>>> feature bits after this step.
>>>>>>>> 6. Re-read device status to ensure the FEATURES_OK bit is still set:
>>>>>>>> otherwise, the device does not
>>>>>>>> support our subset of features and the device is unusable.
>>>>>>>> 7. Perform device-specific setup, including discovery of virtqueues
>>>>>>>> for the device, optional per-bus setup,
>>>>>>>> reading and possibly writing the device’s virtio configuration
>>>>>>>> space, and population of virtqueues.
>>>>>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>>>>>
>>>>>>>>
>>>>>>>> so accessing config space before FEATURES_OK is a spec violation, right?
>>>>>>> It is, but it's not relevant to what this commit tries to address. I
>>>>>>> thought the legacy guest still needs to be supported.
>>>>>>>
>>>>>>> Having said, a separate patch has to be posted to fix the guest driver
>>>>>>> issue where this discrepancy is introduced to virtnet_validate() (since
>>>>>>> commit fe36cbe067). But it's not technically related to this patch.
>>>>>>>
>>>>>>> -Siwei
>>>>>> I think it's a bug to read config space in validate, we should move it to
>>>>>> virtnet_probe().
>>>>>>
>>>>>> Thanks
>>>>> I take it back, reading but not writing seems to be explicitly allowed by spec.
>>>>> So our way to detect a legacy guest is bogus, need to think what is
>>>>> the best way to handle this.
>>>> Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
>>>> guest? Supposedly only config space write access needs to be guarded before
>>>> setting FEATURES_OK.
>>>>
>>>> -Siwei
>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>> the kernel that it's a legacy guest. Ugh :(
>>
>> I'm not sure I get this, how can we know if there's a legacy driver before
>> set_features()?
> qemu knows for sure. It does not communicate this information to the
> kernel right now unfortunately.


I may be missing something, but I still don't get how the new ioctl is
supposed to work.

Thanks


>
>> And I wonder what will happen if we just revert the set_features(0)?
>>
>> Thanks
>>
>>
>>>
>>>>>>>>>> Rejecting reset to 0
>>>>>>>>>> prematurely causes correct MTU and link status unable to load
>>>>>>>>>> for the very first config space access, rendering issues like
>>>>>>>>>> guest showing inaccurate MTU value, or failure to reject
>>>>>>>>>> out-of-range MTU.
>>>>>>>>>>
>>>>>>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
>>>>>>>>>> supported mlx5 devices")
>>>>>>>>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>>>>>>>>    1 file changed, 1 insertion(+), 14 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> index 7c1f789..540dd67 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> @@ -1490,14 +1490,6 @@ static u64
>>>>>>>>>> mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>>>>>>>>>        return mvdev->mlx_features;
>>>>>>>>>>    }
>>>>>>>>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
>>>>>>>>>> u64 features)
>>>>>>>>>> -{
>>>>>>>>>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>>>>>>>> -        return -EOPNOTSUPP;
>>>>>>>>>> -
>>>>>>>>>> -    return 0;
>>>>>>>>>> -}
>>>>>>>>>> -
>>>>>>>>>>    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>>>>>>>>>    {
>>>>>>>>>>        int err;
>>>>>>>>>> @@ -1558,18 +1550,13 @@ static int
>>>>>>>>>> mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
>>>>>>>>>> features)
>>>>>>>>>>    {
>>>>>>>>>>        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>>>        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>>>> -    int err;
>>>>>>>>>>        print_features(mvdev, features, true);
>>>>>>>>>> -    err = verify_min_features(mvdev, features);
>>>>>>>>>> -    if (err)
>>>>>>>>>> -        return err;
>>>>>>>>>> -
>>>>>>>>>>        ndev->mvdev.actual_features = features &
>>>>>>>>>> ndev->mvdev.mlx_features;
>>>>>>>>>>        ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>>>>>>>>        ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>>>>>>>>>> VIRTIO_NET_S_LINK_UP);
>>>>>>>>>> -    return err;
>>>>>>>>>> +    return 0;
>>>>>>>>>>    }
>>>>>>>>>>    static void mlx5_vdpa_set_config_cb(struct vdpa_device
>>>>>>>>>> *vdev, struct vdpa_callback *cb)

2021-02-24 12:55:18

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 02:04:36PM +0800, Jason Wang wrote:
>
> On 2021/2/24 1:04 下午, Michael S. Tsirkin wrote:
> > On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:
> > >
> > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > On 2021/2/23 9:12 上午, Si-Wei Liu wrote:
> > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
> > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > >
> > > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > > space before virtio features are set.
> > > > > > > > This looks like a spec violation:
> > > > > > > >
> > > > > > > > "
> > > > > > > >
> > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > set.
> > > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > > "
> > > > > > > >
> > > > > > > > Do we really want to workaround this?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > And also:
> > > > > > >
> > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > 1. Reset the device.
> > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > > device.
> > > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > > understood by the OS and driver to the
> > > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > > the device-specific configuration
> > > > > > > fields to check that it can support the device before accepting it.
> > > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > > feature bits after this step.
> > > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > > > otherwise, the device does not
> > > > > > > support our subset of features and the device is unusable.
> > > > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > > > for the device, optional per-bus setup,
> > > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > > space, and population of virtqueues.
> > > > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > > > >
> > > > > > >
> > > > > > > so accessing config space before FEATURES_OK is a spec violation, right?
> > > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > > thought the legacy guest still needs to be supported.
> > > > > >
> > > > > > Having said, a separate patch has to be posted to fix the guest driver
> > > > > > issue where this discrepancy is introduced to virtnet_validate() (since
> > > > > > commit fe36cbe067). But it's not technically related to this patch.
> > > > > >
> > > > > > -Siwei
> > > > > I think it's a bug to read config space in validate, we should move it to
> > > > > virtnet_probe().
> > > > >
> > > > > Thanks
> > > > I take it back, reading but not writing seems to be explicitly allowed by spec.
> > > > So our way to detect a legacy guest is bogus, need to think what is
> > > > the best way to handle this.
> > > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
> > > guest? Supposedly only config space write access needs to be guarded before
> > > setting FEATURES_OK.
> > >
> > > -Siwei
> > Detecting it isn't enough though, we will need a new ioctl to notify
> > the kernel that it's a legacy guest. Ugh :(
>
>
> I'm not sure I get this, how can we know if there's a legacy driver before
> set_features()?

QEMU knows for sure. Unfortunately, it does not communicate this
information to the kernel right now.

> And I wonder what will happen if we just revert the set_features(0)?
>
> Thanks
>
>
> >
> >
> > > > > > > > > Rejecting reset to 0
> > > > > > > > > prematurely causes correct MTU and link status unable to load
> > > > > > > > > for the very first config space access, rendering issues like
> > > > > > > > > guest showing inaccurate MTU value, or failure to reject
> > > > > > > > > out-of-range MTU.
> > > > > > > > >
> > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
> > > > > > > > > supported mlx5 devices")
> > > > > > > > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > > > > > > > ---
> > > > > > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > > > > > >    1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > index 7c1f789..540dd67 100644
> > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > @@ -1490,14 +1490,6 @@ static u64
> > > > > > > > > mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > > > > > >        return mvdev->mlx_features;
> > > > > > > > >    }
> > > > > > > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
> > > > > > > > > u64 features)
> > > > > > > > > -{
> > > > > > > > > -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > > > > > > -        return -EOPNOTSUPP;
> > > > > > > > > -
> > > > > > > > > -    return 0;
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > >    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > > > > > >    {
> > > > > > > > >        int err;
> > > > > > > > > @@ -1558,18 +1550,13 @@ static int
> > > > > > > > > mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
> > > > > > > > > features)
> > > > > > > > >    {
> > > > > > > > >        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > > > >        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > > > -    int err;
> > > > > > > > >        print_features(mvdev, features, true);
> > > > > > > > > -    err = verify_min_features(mvdev, features);
> > > > > > > > > -    if (err)
> > > > > > > > > -        return err;
> > > > > > > > > -
> > > > > > > > >        ndev->mvdev.actual_features = features &
> > > > > > > > > ndev->mvdev.mlx_features;
> > > > > > > > >        ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > > > > > > >        ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
> > > > > > > > > VIRTIO_NET_S_LINK_UP);
> > > > > > > > > -    return err;
> > > > > > > > > +    return 0;
> > > > > > > > >    }
> > > > > > > > >    static void mlx5_vdpa_set_config_cb(struct vdpa_device
> > > > > > > > > *vdev, struct vdpa_callback *cb)

2021-02-24 12:55:58

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/24 2:47 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 08:45:20AM +0200, Eli Cohen wrote:
>> On Wed, Feb 24, 2021 at 12:17:58AM -0500, Michael S. Tsirkin wrote:
>>> On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
>>>> On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
>>>>>
>>>>> On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
>>>>>> On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
>>>>>>> On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
>>>>>>>> On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>>>>>>>>>> On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
>>>>>>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>>>>>>> features to 0, when config space is accessed before features
>>>>>>>>>>> are set. We should relieve the verify_min_features() check
>>>>>>>>>>> and allow features reset to 0 for this case.
>>>>>>>>>>>
>>>>>>>>>>> It's worth noting that not just legacy guests could access
>>>>>>>>>>> config space before features are set. For instance, when
>>>>>>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>>>>>>> will try to access and validate the MTU present in the config
>>>>>>>>>>> space before virtio features are set.
>>>>>>>>>> This looks like a spec violation:
>>>>>>>>>>
>>>>>>>>>> "
>>>>>>>>>>
>>>>>>>>>> The following driver-read-only field, mtu only exists if
>>>>>>>>>> VIRTIO_NET_F_MTU is
>>>>>>>>>> set.
>>>>>>>>>> This field specifies the maximum MTU for the driver to use.
>>>>>>>>>> "
>>>>>>>>>>
>>>>>>>>>> Do we really want to workaround this?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>> And also:
>>>>>>>>>
>>>>>>>>> The driver MUST follow this sequence to initialize a device:
>>>>>>>>> 1. Reset the device.
>>>>>>>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has
>>>>>>>>> noticed the device.
>>>>>>>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the
>>>>>>>>> device.
>>>>>>>>> 4. Read device feature bits, and write the subset of feature bits
>>>>>>>>> understood by the OS and driver to the
>>>>>>>>> device. During this step the driver MAY read (but MUST NOT write)
>>>>>>>>> the device-specific configuration
>>>>>>>>> fields to check that it can support the device before accepting it.
>>>>>>>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>>>>>>>>> feature bits after this step.
>>>>>>>>> 6. Re-read device status to ensure the FEATURES_OK bit is still set:
>>>>>>>>> otherwise, the device does not
>>>>>>>>> support our subset of features and the device is unusable.
>>>>>>>>> 7. Perform device-specific setup, including discovery of virtqueues
>>>>>>>>> for the device, optional per-bus setup,
>>>>>>>>> reading and possibly writing the device’s virtio configuration
>>>>>>>>> space, and population of virtqueues.
>>>>>>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> so accessing config space before FEATURES_OK is a spec
>>>>>>>>> violation, right?
>>>>>>>> It is, but it's not relevant to what this commit tries to address. I
>>>>>>>> thought the legacy guest still needs to be supported.
>>>>>>>>
>>>>>>>> Having said, a separate patch has to be posted to fix the guest driver
>>>>>>>> issue where this discrepancy is introduced to
>>>>>>>> virtnet_validate() (since
>>>>>>>> commit fe36cbe067). But it's not technically related to this patch.
>>>>>>>>
>>>>>>>> -Siwei
>>>>>>> I think it's a bug to read config space in validate, we should
>>>>>>> move it to
>>>>>>> virtnet_probe().
>>>>>>>
>>>>>>> Thanks
>>>>>> I take it back, reading but not writing seems to be explicitly
>>>>>> allowed by spec.
>>>>>> So our way to detect a legacy guest is bogus, need to think what is
>>>>>> the best way to handle this.
>>>>> Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
>>>>> legacy guest? Supposedly only config space write access needs to be
>>>>> guarded before setting FEATURES_OK.
>>>>
>>>> I agree. My understanding is that all vDPA must be modern device (since
>>>> VIRTIO_F_ACCESS_PLATFORM is mandated) instead of transitional device.
>>>>
>>>> Thanks
>>> Well mlx5 has some code to handle legacy guests ...
>>> Eli, could you comment? Is that support unused right now?
>>>
>> If you mean support for version 1.0, well the knob is there but it's not
>> set in the firmware I use. Not sure if we will support this.
> Hmm you mean it's legacy only right now?
> Well at some point you will want advanced goodies like RSS
> and all that is gated on 1.0 ;)


So if my understanding is correct, the device/firmware is legacy but
requires VIRTIO_F_ACCESS_PLATFORM semantics? Looks like a spec violation?

Thanks
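
To make the conflict concrete, here is a minimal sketch, not taken from
the mlx5 driver. It assumes the bit numbers from the virtio specification
(VIRTIO_F_VERSION_1 = 32, VIRTIO_F_ACCESS_PLATFORM = 33) and the rule that
the legacy interface exposes only feature bits 0 to 31. The helper names
feature_bit(), negotiable_features() and access_platform_unreachable() are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define VIRTIO_F_VERSION_1        32
#define VIRTIO_F_ACCESS_PLATFORM  33

/* The legacy interface can only carry feature bits 0..31. */
#define LEGACY_FEATURE_MASK 0xffffffffULL

/* Hypothetical helper: turn a bit number into a 64-bit feature mask. */
static uint64_t feature_bit(unsigned int n)
{
	assert(n < 64);
	return 1ULL << n;
}

/* Features a guest can actually acknowledge through a given interface. */
static uint64_t negotiable_features(uint64_t device_features, bool legacy_iface)
{
	return legacy_iface ? (device_features & LEGACY_FEATURE_MASK)
			    : device_features;
}

/*
 * True when the device offers ACCESS_PLATFORM but the guest has no way
 * to acknowledge it over the interface it is using.
 */
static bool access_platform_unreachable(uint64_t device_features,
					bool legacy_iface)
{
	uint64_t want = feature_bit(VIRTIO_F_ACCESS_PLATFORM);

	if (!(device_features & want))
		return false;
	return !(negotiable_features(device_features, legacy_iface) & want);
}
```

With these assumptions, a guest on the legacy interface can never set bit
33, so a check that insists on ACCESS_PLATFORM rejects every legacy guest
regardless of what else it negotiates.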


>
>>>>> -Siwei
>>>>>
>>>>>>>>>>> Rejecting reset to 0
>>>>>>>>>>> prematurely causes correct MTU and link status unable to load
>>>>>>>>>>> for the very first config space access, rendering issues like
>>>>>>>>>>> guest showing inaccurate MTU value, or failure to reject
>>>>>>>>>>> out-of-range MTU.
>>>>>>>>>>>
>>>>>>>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
>>>>>>>>>>> supported mlx5 devices")
>>>>>>>>>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>>>>>>>>>> ---
>>>>>>>>>>>     drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>>>>>>>>>     1 file changed, 1 insertion(+), 14 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>> index 7c1f789..540dd67 100644
>>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>> @@ -1490,14 +1490,6 @@ static u64
>>>>>>>>>>> mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>>>>>>>>>>         return mvdev->mlx_features;
>>>>>>>>>>>     }
>>>>>>>>>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
>>>>>>>>>>> u64 features)
>>>>>>>>>>> -{
>>>>>>>>>>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>>>>>>>>> -        return -EOPNOTSUPP;
>>>>>>>>>>> -
>>>>>>>>>>> -    return 0;
>>>>>>>>>>> -}
>>>>>>>>>>> -
>>>>>>>>>>>     static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>>>>>>>>>>     {
>>>>>>>>>>>         int err;
>>>>>>>>>>> @@ -1558,18 +1550,13 @@ static int
>>>>>>>>>>> mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
>>>>>>>>>>> features)
>>>>>>>>>>>     {
>>>>>>>>>>>         struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>>>>         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>>>>> -    int err;
>>>>>>>>>>>         print_features(mvdev, features, true);
>>>>>>>>>>> -    err = verify_min_features(mvdev, features);
>>>>>>>>>>> -    if (err)
>>>>>>>>>>> -        return err;
>>>>>>>>>>> -
>>>>>>>>>>>         ndev->mvdev.actual_features = features &
>>>>>>>>>>> ndev->mvdev.mlx_features;
>>>>>>>>>>>         ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>>>>>>>>>         ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>>>>>>>>>>> VIRTIO_NET_S_LINK_UP);
>>>>>>>>>>> -    return err;
>>>>>>>>>>> +    return 0;
>>>>>>>>>>>     }
>>>>>>>>>>>     static void mlx5_vdpa_set_config_cb(struct vdpa_device
>>>>>>>>>>> *vdev, struct vdpa_callback *cb)

2021-02-24 12:57:07

by Eli Cohen

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 02:55:13PM +0800, Jason Wang wrote:
>
> On 2021/2/24 2:47 PM, Michael S. Tsirkin wrote:
> > On Wed, Feb 24, 2021 at 08:45:20AM +0200, Eli Cohen wrote:
> > > On Wed, Feb 24, 2021 at 12:17:58AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
> > > > > > On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
> > > > > >
> > > > > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > > > > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > > > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > > > > >
> > > > > > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > > > > > space before virtio features are set.
> > > > > > > > > > > This looks like a spec violation:
> > > > > > > > > > >
> > > > > > > > > > > "
> > > > > > > > > > >
> > > > > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > > > > set.
> > > > > > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > > > > > "
> > > > > > > > > > >
> > > > > > > > > > > Do we really want to workaround this?
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > And also:
> > > > > > > > > >
> > > > > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > > > > 1. Reset the device.
> > > > > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has
> > > > > > > > > > noticed the device.
> > > > > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > > > > > device.
> > > > > > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > > > > > understood by the OS and driver to the
> > > > > > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > > > > > the device-specific configuration
> > > > > > > > > > fields to check that it can support the device before accepting it.
> > > > > > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > > > > > feature bits after this step.
> > > > > > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > > > > > > otherwise, the device does not
> > > > > > > > > > support our subset of features and the device is unusable.
> > > > > > > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > > > > > > for the device, optional per-bus setup,
> > > > > > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > > > > > space, and population of virtqueues.
> > > > > > > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > so accessing config space before FEATURES_OK is a spec
> > > > > > > > > > violation, right?
> > > > > > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > > > > > thought the legacy guest still needs to be supported.
> > > > > > > > >
> > > > > > > > > Having said, a separate patch has to be posted to fix the guest driver
> > > > > > > > > issue where this discrepancy is introduced to
> > > > > > > > > virtnet_validate() (since
> > > > > > > > > commit fe36cbe067). But it's not technically related to this patch.
> > > > > > > > >
> > > > > > > > > -Siwei
> > > > > > > > I think it's a bug to read config space in validate, we should
> > > > > > > > move it to
> > > > > > > > virtnet_probe().
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > I take it back, reading but not writing seems to be explicitly
> > > > > > > allowed by spec.
> > > > > > > So our way to detect a legacy guest is bogus, need to think what is
> > > > > > > the best way to handle this.
> > > > > > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
> > > > > > legacy guest? Supposedly only config space write access needs to be
> > > > > > guarded before setting FEATURES_OK.
> > > > >
> > > > > > I agree. My understanding is that all vDPA must be modern device (since
> > > > > > VIRTIO_F_ACCESS_PLATFORM is mandated) instead of transitional device.
> > > > >
> > > > > Thanks
> > > > Well mlx5 has some code to handle legacy guests ...
> > > > Eli, could you comment? Is that support unused right now?
> > > >
> > > > If you mean support for version 1.0, well the knob is there but it's not
> > > > set in the firmware I use. Not sure if we will support this.
> > Hmm you mean it's legacy only right now?
> > Well at some point you will want advanced goodies like RSS
> > and all that is gated on 1.0 ;)
>
>
> So if my understanding is correct, the device/firmware is legacy but requires
> VIRTIO_F_ACCESS_PLATFORM semantics? Looks like a spec violation?
>

I am checking this with some folks here.

> Thanks
>
>
> >
> > > > > > > -Siwei
> > > > > >
> > > > > > > > > > > > Rejecting reset to 0
> > > > > > > > > > > > prematurely causes correct MTU and link status unable to load
> > > > > > > > > > > > for the very first config space access, rendering issues like
> > > > > > > > > > > > guest showing inaccurate MTU value, or failure to reject
> > > > > > > > > > > > out-of-range MTU.
> > > > > > > > > > > >
> > > > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
> > > > > > > > > > > > supported mlx5 devices")
> > > > > > > > > > > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > > > > > > > > > > ---
> > > > > > > > > > > >     drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > > > > > > > > >     1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > index 7c1f789..540dd67 100644
> > > > > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > @@ -1490,14 +1490,6 @@ static u64
> > > > > > > > > > > > mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > > > > > > > > >         return mvdev->mlx_features;
> > > > > > > > > > > >     }
> > > > > > > > > > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
> > > > > > > > > > > > u64 features)
> > > > > > > > > > > > -{
> > > > > > > > > > > > -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > > > > > > > > > -        return -EOPNOTSUPP;
> > > > > > > > > > > > -
> > > > > > > > > > > > -    return 0;
> > > > > > > > > > > > -}
> > > > > > > > > > > > -
> > > > > > > > > > > >     static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > > > > > > > > >     {
> > > > > > > > > > > >         int err;
> > > > > > > > > > > > @@ -1558,18 +1550,13 @@ static int
> > > > > > > > > > > > mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
> > > > > > > > > > > > features)
> > > > > > > > > > > >     {
> > > > > > > > > > > >         struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > > > > > > >         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > > > > > > -    int err;
> > > > > > > > > > > >         print_features(mvdev, features, true);
> > > > > > > > > > > > -    err = verify_min_features(mvdev, features);
> > > > > > > > > > > > -    if (err)
> > > > > > > > > > > > -        return err;
> > > > > > > > > > > > -
> > > > > > > > > > > >         ndev->mvdev.actual_features = features &
> > > > > > > > > > > > ndev->mvdev.mlx_features;
> > > > > > > > > > > >         ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > > > > > > > > > >         ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
> > > > > > > > > > > > VIRTIO_NET_S_LINK_UP);
> > > > > > > > > > > > -    return err;
> > > > > > > > > > > > +    return 0;
> > > > > > > > > > > >     }
> > > > > > > > > > > >     static void mlx5_vdpa_set_config_cb(struct vdpa_device
> > > > > > > > > > > > *vdev, struct vdpa_callback *cb)
>

2021-02-24 12:58:42

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 02:55:13PM +0800, Jason Wang wrote:
>
> On 2021/2/24 2:47 PM, Michael S. Tsirkin wrote:
> > On Wed, Feb 24, 2021 at 08:45:20AM +0200, Eli Cohen wrote:
> > > On Wed, Feb 24, 2021 at 12:17:58AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
> > > > > > On 2021/2/24 3:35 AM, Si-Wei Liu wrote:
> > > > > >
> > > > > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > > > > > On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
> > > > > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > > > > > On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
> > > > > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > > > > >
> > > > > > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > > > > > space before virtio features are set.
> > > > > > > > > > > This looks like a spec violation:
> > > > > > > > > > >
> > > > > > > > > > > "
> > > > > > > > > > >
> > > > > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > > > > set.
> > > > > > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > > > > > "
> > > > > > > > > > >
> > > > > > > > > > > Do we really want to workaround this?
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > And also:
> > > > > > > > > >
> > > > > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > > > > 1. Reset the device.
> > > > > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has
> > > > > > > > > > noticed the device.
> > > > > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > > > > > device.
> > > > > > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > > > > > understood by the OS and driver to the
> > > > > > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > > > > > the device-specific configuration
> > > > > > > > > > fields to check that it can support the device before accepting it.
> > > > > > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > > > > > feature bits after this step.
> > > > > > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > > > > > > otherwise, the device does not
> > > > > > > > > > support our subset of features and the device is unusable.
> > > > > > > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > > > > > > for the device, optional per-bus setup,
> > > > > > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > > > > > space, and population of virtqueues.
> > > > > > > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > so accessing config space before FEATURES_OK is a spec
> > > > > > > > > > violation, right?
> > > > > > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > > > > > thought the legacy guest still needs to be supported.
> > > > > > > > >
> > > > > > > > > Having said, a separate patch has to be posted to fix the guest driver
> > > > > > > > > issue where this discrepancy is introduced to
> > > > > > > > > virtnet_validate() (since
> > > > > > > > > commit fe36cbe067). But it's not technically related to this patch.
> > > > > > > > >
> > > > > > > > > -Siwei
> > > > > > > > I think it's a bug to read config space in validate, we should
> > > > > > > > move it to
> > > > > > > > virtnet_probe().
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > I take it back, reading but not writing seems to be explicitly
> > > > > > > allowed by spec.
> > > > > > > So our way to detect a legacy guest is bogus, need to think what is
> > > > > > > the best way to handle this.
> > > > > > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
> > > > > > legacy guest? Supposedly only config space write access needs to be
> > > > > > guarded before setting FEATURES_OK.
> > > > >
> > > > > > I agree. My understanding is that all vDPA must be modern device (since
> > > > > > VIRTIO_F_ACCESS_PLATFORM is mandated) instead of transitional device.
> > > > >
> > > > > Thanks
> > > > Well mlx5 has some code to handle legacy guests ...
> > > > Eli, could you comment? Is that support unused right now?
> > > >
> > > > If you mean support for version 1.0, well the knob is there but it's not
> > > > set in the firmware I use. Not sure if we will support this.
> > Hmm you mean it's legacy only right now?
> > Well at some point you will want advanced goodies like RSS
> > and all that is gated on 1.0 ;)
>
>
> So if my understanding is correct, the device/firmware is legacy but requires
> VIRTIO_F_ACCESS_PLATFORM semantics? Looks like a spec violation?
>
> Thanks

The legacy mode description in the spec is non-normative. As such, as long as
guests work, they work ;)

>
> >
> > > > > > > -Siwei
> > > > > >
> > > > > > > > > > > > Rejecting reset to 0
> > > > > > > > > > > > prematurely causes correct MTU and link status unable to load
> > > > > > > > > > > > for the very first config space access, rendering issues like
> > > > > > > > > > > > guest showing inaccurate MTU value, or failure to reject
> > > > > > > > > > > > out-of-range MTU.
> > > > > > > > > > > >
> > > > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
> > > > > > > > > > > > supported mlx5 devices")
> > > > > > > > > > > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > > > > > > > > > > ---
> > > > > > > > > > > >     drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > > > > > > > > >     1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > index 7c1f789..540dd67 100644
> > > > > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > @@ -1490,14 +1490,6 @@ static u64
> > > > > > > > > > > > mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > > > > > > > > >         return mvdev->mlx_features;
> > > > > > > > > > > >     }
> > > > > > > > > > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
> > > > > > > > > > > > u64 features)
> > > > > > > > > > > > -{
> > > > > > > > > > > > -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > > > > > > > > > -        return -EOPNOTSUPP;
> > > > > > > > > > > > -
> > > > > > > > > > > > -    return 0;
> > > > > > > > > > > > -}
> > > > > > > > > > > > -
> > > > > > > > > > > >     static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > > > > > > > > >     {
> > > > > > > > > > > >         int err;
> > > > > > > > > > > > @@ -1558,18 +1550,13 @@ static int
> > > > > > > > > > > > mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
> > > > > > > > > > > > features)
> > > > > > > > > > > >     {
> > > > > > > > > > > >         struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > > > > > > >         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > > > > > > -    int err;
> > > > > > > > > > > >         print_features(mvdev, features, true);
> > > > > > > > > > > > -    err = verify_min_features(mvdev, features);
> > > > > > > > > > > > -    if (err)
> > > > > > > > > > > > -        return err;
> > > > > > > > > > > > -
> > > > > > > > > > > >         ndev->mvdev.actual_features = features &
> > > > > > > > > > > > ndev->mvdev.mlx_features;
> > > > > > > > > > > >         ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > > > > > > > > > >         ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
> > > > > > > > > > > > VIRTIO_NET_S_LINK_UP);
> > > > > > > > > > > > -    return err;
> > > > > > > > > > > > +    return 0;
> > > > > > > > > > > >     }
> > > > > > > > > > > >     static void mlx5_vdpa_set_config_cb(struct vdpa_device
> > > > > > > > > > > > *vdev, struct vdpa_callback *cb)

2021-02-24 13:07:42

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/2/24 4:43 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 04:26:43PM +0800, Jason Wang wrote:
>> Basically on first guest access QEMU would tell kernel whether
>> guest is using the legacy or the modern interface.
>> E.g. virtio_pci_config_read/virtio_pci_config_write will call ioctl(ENABLE_LEGACY, 1)
>> while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0)
>>
>>
>> But this trick works only for PCI I think?
>>
>> Thanks
> ccw has a revision it can check. mmio does not have transitional devices
> at all.


OK, then we can do the workaround in QEMU, can't we?

Thanks
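
The idea quoted above can be sketched as a small state machine. This is
only an illustration under stated assumptions: it is not QEMU code,
ENABLE_LEGACY is the ioctl name floated in this thread rather than an
existing interface, and struct vdpa_backend, notify_kernel() and
on_guest_access() are hypothetical names. The two access kinds stand in
for the legacy config accessors and the modern common-config accessor.

```c
#include <assert.h>
#include <stdbool.h>

enum guest_iface { IFACE_UNKNOWN, IFACE_LEGACY, IFACE_MODERN };
enum access_kind { ACCESS_LEGACY_CONFIG, ACCESS_MODERN_COMMON };

/* Hypothetical per-device state kept by the VMM. */
struct vdpa_backend {
	enum guest_iface iface;	/* interface the guest was last seen using */
	int notifications;	/* how many times the kernel was told */
	int last_enable_legacy;	/* last value passed to the kernel */
};

/* Stand-in for the proposed ioctl(ENABLE_LEGACY, val); records the call. */
static void notify_kernel(struct vdpa_backend *b, int enable_legacy)
{
	b->notifications++;
	b->last_enable_legacy = enable_legacy;
}

/*
 * Called on every guest access. The kernel is told only when the observed
 * interface changes, so repeated accesses of the same kind cost nothing.
 */
static void on_guest_access(struct vdpa_backend *b, enum access_kind kind)
{
	enum guest_iface seen = (kind == ACCESS_LEGACY_CONFIG) ? IFACE_LEGACY
								: IFACE_MODERN;

	assert(b);
	if (b->iface == seen)
		return;
	b->iface = seen;
	notify_kernel(b, seen == IFACE_LEGACY);
}
```

Notifying on change rather than strictly on the first access is a design
choice of this sketch: it also covers a guest that switches interfaces
after a reset, for example a legacy bootloader handing over to a modern
driver.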


2021-02-24 18:30:24

by Si-Wei Liu

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero



On 2/23/2021 9:04 PM, Michael S. Tsirkin wrote:
> On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:
>>
>> On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
>>> On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
>>>> On 2021/2/23 9:12 AM, Si-Wei Liu wrote:
>>>>> On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
>>>>>> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
>>>>>>> On 2021/2/19 7:54 PM, Si-Wei Liu wrote:
>>>>>>>> Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>>>>>>>> for legacy") made an exception for legacy guests to reset
>>>>>>>> features to 0, when config space is accessed before features
>>>>>>>> are set. We should relieve the verify_min_features() check
>>>>>>>> and allow features reset to 0 for this case.
>>>>>>>>
>>>>>>>> It's worth noting that not just legacy guests could access
>>>>>>>> config space before features are set. For instance, when
>>>>>>>> feature VIRTIO_NET_F_MTU is advertised some modern driver
>>>>>>>> will try to access and validate the MTU present in the config
>>>>>>>> space before virtio features are set.
>>>>>>> This looks like a spec violation:
>>>>>>>
>>>>>>> "
>>>>>>>
>>>>>>> The following driver-read-only field, mtu only exists if
>>>>>>> VIRTIO_NET_F_MTU is
>>>>>>> set.
>>>>>>> This field specifies the maximum MTU for the driver to use.
>>>>>>> "
>>>>>>>
>>>>>>> Do we really want to workaround this?
>>>>>>>
>>>>>>> Thanks
>>>>>> And also:
>>>>>>
>>>>>> The driver MUST follow this sequence to initialize a device:
>>>>>> 1. Reset the device.
>>>>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
>>>>>> 3. Set the DRIVER status bit: the guest OS knows how to drive the
>>>>>> device.
>>>>>> 4. Read device feature bits, and write the subset of feature bits
>>>>>> understood by the OS and driver to the
>>>>>> device. During this step the driver MAY read (but MUST NOT write)
>>>>>> the device-specific configuration
>>>>>> fields to check that it can support the device before accepting it.
>>>>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
>>>>>> feature bits after this step.
>>>>>> 6. Re-read device status to ensure the FEATURES_OK bit is still set:
>>>>>> otherwise, the device does not
>>>>>> support our subset of features and the device is unusable.
>>>>>> 7. Perform device-specific setup, including discovery of virtqueues
>>>>>> for the device, optional per-bus setup,
>>>>>> reading and possibly writing the device’s virtio configuration
>>>>>> space, and population of virtqueues.
>>>>>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>>>>>>
>>>>>>
>>>>>> so accessing config space before FEATURES_OK is a spec violation, right?
>>>>> It is, but it's not relevant to what this commit tries to address. I
>>>>> thought the legacy guest still needs to be supported.
>>>>>
>>>>> Having said, a separate patch has to be posted to fix the guest driver
>>>>> issue where this discrepancy is introduced to virtnet_validate() (since
>>>>> commit fe36cbe067). But it's not technically related to this patch.
>>>>>
>>>>> -Siwei
>>>> I think it's a bug to read config space in validate, we should move it to
>>>> virtnet_probe().
>>>>
>>>> Thanks
>>> I take it back, reading but not writing seems to be explicitly allowed by spec.
>>> So our way to detect a legacy guest is bogus, need to think what is
>>> the best way to handle this.
>> Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
>> guest? Supposedly only config space write access needs to be guarded before
>> setting FEATURES_OK.
>>
>> -Siwei
> Detecting it isn't enough though, we will need a new ioctl to notify
> the kernel that it's a legacy guest. Ugh :(
Well, although I think adding an ioctl is doable, may I know what the
use case would be for the kernel to consume this information directly? Is
there anything QEMU could not handle later with dedicated ioctls, if
differentiation (legacy vs. modern) is indeed needed?

One of the reasons I ask is whether this ioctl would become mandatory for
the vhost-vdpa kernel driver. Would QEMU refuse to initialize vhost-vdpa
if it does not see this ioctl?

If it's optional, I suppose the kernel would need it only when it becomes
necessary?

Thanks,
-Siwei


>
>
>>>>>>>> Rejecting reset to 0
>>>>>>>> prematurely causes correct MTU and link status unable to load
>>>>>>>> for the very first config space access, rendering issues like
>>>>>>>> guest showing inaccurate MTU value, or failure to reject
>>>>>>>> out-of-range MTU.
>>>>>>>>
>>>>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
>>>>>>>> supported mlx5 devices")
>>>>>>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>>>>>>> ---
>>>>>>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
>>>>>>>>    1 file changed, 1 insertion(+), 14 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> index 7c1f789..540dd67 100644
>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> @@ -1490,14 +1490,6 @@ static u64
>>>>>>>> mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>>>>>>>        return mvdev->mlx_features;
>>>>>>>>    }
>>>>>>>> -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
>>>>>>>> u64 features)
>>>>>>>> -{
>>>>>>>> -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>>>>>> -        return -EOPNOTSUPP;
>>>>>>>> -
>>>>>>>> -    return 0;
>>>>>>>> -}
>>>>>>>> -
>>>>>>>>    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
>>>>>>>>    {
>>>>>>>>        int err;
>>>>>>>> @@ -1558,18 +1550,13 @@ static int
>>>>>>>> mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
>>>>>>>> features)
>>>>>>>>    {
>>>>>>>>        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>> -    int err;
>>>>>>>>        print_features(mvdev, features, true);
>>>>>>>> -    err = verify_min_features(mvdev, features);
>>>>>>>> -    if (err)
>>>>>>>> -        return err;
>>>>>>>> -
>>>>>>>>        ndev->mvdev.actual_features = features &
>>>>>>>> ndev->mvdev.mlx_features;
>>>>>>>>        ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>>>>>>        ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
>>>>>>>> VIRTIO_NET_S_LINK_UP);
>>>>>>>> -    return err;
>>>>>>>> +    return 0;
>>>>>>>>    }
>>>>>>>>    static void mlx5_vdpa_set_config_cb(struct vdpa_device
>>>>>>>> *vdev, struct vdpa_callback *cb)
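
For illustration, the behavioural change in the quoted patch can be
sketched outside the kernel. This is a minimal userspace model, not the
driver code: the function names set_features_strict() and
set_features_relaxed() are hypothetical, and only the feature bit number
and the masking step are taken from the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VIRTIO_F_ACCESS_PLATFORM 33 /* feature bit number from the virtio spec */

/*
 * Model of the pre-patch behaviour: any request lacking ACCESS_PLATFORM
 * is rejected, which includes the all-zero set used to reset features.
 */
static int set_features_strict(uint64_t dev_features, uint64_t requested,
                               uint64_t *actual)
{
    if (!(requested & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
        return -EOPNOTSUPP;

    *actual = requested & dev_features;
    return 0;
}

/*
 * Model of the post-patch behaviour: the request is only masked against
 * what the device offers, so a reset to zero succeeds.
 */
static int set_features_relaxed(uint64_t dev_features, uint64_t requested,
                                uint64_t *actual)
{
    *actual = requested & dev_features;
    return 0;
}
```

With the strict variant a request of 0 fails with -EOPNOTSUPP, while the
relaxed variant stores 0 as the negotiated feature set.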

2021-02-24 22:37:24

by Eli Cohen

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 02:12:01AM -0500, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 02:55:13PM +0800, Jason Wang wrote:
> >
> > On 2021/2/24 2:47 下午, Michael S. Tsirkin wrote:
> > > On Wed, Feb 24, 2021 at 08:45:20AM +0200, Eli Cohen wrote:
> > > > On Wed, Feb 24, 2021 at 12:17:58AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Feb 24, 2021 at 11:20:01AM +0800, Jason Wang wrote:
> > > > > > On 2021/2/24 3:35 上午, Si-Wei Liu wrote:
> > > > > > >
> > > > > > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > > > > > On 2021/2/23 9:12 上午, Si-Wei Liu wrote:
> > > > > > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > > > > > On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
> > > > > > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > > > > > > space before virtio features are set.
> > > > > > > > > > > > This looks like a spec violation:
> > > > > > > > > > > >
> > > > > > > > > > > > "
> > > > > > > > > > > >
> > > > > > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > > > > > set.
> > > > > > > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > > > > > > "
> > > > > > > > > > > >
> > > > > > > > > > > > Do we really want to workaround this?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > And also:
> > > > > > > > > > >
> > > > > > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > > > > > 1. Reset the device.
> > > > > > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has
> > > > > > > > > > > noticed the device.
> > > > > > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > > > > > > device.
> > > > > > > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > > > > > > understood by the OS and driver to the
> > > > > > > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > > > > > > the device-specific configuration
> > > > > > > > > > > fields to check that it can support the device before accepting it.
> > > > > > > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > > > > > > feature bits after this step.
> > > > > > > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > > > > > > > otherwise, the device does not
> > > > > > > > > > > support our subset of features and the device is unusable.
> > > > > > > > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > > > > > > > for the device, optional per-bus setup,
> > > > > > > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > > > > > > space, and population of virtqueues.
> > > > > > > > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > so accessing config space before FEATURES_OK is a spec
> > > > > > > > > > > violation, right?
> > > > > > > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > > > > > > thought the legacy guest still needs to be supported.
> > > > > > > > > >
> > > > > > > > > > Having said, a separate patch has to be posted to fix the guest driver
> > > > > > > > > > issue where this discrepancy is introduced to
> > > > > > > > > > virtnet_validate() (since
> > > > > > > > > > commit fe36cbe067). But it's not technically related to this patch.
> > > > > > > > > >
> > > > > > > > > > -Siwei
> > > > > > > > > I think it's a bug to read config space in validate, we should
> > > > > > > > > move it to
> > > > > > > > > virtnet_probe().
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > I take it back, reading but not writing seems to be explicitly
> > > > > > > > allowed by spec.
> > > > > > > > So our way to detect a legacy guest is bogus, need to think what is
> > > > > > > > the best way to handle this.
> > > > > > > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
> > > > > > > legacy guest? Supposedly only config space write access needs to be
> > > > > > > guarded before setting FEATURES_OK.
> > > > > >
> > > > > > I agree. My understanding is that all vDPA must be modern device (since
> > > > > > VIRITO_F_ACCESS_PLATFORM is mandated) instead of transitional device.
> > > > > >
> > > > > > Thanks
> > > > > Well mlx5 has some code to handle legacy guests ...
> > > > > Eli, could you comment? Is that support unused right now?
> > > > >
> > > > If you mean support for version 1.0, well the knob is there but it's not
> > > > set in the firmware I use. Note sure if we will support this.
> > > Hmm you mean it's legacy only right now?
> > > Well at some point you will want advanced goodies like RSS
> > > and all that is gated on 1.0 ;)
> >

Guys sorry, I checked again, the firmware capability is set and so is
VIRTIO_F_VERSION_1.

> >
> > So if my understanding is correct the device/firmware is legacy but require
> > VIRTIO_F_ACCESS_PLATFORM semanic? Looks like a spec violation?
> >
> > Thanks
>
> Legacy mode description is the spec is non-normative. As such as long as
> guests work, they work ;)
>
> >
> > >
> > > > > > > -Siwie
> > > > > > >
> > > > > > > > > > > > > Rejecting reset to 0
> > > > > > > > > > > > > prematurely causes correct MTU and link status unable to load
> > > > > > > > > > > > > for the very first config space access, rendering issues like
> > > > > > > > > > > > > guest showing inaccurate MTU value, or failure to reject
> > > > > > > > > > > > > out-of-range MTU.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
> > > > > > > > > > > > > supported mlx5 devices")
> > > > > > > > > > > > > Signed-off-by: Si-Wei Liu <[email protected]>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >     drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--------------
> > > > > > > > > > > > >     1 file changed, 1 insertion(+), 14 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > > index 7c1f789..540dd67 100644
> > > > > > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > > > @@ -1490,14 +1490,6 @@ static u64
> > > > > > > > > > > > > mlx5_vdpa_get_features(struct vdpa_device *vdev)
> > > > > > > > > > > > >         return mvdev->mlx_features;
> > > > > > > > > > > > >     }
> > > > > > > > > > > > > -static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
> > > > > > > > > > > > > u64 features)
> > > > > > > > > > > > > -{
> > > > > > > > > > > > > -    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
> > > > > > > > > > > > > -        return -EOPNOTSUPP;
> > > > > > > > > > > > > -
> > > > > > > > > > > > > -    return 0;
> > > > > > > > > > > > > -}
> > > > > > > > > > > > > -
> > > > > > > > > > > > >     static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > > > > > > > > > >     {
> > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > @@ -1558,18 +1550,13 @@ static int
> > > > > > > > > > > > > mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
> > > > > > > > > > > > > features)
> > > > > > > > > > > > >     {
> > > > > > > > > > > > >         struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > > > > > > > >         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > > > > > > > -    int err;
> > > > > > > > > > > > >         print_features(mvdev, features, true);
> > > > > > > > > > > > > -    err = verify_min_features(mvdev, features);
> > > > > > > > > > > > > -    if (err)
> > > > > > > > > > > > > -        return err;
> > > > > > > > > > > > > -
> > > > > > > > > > > > >         ndev->mvdev.actual_features = features &
> > > > > > > > > > > > > ndev->mvdev.mlx_features;
> > > > > > > > > > > > >         ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
> > > > > > > > > > > > >         ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
> > > > > > > > > > > > > VIRTIO_NET_S_LINK_UP);
> > > > > > > > > > > > > -    return err;
> > > > > > > > > > > > > +    return 0;
> > > > > > > > > > > > >     }
> > > > > > > > > > > > >     static void mlx5_vdpa_set_config_cb(struct vdpa_device
> > > > > > > > > > > > > *vdev, struct vdpa_callback *cb)
>

2021-02-26 01:01:30

by Si-Wei Liu

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


Hi Michael,

Are you okay living without this ioctl for now? I think QEMU is the one
that needs to be fixed and will have to be made legacy-guest aware. I
think the kernel can just honor the feature negotiation result from
QEMU and do as it's told. Would you agree?

If that's fine, I would proceed to revert commit fe36cbe067 and the
related code in question from the kernel.

Thanks,
-Siwei

On 2/24/2021 10:24 AM, Si-Wei Liu wrote:
>> Detecting it isn't enough though, we will need a new ioctl to notify
>> the kernel that it's a legacy guest. Ugh :(
> Well, although I think adding an ioctl is doable, may I know what the
> use case there will be for kernel to leverage such info directly? Is
> there a case QEMU can't do with dedicate ioctls later if there's
> indeed differentiation (legacy v.s. modern) needed?
>
> One of the reason I asked is if this ioctl becomes a mandate for
> vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't
> see this ioctl coming?
>
> If it's optional, suppose the kernel may need it only when it becomes
> necessary?
>
> Thanks,
> -Siwei

2021-02-28 22:42:28

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>
>
> On 2/23/2021 9:04 PM, Michael S. Tsirkin wrote:
> > On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:
> > >
> > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > On 2021/2/23 9:12 上午, Si-Wei Liu wrote:
> > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:
> > > > > > > > On 2021/2/19 7:54 下午, Si-Wei Liu wrote:
> > > > > > > > > Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
> > > > > > > > > for legacy") made an exception for legacy guests to reset
> > > > > > > > > features to 0, when config space is accessed before features
> > > > > > > > > are set. We should relieve the verify_min_features() check
> > > > > > > > > and allow features reset to 0 for this case.
> > > > > > > > >
> > > > > > > > > It's worth noting that not just legacy guests could access
> > > > > > > > > config space before features are set. For instance, when
> > > > > > > > > feature VIRTIO_NET_F_MTU is advertised some modern driver
> > > > > > > > > will try to access and validate the MTU present in the config
> > > > > > > > > space before virtio features are set.
> > > > > > > > This looks like a spec violation:
> > > > > > > >
> > > > > > > > "
> > > > > > > >
> > > > > > > > The following driver-read-only field, mtu only exists if
> > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > set.
> > > > > > > > This field specifies the maximum MTU for the driver to use.
> > > > > > > > "
> > > > > > > >
> > > > > > > > Do we really want to workaround this?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > And also:
> > > > > > >
> > > > > > > The driver MUST follow this sequence to initialize a device:
> > > > > > > 1. Reset the device.
> > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> > > > > > > 3. Set the DRIVER status bit: the guest OS knows how to drive the
> > > > > > > device.
> > > > > > > 4. Read device feature bits, and write the subset of feature bits
> > > > > > > understood by the OS and driver to the
> > > > > > > device. During this step the driver MAY read (but MUST NOT write)
> > > > > > > the device-specific configuration
> > > > > > > fields to check that it can support the device before accepting it.
> > > > > > > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
> > > > > > > feature bits after this step.
> > > > > > > 6. Re-read device status to ensure the FEATURES_OK bit is still set:
> > > > > > > otherwise, the device does not
> > > > > > > support our subset of features and the device is unusable.
> > > > > > > 7. Perform device-specific setup, including discovery of virtqueues
> > > > > > > for the device, optional per-bus setup,
> > > > > > > reading and possibly writing the device’s virtio configuration
> > > > > > > space, and population of virtqueues.
> > > > > > > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > > > > > >
> > > > > > >
> > > > > > > so accessing config space before FEATURES_OK is a spec violation, right?
> > > > > > It is, but it's not relevant to what this commit tries to address. I
> > > > > > thought the legacy guest still needs to be supported.
> > > > > >
> > > > > > Having said, a separate patch has to be posted to fix the guest driver
> > > > > > issue where this discrepancy is introduced to virtnet_validate() (since
> > > > > > commit fe36cbe067). But it's not technically related to this patch.
> > > > > >
> > > > > > -Siwei
> > > > > I think it's a bug to read config space in validate, we should move it to
> > > > > virtnet_probe().
> > > > >
> > > > > Thanks
> > > > I take it back, reading but not writing seems to be explicitly allowed by spec.
> > > > So our way to detect a legacy guest is bogus, need to think what is
> > > > the best way to handle this.
> > > Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
> > > guest? Supposedly only config space write access needs to be guarded before
> > > setting FEATURES_OK.
> > >
> > > -Siwie
> > Detecting it isn't enough though, we will need a new ioctl to notify
> > the kernel that it's a legacy guest. Ugh :(
> Well, although I think adding an ioctl is doable, may I know what the use
> case there will be for kernel to leverage such info directly? Is there a
> case QEMU can't do with dedicate ioctls later if there's indeed
> differentiation (legacy v.s. modern) needed?
>
> One of the reason I asked is if this ioctl becomes a mandate for vhost-vdpa
> kernel. QEMU would reject initialize vhost-vdpa if doesn't see this ioctl
> coming?

Only on BE hosts or guests, I think. With an LE host and guest, legacy
and modern behave the same, so the ioctl isn't needed.

> If it's optional, suppose the kernel may need it only when it becomes
> necessary?
>
> Thanks,
> -Siwei
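
To make the endianness point concrete: in virtio, a modern (VERSION_1)
device lays out config fields little-endian, while the legacy interface
uses the guest's native byte order. The sketch below models that rule
for a 16-bit field such as the MTU. The helper name put_config16() and
its parameters are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: serialise a 16-bit config field in the byte
 * order the driver expects. Modern drivers always see little-endian;
 * legacy drivers see the guest's native order.
 */
static void put_config16(uint8_t out[2], uint16_t val, bool modern,
                         bool guest_is_big_endian)
{
    bool little = modern || !guest_is_big_endian;

    if (little) {
        out[0] = (uint8_t)(val & 0xff);
        out[1] = (uint8_t)(val >> 8);
    } else {
        out[0] = (uint8_t)(val >> 8);
        out[1] = (uint8_t)(val & 0xff);
    }
}
```

For a little-endian guest the legacy and modern layouts come out
byte-for-byte identical; they differ only when the guest is big-endian,
which is the case where the device has to be told which interface is in
use.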

2021-02-28 22:44:26

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 05:30:37PM +0800, Jason Wang wrote:
>
> On 2021/2/24 4:43 下午, Michael S. Tsirkin wrote:
> > On Wed, Feb 24, 2021 at 04:26:43PM +0800, Jason Wang wrote:
> > > Basically on first guest access QEMU would tell kernel whether
> > > guest is using the legacy or the modern interface.
> > > E.g. virtio_pci_config_read/virtio_pci_config_write will call ioctl(ENABLE_LEGACY, 1)
> > > while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0)
> > >
> > >
> > > But this trick work only for PCI I think?
> > >
> > > Thanks
> > ccw has a revision it can check. mmio does not have transitional devices
> > at all.
>
>
> Ok, then we can do the workaround in the qemu, isn't it?
>
> Thanks

which one do you mean?

--
MST
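
As a rough illustration of the "first guest access decides" idea quoted
above, here is a small state machine. Everything in it is hypothetical:
ENABLE_LEGACY does not exist as an ioctl, so notify_kernel() is only a
stand-in that records what would have been sent, and the two region
names merely mirror the legacy config and modern common-config
accessors mentioned in the quote.

```c
#include <assert.h>
#include <stdbool.h>

enum iface { IFACE_UNKNOWN, IFACE_LEGACY, IFACE_MODERN };
enum region { REGION_LEGACY_CONFIG, REGION_MODERN_COMMON };

struct dev_state {
    enum iface iface;   /* interface the guest was first seen using */
    int notifications;  /* how many times the kernel was told */
};

/* Stand-in for the proposed ioctl(ENABLE_LEGACY, legacy) call. */
static void notify_kernel(struct dev_state *s, bool legacy)
{
    s->iface = legacy ? IFACE_LEGACY : IFACE_MODERN;
    s->notifications++;
}

/* Called on every guest access; only the first one is reported. */
static void on_guest_access(struct dev_state *s, enum region r)
{
    if (s->iface != IFACE_UNKNOWN)
        return;

    notify_kernel(s, r == REGION_LEGACY_CONFIG);
}
```

The decision is latched on the first access, so later accesses through
the other region do not flip the mode in this sketch.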

2021-02-28 22:45:51

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > Detecting it isn't enough though, we will need a new ioctl to notify
> > the kernel that it's a legacy guest. Ugh :(
> Well, although I think adding an ioctl is doable, may I know what the use
> case there will be for kernel to leverage such info directly? Is there a
> case QEMU can't do with dedicate ioctls later if there's indeed
> differentiation (legacy v.s. modern) needed?

BTW a good API could be

#define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
#define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)

we did it per vring but maybe that was a mistake ...

--
MST

2021-02-28 23:30:13

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Thu, Feb 25, 2021 at 04:56:42PM -0800, Si-Wei Liu wrote:
>
> Hi Michael,
>
> Are you okay to live without this ioctl for now? I think QEMU is the one
> that needs to be fixed and will have to be made legacy guest aware. I think
> the kernel can just honor the feature negotiation result done by QEMU and do
> as what's told to.Will you agree?
>
> If it's fine, I would proceed to reverting commit fe36cbe067 and related
> code in question from the kernel.
>
> Thanks,
> -Siwei


Not really, I don't see why that's a good idea. fe36cbe067 is the code
checking MTU before FEATURES_OK. Spec explicitly allows that.

--
MST
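
The rule referred to here, that a driver may read but must not write
device-specific config before FEATURES_OK, can be sketched as two
predicates on the device status byte. The status bit values are the
ones defined by the virtio spec. The predicate names are hypothetical,
and treating the DRIVER bit as the point from which reads are allowed
is a simplifying assumption based on step 4 of the init sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio spec. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FEATURES_OK 8

/* Assumption: reads are permitted from step 4 on, once DRIVER is set. */
static bool config_read_allowed(uint8_t status)
{
    return (status & VIRTIO_CONFIG_S_DRIVER) != 0;
}

/* Writes are only permitted once FEATURES_OK has been set (step 7). */
static bool config_write_allowed(uint8_t status)
{
    return (status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;
}
```

Under this model, a config read before FEATURES_OK says nothing about
whether the guest is legacy; only a write in that window would.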

2021-03-01 03:55:44

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/3/1 5:30 上午, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 05:30:37PM +0800, Jason Wang wrote:
>> On 2021/2/24 4:43 下午, Michael S. Tsirkin wrote:
>>> On Wed, Feb 24, 2021 at 04:26:43PM +0800, Jason Wang wrote:
>>>> Basically on first guest access QEMU would tell kernel whether
>>>> guest is using the legacy or the modern interface.
>>>> E.g. virtio_pci_config_read/virtio_pci_config_write will call ioctl(ENABLE_LEGACY, 1)
>>>> while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0)
>>>>
>>>>
>>>> But this trick work only for PCI I think?
>>>>
>>>> Thanks
>>> ccw has a revision it can check. mmio does not have transitional devices
>>> at all.
>>
>> Ok, then we can do the workaround in the qemu, isn't it?
>>
>> Thanks
> which one do you mean?


I meant the workaround that is done by 452639a64ad8 ("vdpa: make sure
set_features is invoked for legacy").

Thanks


>

2021-03-01 04:00:08

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>> the kernel that it's a legacy guest. Ugh :(
>> Well, although I think adding an ioctl is doable, may I know what the use
>> case there will be for kernel to leverage such info directly? Is there a
>> case QEMU can't do with dedicate ioctls later if there's indeed
>> differentiation (legacy v.s. modern) needed?
> BTW a good API could be
>
> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>
> we did it per vring but maybe that was a mistake ...


Actually, I wonder whether it's a good time to just drop legacy driver
support for vDPA. Consider:

1) Its definition is non-normative
2) It is a large code burden

So QEMU can still present the legacy device, since the config space and
other things presented by vhost-vDPA are not expected to be accessed by
the guest directly. QEMU can do the endian conversion when necessary in
this case?

Thanks


>

2021-03-02 21:38:58

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>
> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > the kernel that it's a legacy guest. Ugh :(
> > > Well, although I think adding an ioctl is doable, may I know what the use
> > > case there will be for kernel to leverage such info directly? Is there a
> > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > differentiation (legacy v.s. modern) needed?
> > BTW a good API could be
> >
> > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >
> > we did it per vring but maybe that was a mistake ...
>
>
> Actually, I wonder whether it's good time to just not support legacy driver
> for vDPA. Consider:
>
> 1) It's definition is no-normative
> 2) A lot of budren of codes
>
> So qemu can still present the legacy device since the config space or other
> stuffs that is presented by vhost-vDPA is not expected to be accessed by
> guest directly. Qemu can do the endian conversion when necessary in this
> case?
>
> Thanks
>

Overall I would be fine with this approach, but we need to avoid
breaking working userspace: QEMU releases with vdpa support are out
there and seem to work for people. Any changes need to take that into
account and document compatibility concerns. I note that any hardware
implementation is already broken for legacy except on platforms with
strong ordering, which might be helpful in reducing the scope.


--
MST

2021-03-02 22:57:31

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero


On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>> the kernel that it's a legacy guest. Ugh :(
>>>> Well, although I think adding an ioctl is doable, may I know what the use
>>>> case there will be for kernel to leverage such info directly? Is there a
>>>> case QEMU can't do with dedicate ioctls later if there's indeed
>>>> differentiation (legacy v.s. modern) needed?
>>> BTW a good API could be
>>>
>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>
>>> we did it per vring but maybe that was a mistake ...
>>
>> Actually, I wonder whether it's good time to just not support legacy driver
>> for vDPA. Consider:
>>
>> 1) It's definition is no-normative
>> 2) A lot of budren of codes
>>
>> So qemu can still present the legacy device since the config space or other
>> stuffs that is presented by vhost-vDPA is not expected to be accessed by
>> guest directly. Qemu can do the endian conversion when necessary in this
>> case?
>>
>> Thanks
>>
> Overall I would be fine with this approach but we need to avoid breaking
> working userspace, qemu releases with vdpa support are out there and
> seem to work for people. Any changes need to take that into account
> and document compatibility concerns.


Agree, let me check.


> I note that any hardware
> implementation is already broken for legacy except on platforms with
> strong ordering which might be helpful in reducing the scope.


Yes.

Thanks


>
>

2021-03-03 16:13:59

by Si-Wei Liu

Subject: Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero



On 2/28/2021 1:27 PM, Michael S. Tsirkin wrote:
> On Thu, Feb 25, 2021 at 04:56:42PM -0800, Si-Wei Liu wrote:
>> Hi Michael,
>>
>> Are you okay to live without this ioctl for now? I think QEMU is the one
>> that needs to be fixed and will have to be made legacy guest aware. I think
>> the kernel can just honor the feature negotiation result done by QEMU and do
>> as what's told to.Will you agree?
>>
>> If it's fine, I would proceed to reverting commit fe36cbe067 and related
>> code in question from the kernel.
>>
>> Thanks,
>> -Siwei
>
> Not really, I don't see why that's a good idea. fe36cbe067 is the code
> checking MTU before FEATURES_OK. Spec explicitly allows that.
>
Alright, but what I meant was this commit
452639a64ad8 ("vdpa: make sure set_features is invoked for legacy").

But I got why you need it in another email (for BE host/guest).

-Siwei

2021-12-11 01:44:31

by Si-Wei Liu

Subject: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

Sorry for reviving this ancient thread. I'm a bit lost as to the
conclusion it reached. I have the following questions:

1. Legacy guest support: from the past conversations it doesn't seem
that support will be dropped completely. Is my understanding correct?
Actually, we're interested in supporting virtio v0.95 guests on x86,
which is backed by the spec at
https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not
sure whether there's any request or need to support even earlier legacy
virtio versions.

2. Suppose some form of legacy guest support needs to be there: how do
we deal with the bogus assumption below in vdpa_get_config() in the
short term? It looks like one intuitive fix is to move the
vdpa_set_features() call out of vdpa_get_config() and into
vdpa_set_config().

        /*
         * Config accesses aren't supposed to trigger before features are set.
         * If it does happen we assume a legacy guest.
         */
        if (!vdev->features_valid)
                vdpa_set_features(vdev, 0);
        ops->get_config(vdev, offset, buf, len);

I can post a patch to fix 2) if there's consensus already reached.

Thanks,
-Siwei
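
A minimal userspace sketch of the fix proposed in 2), assuming the
implicit reset to zero features moves from the read path to the write
path. struct mock_vdpa and the mock_* functions are hypothetical
stand-ins for the vdpa core helpers, not the real kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the relevant bits of struct vdpa_device. */
struct mock_vdpa {
    bool features_valid;
    uint64_t features;
    uint8_t config[8];
};

static void mock_set_features(struct mock_vdpa *d, uint64_t features)
{
    d->features = features;
    d->features_valid = true;
}

/*
 * Reads no longer imply a legacy guest. The caller is expected to keep
 * offset + len within the config array.
 */
static void mock_get_config(const struct mock_vdpa *d, unsigned int offset,
                            void *buf, unsigned int len)
{
    memcpy(buf, d->config + offset, len);
}

/*
 * A config write before features are set is what marks the guest as
 * legacy in this sketch, so the reset to zero features happens here.
 */
static void mock_set_config(struct mock_vdpa *d, unsigned int offset,
                            const void *buf, unsigned int len)
{
    if (!d->features_valid)
        mock_set_features(d, 0);

    memcpy(d->config + offset, buf, len);
}
```

With this arrangement an early read, for example of the MTU, leaves
features_valid untouched, and only an early write forces the
zero-feature state.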

On 3/2/2021 2:53 AM, Jason Wang wrote:
>
> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>>> the kernel that it's a legacy guest. Ugh :(
>>>>> Well, although I think adding an ioctl is doable, may I know what
>>>>> the use
>>>>> case there will be for kernel to leverage such info directly? Is
>>>>> there a
>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
>>>>> differentiation (legacy v.s. modern) needed?
>>>> BTW a good API could be
>>>>
>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>
>>>> we did it per vring but maybe that was a mistake ...
>>>
>>> Actually, I wonder whether it's good time to just not support legacy
>>> driver
>>> for vDPA. Consider:
>>>
>>> 1) It's definition is no-normative
>>> 2) A lot of budren of codes
>>>
>>> So qemu can still present the legacy device since the config space
>>> or other
>>> stuffs that is presented by vhost-vDPA is not expected to be
>>> accessed by
>>> guest directly. Qemu can do the endian conversion when necessary in
>>> this
>>> case?
>>>
>>> Thanks
>>>
>> Overall I would be fine with this approach but we need to avoid breaking
>> working userspace, qemu releases with vdpa support are out there and
>> seem to work for people. Any changes need to take that into account
>> and document compatibility concerns.
>
>
> Agree, let me check.
>
>
>>   I note that any hardware
>> implementation is already broken for legacy except on platforms with
>> strong ordering which might be helpful in reducing the scope.
>
>
> Yes.
>
> Thanks
>
>
>>
>>
>


2021-12-12 09:26:17

by Michael S. Tsirkin

Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> it ended up with. I have the following questions,
>
> 1. legacy guest support: from the past conversations it doesn't seem the
> support will be completely dropped from the table, is my understanding
> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> which is backed by the spec at
> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
> if there's request/need to support wilder legacy virtio versions earlier
> beyond.

I personally feel it's less work to add it in the kernel than to try to
work around it in userspace. Jason feels differently.
Maybe post the patches, and that will prove to Jason it's not
too terrible?

> 2. suppose some form of legacy guest support needs to be there, how do we
> deal with the bogus assumption below in vdpa_get_config() in the short term?
> It looks one of the intuitive fix is to move the vdpa_set_features call out
> of vdpa_get_config() to vdpa_set_config().
>
>         /*
>          * Config accesses aren't supposed to trigger before features
>          * are set.
>          * If it does happen we assume a legacy guest.
>          */
>         if (!vdev->features_valid)
>                 vdpa_set_features(vdev, 0);
>         ops->get_config(vdev, offset, buf, len);
>
> I can post a patch to fix 2) if there's consensus already reached.
>
> Thanks,
> -Siwei

I'm not sure how important it is to change that.
In any case it only affects transitional devices, right?
Legacy only should not care ...
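
For readers following along, the quoted vdpa_get_config() behaviour can be
modelled in userspace. This is only a minimal sketch: struct mock_vdev and
the mock_* helpers are hypothetical stand-ins, not the kernel's types or
functions. It shows that the first config read before any feature write
silently forces the feature set to 0.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vdpa_device; not the kernel layout. */
struct mock_vdev {
	bool features_valid;		/* true once features were written */
	uint64_t features;		/* last feature set written */
	uint8_t config[8];		/* fake device config space */
	unsigned int implicit_resets;	/* times features were forced to 0 */
};

static void mock_set_features(struct mock_vdev *vdev, uint64_t features)
{
	vdev->features = features;
	vdev->features_valid = true;
}

/*
 * Mirrors the quoted logic: a config read that arrives before features
 * are set is assumed to come from a legacy guest, so features become 0.
 * Returns 0 on success, -1 if the access is out of bounds.
 */
static int mock_get_config(struct mock_vdev *vdev, unsigned int offset,
			   void *buf, unsigned int len)
{
	if (offset > sizeof(vdev->config) ||
	    len > sizeof(vdev->config) - offset)
		return -1;

	if (!vdev->features_valid) {
		mock_set_features(vdev, 0);
		vdev->implicit_resets++;
	}
	memcpy(buf, vdev->config + offset, len);
	return 0;
}
```

In this model a modern driver that reads the MTU before writing features
takes the same path as a legacy guest, which is the ambiguity under
discussion.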


> On 3/2/2021 2:53 AM, Jason Wang wrote:
> >
> > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > know what the use
> > > > > > case there will be for kernel to leverage such info
> > > > > > directly? Is there a
> > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > differentiation (legacy v.s. modern) needed?
> > > > > BTW a good API could be
> > > > >
> > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > >
> > > > > we did it per vring but maybe that was a mistake ...
> > > >
> > > > Actually, I wonder whether it's good time to just not support
> > > > legacy driver
> > > > for vDPA. Consider:
> > > >
> > > > 1) It's definition is no-normative
> > > > 2) A lot of budren of codes
> > > >
> > > > So qemu can still present the legacy device since the config
> > > > space or other
> > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > accessed by
> > > > guest directly. Qemu can do the endian conversion when necessary
> > > > in this
> > > > case?
> > > >
> > > > Thanks
> > > >
> > > Overall I would be fine with this approach but we need to avoid breaking
> > > working userspace, qemu releases with vdpa support are out there and
> > > seem to work for people. Any changes need to take that into account
> > > and document compatibility concerns.
> >
> >
> > Agree, let me check.
> >
> >
> > >   I note that any hardware
> > > implementation is already broken for legacy except on platforms with
> > > strong ordering which might be helpful in reducing the scope.
> >
> >
> > Yes.
> >
> > Thanks
> >
> >
> > >
> > >
> >


2021-12-13 03:03:43

by Jason Wang

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin wrote:
>
> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > it ended up with. I have the following questions,
> >
> > 1. legacy guest support: from the past conversations it doesn't seem the
> > support will be completely dropped from the table, is my understanding
> > correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > which is backed by the spec at
> > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
> > if there's request/need to support wilder legacy virtio versions earlier
> > beyond.
>
> I personally feel it's less work to add in kernel than try to
> work around it in userspace. Jason feels differently.
> Maybe post the patches and this will prove to Jason it's not
> too terrible?

That's one way. Other than the config access before setting features,
we need to deal with other things:

1) VIRTIO_F_ORDER_PLATFORM
2) there could be a parent device that only supports 1.0 devices

And a lot of other stuff summarized in spec section 7.4, which does not
seem an easy task. Various vDPA parent drivers were written under the
assumption that only modern devices are supported.
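
To make point 1) concrete, here is a small sketch. The bit numbers follow
the virtio spec; ordering_is_safe() and its platform flag are hypothetical
policy helpers, not existing kernel code. A legacy driver only has 32
feature bits, so it can never ack VIRTIO_F_ORDER_PLATFORM (bit 36), and a
hardware device is then only safe where the platform is strongly ordered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as defined by the virtio spec. */
#define VIRTIO_F_VERSION_1	32
#define VIRTIO_F_ORDER_PLATFORM	36

#define FEATURE_BIT(n)	(1ULL << (n))

/* A driver that never acked VERSION_1 is a legacy (pre-1.0) driver. */
static bool driver_is_legacy(uint64_t features)
{
	return !(features & FEATURE_BIT(VIRTIO_F_VERSION_1));
}

/*
 * Hypothetical policy: memory ordering is known to be safe either when
 * the driver acked ORDER_PLATFORM, or when the platform itself is
 * strongly ordered (the x86-only scoping suggested in this thread).
 */
static bool ordering_is_safe(uint64_t features, bool platform_strongly_ordered)
{
	if (features & FEATURE_BIT(VIRTIO_F_ORDER_PLATFORM))
		return true;
	return platform_strongly_ordered;
}
```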

Thanks

>
> > 2. suppose some form of legacy guest support needs to be there, how do we
> > deal with the bogus assumption below in vdpa_get_config() in the short term?
> > It looks one of the intuitive fix is to move the vdpa_set_features call out
> > of vdpa_get_config() to vdpa_set_config().
> >
> > /*
> > * Config accesses aren't supposed to trigger before features are
> > set.
> > * If it does happen we assume a legacy guest.
> > */
> > if (!vdev->features_valid)
> > vdpa_set_features(vdev, 0);
> > ops->get_config(vdev, offset, buf, len);
> >
> > I can post a patch to fix 2) if there's consensus already reached.
> >
> > Thanks,
> > -Siwei
>
> I'm not sure how important it is to change that.
> In any case it only affects transitional devices, right?
> Legacy only should not care ...
>
>
> > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > >
> > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > know what the use
> > > > > > > case there will be for kernel to leverage such info
> > > > > > > directly? Is there a
> > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > BTW a good API could be
> > > > > >
> > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > >
> > > > > > we did it per vring but maybe that was a mistake ...
> > > > >
> > > > > Actually, I wonder whether it's good time to just not support
> > > > > legacy driver
> > > > > for vDPA. Consider:
> > > > >
> > > > > 1) It's definition is no-normative
> > > > > 2) A lot of budren of codes
> > > > >
> > > > > So qemu can still present the legacy device since the config
> > > > > space or other
> > > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > > accessed by
> > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > in this
> > > > > case?
> > > > >
> > > > > Thanks
> > > > >
> > > > Overall I would be fine with this approach but we need to avoid breaking
> > > > working userspace, qemu releases with vdpa support are out there and
> > > > seem to work for people. Any changes need to take that into account
> > > > and document compatibility concerns.
> > >
> > >
> > > Agree, let me check.
> > >
> > >
> > > > I note that any hardware
> > > > implementation is already broken for legacy except on platforms with
> > > > strong ordering which might be helpful in reducing the scope.
> > >
> > >
> > > Yes.
> > >
> > > Thanks
> > >
> > >
> > > >
> > > >
> > >
>


2021-12-13 08:06:58

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Mon, Dec 13, 2021 at 11:02:39AM +0800, Jason Wang wrote:
> On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin wrote:
> >
> > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > > it ended up with. I have the following questions,
> > >
> > > 1. legacy guest support: from the past conversations it doesn't seem the
> > > support will be completely dropped from the table, is my understanding
> > > correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > > which is backed by the spec at
> > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
> > > if there's request/need to support wilder legacy virtio versions earlier
> > > beyond.
> >
> > I personally feel it's less work to add in kernel than try to
> > work around it in userspace. Jason feels differently.
> > Maybe post the patches and this will prove to Jason it's not
> > too terrible?
>
> That's one way, other than the config access before setting features,
> we need to deal with other stuffs:
>
> 1) VIRTIO_F_ORDER_PLATFORM
> 2) there could be a parent device that only support 1.0 device
>
> And a lot of other stuff summarized in spec 7.4 which seems not an
> easy task. Various vDPA parent drivers were written under the
> assumption that only modern devices are supported.
>
> Thanks

Limiting things to x86 will likely address most issues though, won't it?

> >
> > > 2. suppose some form of legacy guest support needs to be there, how do we
> > > deal with the bogus assumption below in vdpa_get_config() in the short term?
> > > It looks one of the intuitive fix is to move the vdpa_set_features call out
> > > of vdpa_get_config() to vdpa_set_config().
> > >
> > > /*
> > > * Config accesses aren't supposed to trigger before features are
> > > set.
> > > * If it does happen we assume a legacy guest.
> > > */
> > > if (!vdev->features_valid)
> > > vdpa_set_features(vdev, 0);
> > > ops->get_config(vdev, offset, buf, len);
> > >
> > > I can post a patch to fix 2) if there's consensus already reached.
> > >
> > > Thanks,
> > > -Siwei
> >
> > I'm not sure how important it is to change that.
> > In any case it only affects transitional devices, right?
> > Legacy only should not care ...
> >
> >
> > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > >
> > > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > know what the use
> > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > directly? Is there a
> > > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > BTW a good API could be
> > > > > > >
> > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > >
> > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > >
> > > > > > Actually, I wonder whether it's good time to just not support
> > > > > > legacy driver
> > > > > > for vDPA. Consider:
> > > > > >
> > > > > > 1) It's definition is no-normative
> > > > > > 2) A lot of budren of codes
> > > > > >
> > > > > > So qemu can still present the legacy device since the config
> > > > > > space or other
> > > > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > > > accessed by
> > > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > > in this
> > > > > > case?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > Overall I would be fine with this approach but we need to avoid breaking
> > > > > working userspace, qemu releases with vdpa support are out there and
> > > > > seem to work for people. Any changes need to take that into account
> > > > > and document compatibility concerns.
> > > >
> > > >
> > > > Agree, let me check.
> > > >
> > > >
> > > > > I note that any hardware
> > > > > implementation is already broken for legacy except on platforms with
> > > > > strong ordering which might be helpful in reducing the scope.
> > > >
> > > >
> > > > Yes.
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > >
> > > >
> >


2021-12-13 08:57:57

by Jason Wang

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Mon, Dec 13, 2021 at 4:07 PM Michael S. Tsirkin wrote:
>
> On Mon, Dec 13, 2021 at 11:02:39AM +0800, Jason Wang wrote:
> > On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin wrote:
> > >
> > > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > > Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > > > it ended up with. I have the following questions,
> > > >
> > > > 1. legacy guest support: from the past conversations it doesn't seem the
> > > > support will be completely dropped from the table, is my understanding
> > > > correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > > > which is backed by the spec at
> > > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
> > > > if there's request/need to support wilder legacy virtio versions earlier
> > > > beyond.
> > >
> > > I personally feel it's less work to add in kernel than try to
> > > work around it in userspace. Jason feels differently.
> > > Maybe post the patches and this will prove to Jason it's not
> > > too terrible?
> >
> > That's one way, other than the config access before setting features,
> > we need to deal with other stuffs:
> >
> > 1) VIRTIO_F_ORDER_PLATFORM
> > 2) there could be a parent device that only support 1.0 device
> >
> > And a lot of other stuff summarized in spec 7.4 which seems not an
> > easy task. Various vDPA parent drivers were written under the
> > assumption that only modern devices are supported.
> >
> > Thanks
>
> Limiting things to x86 will likely address most issues though, won't it?

For the ordering, yes. But it means we need to introduce a config
option for legacy logic?

And we need to deal with, as you said in another thread, kick before DRIVER_OK:

E.g. we have things like this:

        if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
            !(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) {
                ret = ifcvf_request_irq(adapter);
                if (ret) {

Similar issues could be found in other parents.
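
A rough userspace model of why this matters, assuming a parent that only
sets up its interrupt on the 0 -> 1 transition of DRIVER_OK. struct
mock_parent and the mock_* helpers are hypothetical; only the
VIRTIO_CONFIG_S_DRIVER_OK value comes from the virtio headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK	4

/* Hypothetical parent-driver state, for illustration only. */
struct mock_parent {
	uint8_t status;			/* last status written by the driver */
	bool irq_ready;			/* set on the DRIVER_OK rising edge */
	unsigned int early_kicks;	/* kicks seen before irq_ready */
};

static void mock_set_status(struct mock_parent *p, uint8_t status)
{
	uint8_t status_old = p->status;

	/* Same edge detection as the snippet above. */
	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
	    !(status_old & VIRTIO_CONFIG_S_DRIVER_OK))
		p->irq_ready = true;

	/* Writing 0 resets the device. */
	if (status == 0)
		p->irq_ready = false;

	p->status = status;
}

/* Returns true if the kick could be serviced, false if it came too early. */
static bool mock_kick(struct mock_parent *p)
{
	if (!p->irq_ready) {
		p->early_kicks++;
		return false;
	}
	return true;
}
```

A legacy driver that kicks before setting DRIVER_OK lands in the
early_kicks path here, which is the case a parent would have to handle.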

We also need to consider whether we should encourage the vendor to
implement the logic.

I think we can try and see how hard it is.

Thanks

>
> > >
> > > > 2. suppose some form of legacy guest support needs to be there, how do we
> > > > deal with the bogus assumption below in vdpa_get_config() in the short term?
> > > > It looks one of the intuitive fix is to move the vdpa_set_features call out
> > > > of vdpa_get_config() to vdpa_set_config().
> > > >
> > > > /*
> > > > * Config accesses aren't supposed to trigger before features are
> > > > set.
> > > > * If it does happen we assume a legacy guest.
> > > > */
> > > > if (!vdev->features_valid)
> > > > vdpa_set_features(vdev, 0);
> > > > ops->get_config(vdev, offset, buf, len);
> > > >
> > > > I can post a patch to fix 2) if there's consensus already reached.
> > > >
> > > > Thanks,
> > > > -Siwei
> > >
> > > I'm not sure how important it is to change that.
> > > In any case it only affects transitional devices, right?
> > > Legacy only should not care ...
> > >
> > >
> > > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > > >
> > > > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > > know what the use
> > > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > > directly? Is there a
> > > > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > > BTW a good API could be
> > > > > > > >
> > > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > >
> > > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > > >
> > > > > > > Actually, I wonder whether it's good time to just not support
> > > > > > > legacy driver
> > > > > > > for vDPA. Consider:
> > > > > > >
> > > > > > > 1) It's definition is no-normative
> > > > > > > 2) A lot of budren of codes
> > > > > > >
> > > > > > > So qemu can still present the legacy device since the config
> > > > > > > space or other
> > > > > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > > > > accessed by
> > > > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > > > in this
> > > > > > > case?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > Overall I would be fine with this approach but we need to avoid breaking
> > > > > > working userspace, qemu releases with vdpa support are out there and
> > > > > > seem to work for people. Any changes need to take that into account
> > > > > > and document compatibility concerns.
> > > > >
> > > > >
> > > > > Agree, let me check.
> > > > >
> > > > >
> > > > > > I note that any hardware
> > > > > > implementation is already broken for legacy except on platforms with
> > > > > > strong ordering which might be helpful in reducing the scope.
> > > > >
> > > > >
> > > > > Yes.
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > >
> > >
>


2021-12-13 10:43:06

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Mon, Dec 13, 2021 at 04:57:38PM +0800, Jason Wang wrote:
> On Mon, Dec 13, 2021 at 4:07 PM Michael S. Tsirkin wrote:
> >
> > On Mon, Dec 13, 2021 at 11:02:39AM +0800, Jason Wang wrote:
> > > On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin wrote:
> > > >
> > > > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > > > Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > > > > it ended up with. I have the following questions,
> > > > >
> > > > > 1. legacy guest support: from the past conversations it doesn't seem the
> > > > > support will be completely dropped from the table, is my understanding
> > > > > correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > > > > which is backed by the spec at
> > > > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
> > > > > if there's request/need to support wilder legacy virtio versions earlier
> > > > > beyond.
> > > >
> > > > I personally feel it's less work to add in kernel than try to
> > > > work around it in userspace. Jason feels differently.
> > > > Maybe post the patches and this will prove to Jason it's not
> > > > too terrible?
> > >
> > > That's one way, other than the config access before setting features,
> > > we need to deal with other stuffs:
> > >
> > > 1) VIRTIO_F_ORDER_PLATFORM
> > > 2) there could be a parent device that only support 1.0 device
> > >
> > > And a lot of other stuff summarized in spec 7.4 which seems not an
> > > easy task. Various vDPA parent drivers were written under the
> > > assumption that only modern devices are supported.
> > >
> > > Thanks
> >
> > Limiting things to x86 will likely address most issues though, won't it?
>
> For the ordering, yes. But it means we need to introduce a config
> option for legacy logic?
> And we need to deal with, as you said in another thread, kick before DRIVER_OK:
>
> E.g we had thing like this:
>
> if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
> !(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) {
> ret = ifcvf_request_irq(adapter);
> if (ret) {
>
> Similar issues could be found in other parents.

The driver ok thing is mostly an issue for block where it
expects to access the disk directly during probe.

> We also need to consider whether we should encourage the vendor to
> implement the logic.
>
> I think we can try and see how hard it is.
>
> Thanks

Right. My point exactly.

> >
> > > >
> > > > > 2. suppose some form of legacy guest support needs to be there, how do we
> > > > > deal with the bogus assumption below in vdpa_get_config() in the short term?
> > > > > It looks one of the intuitive fix is to move the vdpa_set_features call out
> > > > > of vdpa_get_config() to vdpa_set_config().
> > > > >
> > > > > /*
> > > > > * Config accesses aren't supposed to trigger before features are
> > > > > set.
> > > > > * If it does happen we assume a legacy guest.
> > > > > */
> > > > > if (!vdev->features_valid)
> > > > > vdpa_set_features(vdev, 0);
> > > > > ops->get_config(vdev, offset, buf, len);
> > > > >
> > > > > I can post a patch to fix 2) if there's consensus already reached.
> > > > >
> > > > > Thanks,
> > > > > -Siwei
> > > >
> > > > I'm not sure how important it is to change that.
> > > > In any case it only affects transitional devices, right?
> > > > Legacy only should not care ...
> > > >
> > > >
> > > > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > > > >
> > > > > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > > > know what the use
> > > > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > > > directly? Is there a
> > > > > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > > > BTW a good API could be
> > > > > > > > >
> > > > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > >
> > > > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > > > >
> > > > > > > > Actually, I wonder whether it's good time to just not support
> > > > > > > > legacy driver
> > > > > > > > for vDPA. Consider:
> > > > > > > >
> > > > > > > > 1) It's definition is no-normative
> > > > > > > > 2) A lot of budren of codes
> > > > > > > >
> > > > > > > > So qemu can still present the legacy device since the config
> > > > > > > > space or other
> > > > > > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > > > > > accessed by
> > > > > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > > > > in this
> > > > > > > > case?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > Overall I would be fine with this approach but we need to avoid breaking
> > > > > > > working userspace, qemu releases with vdpa support are out there and
> > > > > > > seem to work for people. Any changes need to take that into account
> > > > > > > and document compatibility concerns.
> > > > > >
> > > > > >
> > > > > > Agree, let me check.
> > > > > >
> > > > > >
> > > > > > > I note that any hardware
> > > > > > > implementation is already broken for legacy except on platforms with
> > > > > > > strong ordering which might be helpful in reducing the scope.
> > > > > >
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > >
> >


2021-12-14 01:14:18

by Si-Wei Liu

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)



On 12/12/2021 7:02 PM, Jason Wang wrote:
> On Sun, Dec 12, 2021 at 5:26 PM Michael S. Tsirkin wrote:
>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
>>> it ended up with. I have the following questions,
>>>
>>> 1. legacy guest support: from the past conversations it doesn't seem the
>>> support will be completely dropped from the table, is my understanding
>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
>>> which is backed by the spec at
>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
>>> if there's request/need to support wilder legacy virtio versions earlier
>>> beyond.
>> I personally feel it's less work to add in kernel than try to
>> work around it in userspace. Jason feels differently.
>> Maybe post the patches and this will prove to Jason it's not
>> too terrible?
> That's one way, other than the config access before setting features,
> we need to deal with other stuffs:
>
> 1) VIRTIO_F_ORDER_PLATFORM
> 2) there could be a parent device that only support 1.0 device
We do want to involve the vendor's support for a legacy (or transitional)
device datapath. Otherwise it'd be too difficult to emulate/translate in
software/QEMU. The above two might not be an issue if the vendor claims
0.95 support in virtqueue and ring layout, and limiting to x86 support
(LE with strong ordering) seems to simplify a lot of these requirements. I
don't think emulating a legacy device model on top of a 1.0 vdpa parent
for the dataplane would be a good idea, either.

>
> And a lot of other stuff summarized in spec 7.4 which seems not an
> easy task. Various vDPA parent drivers were written under the
> assumption that only modern devices are supported.
If some of these vDPA vendors do provide the 0.95 support, especially on
the datapath and ring layout that well satisfies a transitional device
model defined in section 7.4, I guess we can scope the initial support
to these vendor drivers and x86 only. Let me know if I miss something else.

Thanks,
-Siwei


>
> Thanks
>
>>> 2. suppose some form of legacy guest support needs to be there, how do we
>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
>>> of vdpa_get_config() to vdpa_set_config().
>>>
>>> /*
>>> * Config accesses aren't supposed to trigger before features are
>>> set.
>>> * If it does happen we assume a legacy guest.
>>> */
>>> if (!vdev->features_valid)
>>> vdpa_set_features(vdev, 0);
>>> ops->get_config(vdev, offset, buf, len);
>>>
>>> I can post a patch to fix 2) if there's consensus already reached.
>>>
>>> Thanks,
>>> -Siwei
>> I'm not sure how important it is to change that.
>> In any case it only affects transitional devices, right?
>> Legacy only should not care ...
>>
>>
>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
>>>> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>>>>>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
>>>>>>>> Well, although I think adding an ioctl is doable, may I
>>>>>>>> know what the use
>>>>>>>> case there will be for kernel to leverage such info
>>>>>>>> directly? Is there a
>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
>>>>>>>> differentiation (legacy v.s. modern) needed?
>>>>>>> BTW a good API could be
>>>>>>>
>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>
>>>>>>> we did it per vring but maybe that was a mistake ...
>>>>>> Actually, I wonder whether it's good time to just not support
>>>>>> legacy driver
>>>>>> for vDPA. Consider:
>>>>>>
>>>>>> 1) It's definition is no-normative
>>>>>> 2) A lot of budren of codes
>>>>>>
>>>>>> So qemu can still present the legacy device since the config
>>>>>> space or other
>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
>>>>>> accessed by
>>>>>> guest directly. Qemu can do the endian conversion when necessary
>>>>>> in this
>>>>>> case?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>> Overall I would be fine with this approach but we need to avoid breaking
>>>>> working userspace, qemu releases with vdpa support are out there and
>>>>> seem to work for people. Any changes need to take that into account
>>>>> and document compatibility concerns.
>>>>
>>>> Agree, let me check.
>>>>
>>>>
>>>>> I note that any hardware
>>>>> implementation is already broken for legacy except on platforms with
>>>>> strong ordering which might be helpful in reducing the scope.
>>>>
>>>> Yes.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>


2021-12-14 02:00:08

by Si-Wei Liu

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)



On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
>> it ended up with. I have the following questions,
>>
>> 1. legacy guest support: from the past conversations it doesn't seem the
>> support will be completely dropped from the table, is my understanding
>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
>> which is backed by the spec at
>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf. Though I'm not sure
>> if there's request/need to support wilder legacy virtio versions earlier
>> beyond.
> I personally feel it's less work to add in kernel than try to
> work around it in userspace. Jason feels differently.
> Maybe post the patches and this will prove to Jason it's not
> too terrible?
I suppose if the vdpa vendor does support 0.95 at the datapath and ring
layout level and is limited to x86 only, there should be an easy way out. I
checked with Eli and other Mellanox/NVIDIA folks about hardware/firmware
level 0.95 support; it seems all the ingredients have been there already,
dating back to the DPDK days. The only major limitation is in the
vDPA software: the current vdpa core makes an assumption around
VIRTIO_F_ACCESS_PLATFORM for a few DMA setup ops, which is virtio 1.0 only.
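
As an illustration of how such an assumption shows up in a parent, below is
a hypothetical relaxed variant of the verify_min_features() check that the
original patch removed. It is not what the patch does (the patch drops the
check entirely); it only lets the features == 0 reset through. Note that a
real 0.95 driver would still fail it, since it acks a non-zero 32-bit
feature set that can never contain bit 33.

```c
#include <errno.h>
#include <stdint.h>

/* Bit number as defined by the virtio spec. */
#define VIRTIO_F_ACCESS_PLATFORM	33

/*
 * Hypothetical helper, for illustration only: accept the reset-to-zero
 * case, otherwise keep requiring ACCESS_PLATFORM as the old check did.
 */
static int verify_min_features_relaxed(uint64_t features)
{
	if (features == 0)
		return 0;

	if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
		return -EOPNOTSUPP;

	return 0;
}
```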

>
>> 2. suppose some form of legacy guest support needs to be there, how do we
>> deal with the bogus assumption below in vdpa_get_config() in the short term?
>> It looks one of the intuitive fix is to move the vdpa_set_features call out
>> of vdpa_get_config() to vdpa_set_config().
>>
>>         /*
>>          * Config accesses aren't supposed to trigger before features
>>          * are set.
>>          * If it does happen we assume a legacy guest.
>>          */
>>         if (!vdev->features_valid)
>>                 vdpa_set_features(vdev, 0);
>>         ops->get_config(vdev, offset, buf, len);
>>
>> I can post a patch to fix 2) if there's consensus already reached.
>>
>> Thanks,
>> -Siwei
> I'm not sure how important it is to change that.
> In any case it only affects transitional devices, right?
> Legacy only should not care ...
Yes, I'd like to distinguish the legacy driver (suppose it is 0.95) from
the modern one in a transitional device model rather than being legacy
only. That way a vdpa parent supporting both v0.95 and v1.0 can serve both
types of guests without having to be reconfigured. Or are you suggesting
that limiting to legacy only at the time of vdpa creation would simplify
the implementation a lot?

Thanks,
-Siwei

>
>> On 3/2/2021 2:53 AM, Jason Wang wrote:
>>> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>>>>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>>>>> the kernel that it's a legacy guest. Ugh :(
>>>>>>> Well, although I think adding an ioctl is doable, may I
>>>>>>> know what the use
>>>>>>> case there will be for kernel to leverage such info
>>>>>>> directly? Is there a
>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
>>>>>>> differentiation (legacy v.s. modern) needed?
>>>>>> BTW a good API could be
>>>>>>
>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>
>>>>>> we did it per vring but maybe that was a mistake ...
>>>>> Actually, I wonder whether it's good time to just not support
>>>>> legacy driver
>>>>> for vDPA. Consider:
>>>>>
>>>>> 1) Its definition is non-normative
>>>>> 2) A lot of code burden
>>>>>
>>>>> So qemu can still present the legacy device since the config
>>>>> space or other
>>>>> stuffs that is presented by vhost-vDPA is not expected to be
>>>>> accessed by
>>>>> guest directly. Qemu can do the endian conversion when necessary
>>>>> in this
>>>>> case?
>>>>>
>>>>> Thanks
>>>>>
>>>> Overall I would be fine with this approach but we need to avoid breaking
>>>> working userspace, qemu releases with vdpa support are out there and
>>>> seem to work for people. Any changes need to take that into account
>>>> and document compatibility concerns.
>>>
>>> Agree, let me check.
>>>
>>>
>>>>   I note that any hardware
>>>> implementation is already broken for legacy except on platforms with
>>>> strong ordering which might be helpful in reducing the scope.
>>>
>>> Yes.
>>>
>>> Thanks
>>>
>>>
>>>>


2021-12-14 03:02:16

by Jason Wang

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Tue, Dec 14, 2021 at 10:00 AM Si-Wei Liu <[email protected]> wrote:
>
>
>
> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> >> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> >> it ended up with. I have the following questions,
> >>
> >> 1. legacy guest support: from the past conversations it doesn't seem the
> >> support will be completely dropped from the table, is my understanding
> >> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> >> which is backed by the spec at
> >> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> >> if there's request/need to support wilder legacy virtio versions earlier
> >> beyond.
> > I personally feel it's less work to add in kernel than try to
> > work around it in userspace. Jason feels differently.
> > Maybe post the patches and this will prove to Jason it's not
> > too terrible?
> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> layout level and is limited to x86 only, there should be easy way out.

Note that though I tried to mandate a 1.0 device when writing the code,
the core vdpa doesn't mandate it, and we already have one parent
based on the 0.95 spec, which is eni_vdpa:

1) it depends on X86 (so no endian and ordering issues)
2) it has various subtle quirks, e.g. it can't work well without the
mrg_rxbuf feature negotiated, since the device assumes a fixed vnet
header length.
3) it can only be used by legacy drivers in the guest (no VERSION_1
since the device mandates a 4096 alignment which doesn't comply with
1.0)

So it's proof that the vDPA core can support a 0.95 parent.
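
For readers less familiar with point 3: in the legacy (0.95) layout the
descriptor table, available ring and used ring live in one contiguous
region, and the used ring starts at the next 4096-byte boundary, whereas a
1.0 driver may place the three parts independently. Below is a minimal
sketch of the legacy size calculation. The function and macro names are
made up for illustration and are not taken from eni_vdpa; only the
arithmetic follows the split-ring layout.

```c
#include <assert.h>
#include <stdint.h>

/* Element sizes of the split virtqueue (illustrative macro names). */
#define VRING_DESC_SIZE      16u    /* u64 addr, u32 len, u16 flags, u16 next */
#define VRING_USED_ELEM_SIZE 8u     /* u32 id, u32 len */
#define LEGACY_VRING_ALIGN   4096u  /* alignment used by legacy virtio-pci */

/*
 * Offset of the used ring from the start of the queue memory: the
 * descriptor table plus the available ring (flags, idx, ring[num],
 * used_event), rounded up to the alignment.
 */
uint64_t legacy_used_offset(unsigned int num, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */

    uint64_t avail_end = (uint64_t)VRING_DESC_SIZE * num + 2u * (3u + num);
    return (avail_end + align - 1) & ~(align - 1);
}

/* Total bytes a legacy driver must allocate for one queue of num entries. */
uint64_t legacy_vring_size(unsigned int num, uint64_t align)
{
    /* used ring: flags, idx, avail_event (3 x u16) plus num used elements */
    return legacy_used_offset(num, align) + 2u * 3u +
           (uint64_t)VRING_USED_ELEM_SIZE * num;
}
```

With 256 entries the descriptor table and available ring end at byte 4614,
so the used ring is pushed out to offset 8192.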

And we have a modern-only parent, vp_vdpa (though
it's not hard to add legacy support to it).

So for all the other vendors, assume the device has full support for
transitional devices on x86. As discussed, we need to handle:

1) config access before features
2) kick before driver_ok

Anything else? If not, it looks easier to do them in userspace.
The only advantage of doing it in the kernel is to make it work for
virtio-vdpa. But virtio-vdpa doesn't need transitional devices.
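
To illustrate what handling the two cases in userspace could look like,
here is a rough sketch of the per-device bookkeeping a VMM might keep. It
is not QEMU code: the struct, the function names, and the choice to hold an
early kick and replay it at DRIVER_OK are all assumptions made for the
example. Note also that an early config read is only a hint, since some
modern drivers read config space before setting features as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4u  /* DRIVER_OK bit of the device status */

/* Hypothetical tracking state, one instance per emulated device. */
struct legacy_tracker {
    bool    features_set;  /* guest has written its feature bits */
    uint8_t status;        /* last status value written by the guest */
    bool    maybe_legacy;  /* heuristic hint, not a proof */
    bool    kick_held;     /* a kick arrived before DRIVER_OK */
};

void on_set_features(struct legacy_tracker *t)
{
    assert(t != NULL);
    t->features_set = true;
}

/* Case 1: config space is accessed before features are set. */
void on_config_read(struct legacy_tracker *t)
{
    assert(t != NULL);
    if (!t->features_set)
        t->maybe_legacy = true;
}

/* Case 2: kick before DRIVER_OK. Returns true if it may be forwarded now. */
bool on_kick(struct legacy_tracker *t)
{
    assert(t != NULL);
    if (t->status & VIRTIO_CONFIG_S_DRIVER_OK)
        return true;
    t->maybe_legacy = true;
    t->kick_held = true;   /* assumption: hold the kick instead of dropping it */
    return false;
}

/* Returns true if a held kick should be replayed to the backend. */
bool on_set_status(struct legacy_tracker *t, uint8_t status)
{
    bool replay;

    assert(t != NULL);
    if (status == 0) {     /* device reset: forget everything */
        *t = (struct legacy_tracker){0};
        return false;
    }
    t->status = status;
    replay = (status & VIRTIO_CONFIG_S_DRIVER_OK) && t->kick_held;
    if (replay)
        t->kick_held = false;
    return replay;
}
```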

> I
> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware
> level 0.95 support, it seems all the ingredient had been there already
> dated back to the DPDK days. The only major thing limiting is in the
> vDPA software that the current vdpa core has the assumption around
> VIRTIO_F_ACCESS_PLATFORM for a few DMA setup ops, which is virtio 1.0 only.

The code doesn't have such an assumption, or is there anything I missed?
Or did you mean vhost-vdpa, which tries to talk with the IOMMU layer
directly? That should be OK since the host IOMMU is hidden from the guest
anyway.

>
> >
> >> 2. suppose some form of legacy guest support needs to be there, how do we
> >> deal with the bogus assumption below in vdpa_get_config() in the short term?
> >> It looks one of the intuitive fix is to move the vdpa_set_features call out
> >> of vdpa_get_config() to vdpa_set_config().
> >>
> >>         /*
> >>          * Config accesses aren't supposed to trigger before features are set.
> >>          * If it does happen we assume a legacy guest.
> >>          */
> >>         if (!vdev->features_valid)
> >>                 vdpa_set_features(vdev, 0);
> >>         ops->get_config(vdev, offset, buf, len);
> >>
> >> I can post a patch to fix 2) if there's consensus already reached.
> >>
> >> Thanks,
> >> -Siwei
> > I'm not sure how important it is to change that.
> > In any case it only affects transitional devices, right?
> > Legacy only should not care ...
> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against
> the modern one in a transitional device model rather than being legacy
> only. That way a v0.95 and v1.0 supporting vdpa parent can support both
> types of guests without having to reconfigure.

I think this is how a transitional device is expected to work.

Thanks

> Or are you suggesting
> limit to legacy only at the time of vdpa creation would simplify the
> implementation a lot?
>
> Thanks,
> -Siwei
>
> >
> >> On 3/2/2021 2:53 AM, Jason Wang wrote:
> >>> On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
> >>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> >>>>> On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
> >>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> >>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
> >>>>>>>> the kernel that it's a legacy guest. Ugh :(
> >>>>>>> Well, although I think adding an ioctl is doable, may I
> >>>>>>> know what the use
> >>>>>>> case there will be for kernel to leverage such info
> >>>>>>> directly? Is there a
> >>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
> >>>>>>> differentiation (legacy v.s. modern) needed?
> >>>>>> BTW a good API could be
> >>>>>>
> >>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>
> >>>>>> we did it per vring but maybe that was a mistake ...
> >>>>> Actually, I wonder whether it's good time to just not support
> >>>>> legacy driver
> >>>>> for vDPA. Consider:
> >>>>>
> >>>>> 1) Its definition is non-normative
> >>>>> 2) A lot of code burden
> >>>>>
> >>>>> So qemu can still present the legacy device since the config
> >>>>> space or other
> >>>>> stuffs that is presented by vhost-vDPA is not expected to be
> >>>>> accessed by
> >>>>> guest directly. Qemu can do the endian conversion when necessary
> >>>>> in this
> >>>>> case?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>> Overall I would be fine with this approach but we need to avoid breaking
> >>>> working userspace, qemu releases with vdpa support are out there and
> >>>> seem to work for people. Any changes need to take that into account
> >>>> and document compatibility concerns.
> >>>
> >>> Agree, let me check.
> >>>
> >>>
> >>>> I note that any hardware
> >>>> implementation is already broken for legacy except on platforms with
> >>>> strong ordering which might be helpful in reducing the scope.
> >>>
> >>> Yes.
> >>>
> >>> Thanks
> >>>
> >>>
> >>>>
>


2021-12-14 05:06:19

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
>
>
> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > > it ended up with. I have the following questions,
> > >
> > > 1. legacy guest support: from the past conversations it doesn't seem the
> > > support will be completely dropped from the table, is my understanding
> > > correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > > which is backed by the spec at
> > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> > > if there's request/need to support wilder legacy virtio versions earlier
> > > beyond.
> > I personally feel it's less work to add in kernel than try to
> > work around it in userspace. Jason feels differently.
> > Maybe post the patches and this will prove to Jason it's not
> > too terrible?
> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> layout level and is limited to x86 only, there should be easy way out.

Note a subtle difference: what matters is that the guest, not the host, is x86.
This matters for emulators, which might reorder memory accesses.
I guess this enforcement belongs in QEMU then?

> I
> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
> 0.95 support, it seems all the ingredient had been there already dated back
> to the DPDK days. The only major thing limiting is in the vDPA software that
> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> a few DMA setup ops, which is virtio 1.0 only.
>
> >
> > > 2. suppose some form of legacy guest support needs to be there, how do we
> > > deal with the bogus assumption below in vdpa_get_config() in the short term?
> > > It looks one of the intuitive fix is to move the vdpa_set_features call out
> > > of vdpa_get_config() to vdpa_set_config().
> > >
> > >         /*
> > >          * Config accesses aren't supposed to trigger before features are set.
> > >          * If it does happen we assume a legacy guest.
> > >          */
> > >         if (!vdev->features_valid)
> > >                 vdpa_set_features(vdev, 0);
> > >         ops->get_config(vdev, offset, buf, len);
> > >
> > > I can post a patch to fix 2) if there's consensus already reached.
> > >
> > > Thanks,
> > > -Siwei
> > I'm not sure how important it is to change that.
> > In any case it only affects transitional devices, right?
> > Legacy only should not care ...
> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> modern one in a transitional device model rather than being legacy only.
> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> guests without having to reconfigure. Or are you suggesting limit to legacy
> only at the time of vdpa creation would simplify the implementation a lot?
>
> Thanks,
> -Siwei


I don't know for sure. Take a look at the work Halil was doing
to try and support transitional devices with BE guests.


> >
> > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > > > On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > > On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > know what the use
> > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > directly? Is there a
> > > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > BTW a good API could be
> > > > > > >
> > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > >
> > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > > Actually, I wonder whether it's good time to just not support
> > > > > > legacy driver
> > > > > > for vDPA. Consider:
> > > > > >
> > > > > > > 1) Its definition is non-normative
> > > > > > > 2) A lot of code burden
> > > > > >
> > > > > > So qemu can still present the legacy device since the config
> > > > > > space or other
> > > > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > > > accessed by
> > > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > > in this
> > > > > > case?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > Overall I would be fine with this approach but we need to avoid breaking
> > > > > working userspace, qemu releases with vdpa support are out there and
> > > > > seem to work for people. Any changes need to take that into account
> > > > > and document compatibility concerns.
> > > >
> > > > Agree, let me check.
> > > >
> > > >
> > > > >   I note that any hardware
> > > > > implementation is already broken for legacy except on platforms with
> > > > > strong ordering which might be helpful in reducing the scope.
> > > >
> > > > Yes.
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >


2021-12-15 01:05:59

by Si-Wei Liu

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)



On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
>>
>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
>>>> it ended up with. I have the following questions,
>>>>
>>>> 1. legacy guest support: from the past conversations it doesn't seem the
>>>> support will be completely dropped from the table, is my understanding
>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
>>>> which is backed by the spec at
>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
>>>> if there's request/need to support wilder legacy virtio versions earlier
>>>> beyond.
>>> I personally feel it's less work to add in kernel than try to
>>> work around it in userspace. Jason feels differently.
>>> Maybe post the patches and this will prove to Jason it's not
>>> too terrible?
>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
>> layout level and is limited to x86 only, there should be easy way out.
> Note a subtle difference: what matters is that guest, not host is x86.
> Matters for emulators which might reorder memory accesses.
> I guess this enforcement belongs in QEMU then?
Right, I mean that to get started, the initial guest driver support and the
corresponding QEMU support for a transitional vdpa backend can be limited
to x86 guest/host only. Since the config space is emulated in QEMU, I
suppose it's not hard to enforce in QEMU. QEMU can issue GET_LEGACY,
GET_ENDIAN and similar ioctls in advance to query the capability of the
individual vendor driver. For that, we need another negotiation protocol
between the vdpa kernel and QEMU, similar to vhost_user's
protocol_features, that runs well before the guest driver is probed and
its feature negotiation kicks in. I'm not sure we need a GET_MEMORY_ORDER
ioctl call from the device, but can we assume weak ordering for legacy at
this point (x86 only)?
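
A rough sketch of what such a negotiation could look like, modelled loosely
on the offer/ack pattern of vhost-user protocol features. None of the
capability bits or functions below exist in the vhost or vdpa uAPI today;
they are placeholders to show the shape of the exchange, with the values
that GET_LEGACY/GET_ENDIAN-style ioctls would return folded into one
bitmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits a vendor driver could advertise. */
#define LEGACY_CAP_SUPPORTED (1ULL << 0)  /* parent can serve a 0.95 driver */
#define LEGACY_CAP_LE_GUEST  (1ULL << 1)  /* little-endian legacy guests */
#define LEGACY_CAP_BE_GUEST  (1ULL << 2)  /* big-endian legacy guests */
#define LEGACY_CAP_KNOWN     (LEGACY_CAP_SUPPORTED | LEGACY_CAP_LE_GUEST | \
                              LEGACY_CAP_BE_GUEST)

/*
 * The VMM acks only the bits both sides understand, before the guest
 * driver is probed.
 */
uint64_t negotiate_caps(uint64_t offered, uint64_t understood)
{
    assert((understood & ~LEGACY_CAP_KNOWN) == 0); /* only defined bits */
    return offered & understood;
}

/* Decide at backend init time whether this guest setup can be served. */
bool legacy_guest_allowed(uint64_t acked, bool guest_is_big_endian)
{
    uint64_t needed = LEGACY_CAP_SUPPORTED |
                      (guest_is_big_endian ? LEGACY_CAP_BE_GUEST
                                           : LEGACY_CAP_LE_GUEST);

    return (acked & needed) == needed;
}
```

A parent that only advertises LEGACY_CAP_SUPPORTED | LEGACY_CAP_LE_GUEST
would then let an x86 legacy guest start while a big-endian one is refused
before the VM is launched.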

>
>> I
>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
>> 0.95 support, it seems all the ingredient had been there already dated back
>> to the DPDK days. The only major thing limiting is in the vDPA software that
>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
>> a few DMA setup ops, which is virtio 1.0 only.
>>
>>>> 2. suppose some form of legacy guest support needs to be there, how do we
>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
>>>> of vdpa_get_config() to vdpa_set_config().
>>>>
>>>>         /*
>>>>          * Config accesses aren't supposed to trigger before features are set.
>>>>          * If it does happen we assume a legacy guest.
>>>>          */
>>>>         if (!vdev->features_valid)
>>>>                 vdpa_set_features(vdev, 0);
>>>>         ops->get_config(vdev, offset, buf, len);
>>>>
>>>> I can post a patch to fix 2) if there's consensus already reached.
>>>>
>>>> Thanks,
>>>> -Siwei
>>> I'm not sure how important it is to change that.
>>> In any case it only affects transitional devices, right?
>>> Legacy only should not care ...
>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
>> modern one in a transitional device model rather than being legacy only.
>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
>> guests without having to reconfigure. Or are you suggesting limit to legacy
>> only at the time of vdpa creation would simplify the implementation a lot?
>>
>> Thanks,
>> -Siwei
>
> I don't know for sure. Take a look at the work Halil was doing
> to try and support transitional devices with BE guests.
Hmmm, we can have those endianness ioctls defined, but the initial QEMU
implementation can start by supporting x86 guest/host with little
endian and weak memory ordering first. The real trick is detecting a
legacy guest. I am not sure whether it's feasible to shift all the legacy
detection work to QEMU, or whether the kernel has to be part of the
detection as well (e.g. for the kick before DRIVER_OK case, we would have
to duplicate the tracking effort in QEMU). Let me take a further look and
get back.

Meanwhile, I'll check internally to see if a legacy-only model would
work. Thanks.

Thanks,
-Siwei


>
>
>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
>>>>> On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>>>>>>> On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
>>>>>>>>> Well, although I think adding an ioctl is doable, may I
>>>>>>>>> know what the use
>>>>>>>>> case there will be for kernel to leverage such info
>>>>>>>>> directly? Is there a
>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
>>>>>>>>> differentiation (legacy v.s. modern) needed?
>>>>>>>> BTW a good API could be
>>>>>>>>
>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>
>>>>>>>> we did it per vring but maybe that was a mistake ...
>>>>>>> Actually, I wonder whether it's good time to just not support
>>>>>>> legacy driver
>>>>>>> for vDPA. Consider:
>>>>>>>
>>>>>>> 1) Its definition is non-normative
>>>>>>> 2) A lot of code burden
>>>>>>>
>>>>>>> So qemu can still present the legacy device since the config
>>>>>>> space or other
>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
>>>>>>> accessed by
>>>>>>> guest directly. Qemu can do the endian conversion when necessary
>>>>>>> in this
>>>>>>> case?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>> Overall I would be fine with this approach but we need to avoid breaking
>>>>>> working userspace, qemu releases with vdpa support are out there and
>>>>>> seem to work for people. Any changes need to take that into account
>>>>>> and document compatibility concerns.
>>>>> Agree, let me check.
>>>>>
>>>>>
>>>>>>   I note that any hardware
>>>>>> implementation is already broken for legacy except on platforms with
>>>>>> strong ordering which might be helpful in reducing the scope.
>>>>> Yes.
>>>>>
>>>>> Thanks
>>>>>
>>>>>


2021-12-15 02:06:50

by Jason Wang

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
>
>
>
> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> > On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> >>
> >> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> >>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> >>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> >>>> it ended up with. I have the following questions,
> >>>>
> >>>> 1. legacy guest support: from the past conversations it doesn't seem the
> >>>> support will be completely dropped from the table, is my understanding
> >>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> >>>> which is backed by the spec at
> >>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> >>>> if there's request/need to support wilder legacy virtio versions earlier
> >>>> beyond.
> >>> I personally feel it's less work to add in kernel than try to
> >>> work around it in userspace. Jason feels differently.
> >>> Maybe post the patches and this will prove to Jason it's not
> >>> too terrible?
> >> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> >> layout level and is limited to x86 only, there should be easy way out.
> > Note a subtle difference: what matters is that guest, not host is x86.
> > Matters for emulators which might reorder memory accesses.
> > I guess this enforcement belongs in QEMU then?
> Right, I mean to get started, the initial guest driver support and the
> corresponding QEMU support for transitional vdpa backend can be limited
> to x86 guest/host only. Since the config space is emulated in QEMU, I
> suppose it's not hard to enforce in QEMU.

It's more than just config space, most devices have headers before the buffer.

> QEMU can drive GET_LEGACY,
> GET_ENDIAN et al ioctls in advance to get the capability from the
> individual vendor driver. For that, we need another negotiation protocol
> similar to vhost_user's protocol_features between the vdpa kernel and
> QEMU, way before the guest driver is ever probed and its feature
> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> from the device, but we can assume weak ordering for legacy at this
> point (x86 only)?

I'm lost here, we have get_features() so:

1) VERSION_1 means the device uses LE if provided, otherwise native
2) ORDER_PLATFORM means device requires platform ordering

Any reason for having a new API for this?
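
As a sketch of how a consumer could apply rule 1 to a 16-bit field such as
the MTU in config space: the feature bit numbers are the ones defined by
the virtio spec, while the helper names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1      32  /* bit numbers from the virtio spec */
#define VIRTIO_F_ORDER_PLATFORM 36

bool has_feature(uint64_t features, unsigned int bit)
{
    return ((features >> bit) & 1) != 0;
}

/*
 * Rule 1: with VERSION_1 the fields are little-endian; without it they
 * follow the guest's native byte order.
 */
bool fields_are_le(uint64_t features, bool guest_is_le)
{
    return has_feature(features, VIRTIO_F_VERSION_1) || guest_is_le;
}

/* Decode a 16-bit field from its raw bytes according to rule 1. */
uint16_t virtio16_decode(const uint8_t raw[2], uint64_t features,
                         bool guest_is_le)
{
    assert(raw != NULL);
    if (fields_are_le(features, guest_is_le))
        return (uint16_t)(raw[0] | (raw[1] << 8));
    return (uint16_t)((raw[0] << 8) | raw[1]);
}
```

Rule 2 needs no conversion code: has_feature(features,
VIRTIO_F_ORDER_PLATFORM) simply tells the driver whether it has to use
platform memory barriers.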

>
> >
> >> I
> >> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
> >> 0.95 support, it seems all the ingredient had been there already dated back
> >> to the DPDK days. The only major thing limiting is in the vDPA software that
> >> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> >> a few DMA setup ops, which is virtio 1.0 only.
> >>
> >>>> 2. suppose some form of legacy guest support needs to be there, how do we
> >>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
> >>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
> >>>> of vdpa_get_config() to vdpa_set_config().
> >>>>
> >>>>         /*
> >>>>          * Config accesses aren't supposed to trigger before features are set.
> >>>>          * If it does happen we assume a legacy guest.
> >>>>          */
> >>>>         if (!vdev->features_valid)
> >>>>                 vdpa_set_features(vdev, 0);
> >>>>         ops->get_config(vdev, offset, buf, len);
> >>>>
> >>>> I can post a patch to fix 2) if there's consensus already reached.
> >>>>
> >>>> Thanks,
> >>>> -Siwei
> >>> I'm not sure how important it is to change that.
> >>> In any case it only affects transitional devices, right?
> >>> Legacy only should not care ...
> >> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> >> modern one in a transitional device model rather than being legacy only.
> >> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> >> guests without having to reconfigure. Or are you suggesting limit to legacy
> >> only at the time of vdpa creation would simplify the implementation a lot?
> >>
> >> Thanks,
> >> -Siwei
> >
> > I don't know for sure. Take a look at the work Halil was doing
> > to try and support transitional devices with BE guests.
> Hmmm, we can have those endianness ioctls defined but the initial QEMU
> implementation can be started to support x86 guest/host with little
> endian and weak memory ordering first. The real trick is to detect
> legacy guest - I am not sure if it's feasible to shift all the legacy
> detection work to QEMU, or the kernel has to be part of the detection
> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
> effort in QEMU) as well. Let me take a further look and get back.

Michael may think differently, but I think doing this in QEMU is much easier.

Thanks



>
> Meanwhile, I'll check internally to see if a legacy only model would
> work. Thanks.
>
> Thanks,
> -Siwei
>
>
> >
> >
> >>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
> >>>>> On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
> >>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> >>>>>>> On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
> >>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> >>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
> >>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
> >>>>>>>>> Well, although I think adding an ioctl is doable, may I
> >>>>>>>>> know what the use
> >>>>>>>>> case there will be for kernel to leverage such info
> >>>>>>>>> directly? Is there a
> >>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
> >>>>>>>>> differentiation (legacy v.s. modern) needed?
> >>>>>>>> BTW a good API could be
> >>>>>>>>
> >>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>
> >>>>>>>> we did it per vring but maybe that was a mistake ...
> >>>>>>> Actually, I wonder whether it's good time to just not support
> >>>>>>> legacy driver
> >>>>>>> for vDPA. Consider:
> >>>>>>>
> >>>>>>> 1) Its definition is non-normative
> >>>>>>> 2) A lot of code burden
> >>>>>>>
> >>>>>>> So qemu can still present the legacy device since the config
> >>>>>>> space or other
> >>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
> >>>>>>> accessed by
> >>>>>>> guest directly. Qemu can do the endian conversion when necessary
> >>>>>>> in this
> >>>>>>> case?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>> Overall I would be fine with this approach but we need to avoid breaking
> >>>>>> working userspace, qemu releases with vdpa support are out there and
> >>>>>> seem to work for people. Any changes need to take that into account
> >>>>>> and document compatibility concerns.
> >>>>> Agree, let me check.
> >>>>>
> >>>>>
> >>>>>> I note that any hardware
> >>>>>> implementation is already broken for legacy except on platforms with
> >>>>>> strong ordering which might be helpful in reducing the scope.
> >>>>> Yes.
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>>
>


2021-12-15 20:52:39

by Si-Wei Liu

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)



On 12/14/2021 6:06 PM, Jason Wang wrote:
> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
>>
>>
>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
>>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
>>>>>> it ended up with. I have the following questions,
>>>>>>
>>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
>>>>>> support will be completely dropped from the table, is my understanding
>>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
>>>>>> which is backed by the spec at
>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
>>>>>> if there's request/need to support wilder legacy virtio versions earlier
>>>>>> beyond.
>>>>> I personally feel it's less work to add in kernel than try to
>>>>> work around it in userspace. Jason feels differently.
>>>>> Maybe post the patches and this will prove to Jason it's not
>>>>> too terrible?
>>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
>>>> layout level and is limited to x86 only, there should be easy way out.
>>> Note a subtle difference: what matters is that guest, not host is x86.
>>> Matters for emulators which might reorder memory accesses.
>>> I guess this enforcement belongs in QEMU then?
>> Right, I mean to get started, the initial guest driver support and the
>> corresponding QEMU support for transitional vdpa backend can be limited
>> to x86 guest/host only. Since the config space is emulated in QEMU, I
>> suppose it's not hard to enforce in QEMU.
> It's more than just config space, most devices have headers before the buffer.
The ordering in the datapath (data VQs) would have to rely on the vendor's
support. Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w
vendors nowadays can and should support the case where ORDER_PLATFORM is
not acked by the driver (this feature is actually filtered out by the
QEMU vhost-vdpa driver today), even with a v1.0 spec-conforming,
modern-only vDPA device. The control VQ is implemented in software in the
kernel, so it can easily be accommodated or fixed when needed.

>
>> QEMU can drive GET_LEGACY,
>> GET_ENDIAN et al ioctls in advance to get the capability from the
>> individual vendor driver. For that, we need another negotiation protocol
>> similar to vhost_user's protocol_features between the vdpa kernel and
>> QEMU, way before the guest driver is ever probed and its feature
>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
>> from the device, but we can assume weak ordering for legacy at this
>> point (x86 only)?
> I'm lost here, we have get_features() so:
I assume you are referring to get_device_features() here, which Eli just
renamed.
>
> 1) VERSION_1 means the device uses LE if provided, otherwise native
> 2) ORDER_PLATFORM means device requires platform ordering
>
> Any reason for having a new API for this?
Are you going to require all vDPA hardware vendors to support the
transitional model for legacy guests? That is, would a guest that does
not acknowledge VERSION_1 use the legacy interfaces captured in spec
section 7.4 (ring layout, native endianness, message framing, vq
alignment of 4096, 32-bit features, no features_ok bit in status, the IO
port interface, i.e. all of it) instead? Note that we don't yet have a
set_device_features() that allows the vdpa device to tell whether it is
operating in transitional or modern-only mode. For software virtio, all
the support for the legacy part of a transitional model has already been
built. However, it's not easy for vDPA vendors to implement all the
requirements of all-or-nothing legacy guest support (big endian guests,
for example). To these vendors, legacy support within a transitional
model is more of a feature, and it's best to leave them some flexibility
to implement partial support for legacy. That in turn calls for a
negotiation API, similar to vhost-user protocol features, that can
reject unsupported guest setups as early as backend_init, before
launching the VM.
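
To make the kind of early check I have in mind concrete, here is a
minimal sketch in C. Everything in it is hypothetical: the capability
bits, struct guest_setup and backend_supports_guest() do not exist in
the vdpa uAPI or in QEMU today; they only illustrate how a backend could
advertise partial legacy support and how an unsupported guest setup could
be rejected before the VM starts.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical capability bits a vendor driver could report ahead of
 * guest feature negotiation. None of these exist in the kernel today.
 */
#define LEGACY_CAP_RING_LAYOUT	(1u << 0)	/* 0.95 ring layout, vq alignment of 4096 */
#define LEGACY_CAP_LE_GUEST	(1u << 1)	/* little endian legacy guests */
#define LEGACY_CAP_BE_GUEST	(1u << 2)	/* big endian legacy guests */

/* Hypothetical description of the guest the VM is about to run. */
struct guest_setup {
	bool wants_legacy;	/* guest may drive the 0.95 interface */
	bool big_endian;	/* guest native endianness */
};

/*
 * Return true when the backend capabilities cover the guest setup.
 * A modern-only guest needs none of the legacy capabilities.
 */
static bool backend_supports_guest(uint32_t caps, const struct guest_setup *g)
{
	uint32_t need;

	if (!g->wants_legacy)
		return true;

	need = LEGACY_CAP_RING_LAYOUT;
	need |= g->big_endian ? LEGACY_CAP_BE_GUEST : LEGACY_CAP_LE_GUEST;

	return (caps & need) == need;
}
```

With caps set to LEGACY_CAP_RING_LAYOUT | LEGACY_CAP_LE_GUEST, a little
endian legacy guest is accepted while a big endian legacy guest is
refused at init time, which is the partial-support case described above.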


>
>>>> I
>>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
>>>> 0.95 support, it seems all the ingredient had been there already dated back
>>>> to the DPDK days. The only major thing limiting is in the vDPA software that
>>>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
>>>> a few DMA setup ops, which is virtio 1.0 only.
>>>>
>>>>>> 2. suppose some form of legacy guest support needs to be there, how do we
>>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
>>>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
>>>>>> of vdpa_get_config() to vdpa_set_config().
>>>>>>
>>>>>> /*
>>>>>> * Config accesses aren't supposed to trigger before features are
>>>>>> set.
>>>>>> * If it does happen we assume a legacy guest.
>>>>>> */
>>>>>> if (!vdev->features_valid)
>>>>>> vdpa_set_features(vdev, 0);
>>>>>> ops->get_config(vdev, offset, buf, len);
>>>>>>
>>>>>> I can post a patch to fix 2) if there's consensus already reached.
>>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>> I'm not sure how important it is to change that.
>>>>> In any case it only affects transitional devices, right?
>>>>> Legacy only should not care ...
>>>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
>>>> modern one in a transitional device model rather than being legacy only.
>>>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
>>>> guests without having to reconfigure. Or are you suggesting limit to legacy
>>>> only at the time of vdpa creation would simplify the implementation a lot?
>>>>
>>>> Thanks,
>>>> -Siwei
>>> I don't know for sure. Take a look at the work Halil was doing
>>> to try and support transitional devices with BE guests.
>> Hmmm, we can have those endianness ioctls defined but the initial QEMU
>> implementation can be started to support x86 guest/host with little
>> endian and weak memory ordering first. The real trick is to detect
>> legacy guest - I am not sure if it's feasible to shift all the legacy
>> detection work to QEMU, or the kernel has to be part of the detection
>> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
>> effort in QEMU) as well. Let me take a further look and get back.
> Michael may think differently but I think doing this in Qemu is much easier.
I think the key is whether we position this as emulating legacy
interfaces in QEMU, doing translation on top of a v1.0 modern-only
device in the kernel, or whether we allow vdpa core (or you can say
vhost-vdpa) and the vendor driver to support a transitional model in the
kernel that works for both v0.95 and v1.0 drivers, with some slight aid
from QEMU for detection/emulation/shadowing (e.g. CVQ, I/O port relay).
I guess for the former we still rely on the vendor for a performant data
VQ implementation, which leaves open the question of whether what
eventually ends up in the kernel is effectively the latter.

Thanks,
-Siwei

>
> Thanks
>
>
>
>> Meanwhile, I'll check internally to see if a legacy only model would
>> work. Thanks.
>>
>> Thanks,
>> -Siwei
>>
>>
>>>
>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
>>>>>>> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>>>>>>>>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I
>>>>>>>>>>> know what the use
>>>>>>>>>>> case there will be for kernel to leverage such info
>>>>>>>>>>> directly? Is there a
>>>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
>>>>>>>>>>> differentiation (legacy v.s. modern) needed?
>>>>>>>>>> BTW a good API could be
>>>>>>>>>>
>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>>>
>>>>>>>>>> we did it per vring but maybe that was a mistake ...
>>>>>>>>> Actually, I wonder whether it's good time to just not support
>>>>>>>>> legacy driver
>>>>>>>>> for vDPA. Consider:
>>>>>>>>>
>>>>>>>>> 1) It's definition is no-normative
>>>>>>>>> 2) A lot of budren of codes
>>>>>>>>>
>>>>>>>>> So qemu can still present the legacy device since the config
>>>>>>>>> space or other
>>>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
>>>>>>>>> accessed by
>>>>>>>>> guest directly. Qemu can do the endian conversion when necessary
>>>>>>>>> in this
>>>>>>>>> case?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
>>>>>>>> working userspace, qemu releases with vdpa support are out there and
>>>>>>>> seem to work for people. Any changes need to take that into account
>>>>>>>> and document compatibility concerns.
>>>>>>> Agree, let me check.
>>>>>>>
>>>>>>>
>>>>>>>> I note that any hardware
>>>>>>>> implementation is already broken for legacy except on platforms with
>>>>>>>> strong ordering which might be helpful in reducing the scope.
>>>>>>> Yes.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>


2021-12-15 21:33:36

by Michael S. Tsirkin

Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
>
>
> On 12/14/2021 6:06 PM, Jason Wang wrote:
> > On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu wrote:
> > >
> > >
> > > On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> > > > On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> > > > > On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> > > > > > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > > > > > Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > > > > > > it ended up with. I have the following questions,
> > > > > > >
> > > > > > > 1. legacy guest support: from the past conversations it doesn't seem the
> > > > > > > support will be completely dropped from the table, is my understanding
> > > > > > > correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > > > > > > which is backed by the spec at
> > > > > > > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> > > > > > > if there's request/need to support wilder legacy virtio versions earlier
> > > > > > > beyond.
> > > > > > I personally feel it's less work to add in kernel than try to
> > > > > > work around it in userspace. Jason feels differently.
> > > > > > Maybe post the patches and this will prove to Jason it's not
> > > > > > too terrible?
> > > > > I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> > > > > layout level and is limited to x86 only, there should be easy way out.
> > > > Note a subtle difference: what matters is that guest, not host is x86.
> > > > Matters for emulators which might reorder memory accesses.
> > > > I guess this enforcement belongs in QEMU then?
> > > Right, I mean to get started, the initial guest driver support and the
> > > corresponding QEMU support for transitional vdpa backend can be limited
> > > to x86 guest/host only. Since the config space is emulated in QEMU, I
> > > suppose it's not hard to enforce in QEMU.
> > It's more than just config space, most devices have headers before the buffer.
> The ordering in datapath (data VQs) would have to rely on vendor's support.
> Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w vendor nowadays
> can/should well support the case when ORDER_PLATFORM is not acked by the
> driver (actually this feature is filtered out by the QEMU vhost-vdpa driver
> today), even with v1.0 spec conforming and modern only vDPA device. The
> control VQ is implemented in software in the kernel, which can be easily
> accommodated/fixed when needed.
>
> >
> > > QEMU can drive GET_LEGACY,
> > > GET_ENDIAN et al ioctls in advance to get the capability from the
> > > individual vendor driver. For that, we need another negotiation protocol
> > > similar to vhost_user's protocol_features between the vdpa kernel and
> > > QEMU, way before the guest driver is ever probed and its feature
> > > negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> > > from the device, but we can assume weak ordering for legacy at this
> > > point (x86 only)?
> > I'm lost here, we have get_features() so:
> I assume here you refer to get_device_features() that Eli just changed the
> name.
> >
> > 1) VERSION_1 means the device uses LE if provided, otherwise native
> > 2) ORDER_PLATFORM means device requires platform ordering
> >
> > Any reason for having a new API for this?
> Are you going to enforce all vDPA hardware vendors to support the
> transitional model for legacy guest? meaning guest not acknowledging
> VERSION_1 would use the legacy interfaces captured in the spec section 7.4
> (regarding ring layout, native endianness, message framing, vq alignment of
> 4096, 32bit feature, no features_ok bit in status, IO port interface i.e.
> all the things) instead? Noted we don't yet have a set_device_features()
> that allows the vdpa device to tell whether it is operating in transitional
> or modern-only mode. For software virtio, all support for the legacy part in
> a transitional model has been built up there already, however, it's not easy
> for vDPA vendors to implement all the requirements for an all-or-nothing
> legacy guest support (big endian guest for example). To these vendors, the
> legacy support within a transitional model is more of feature to them and
> it's best to leave some flexibility for them to implement partial support
> for legacy. That in turn calls out the need for a vhost-user protocol
> feature like negotiation API that can prohibit those unsupported guest
> setups to as early as backend_init before launching the VM.

Right. Of note is the fact that it's a spec bug which I
still hope to fix, though due to existing guest code the
fix won't be complete.

WRT ioctls, one thing we can do though is abuse set_features
where it's called by QEMU early on with just the VERSION_1
bit set, to distinguish between the legacy and modern
interfaces. This happens before config space accesses and FEATURES_OK.
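
A minimal sketch of that idea in C, under stated assumptions: the enum,
struct early_feature_state and both helpers are hypothetical names made
up for illustration, not existing vdpa or vhost code. The only fact taken
from the spec is that VIRTIO_F_VERSION_1 is feature bit 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number of VIRTIO_F_VERSION_1 as defined by the virtio spec. */
#define VIRTIO_F_VERSION_1 32

/* Hypothetical classification of the interface the guest will use. */
enum guest_iface {
	GUEST_IFACE_UNKNOWN,	/* no early set_features call seen yet */
	GUEST_IFACE_LEGACY,	/* early call arrived without VERSION_1 */
	GUEST_IFACE_MODERN,	/* early call arrived with VERSION_1 */
};

/* Hypothetical per-device state; not a real kernel structure. */
struct early_feature_state {
	bool early_seen;
	enum guest_iface iface;
};

/*
 * Record the early set_features call that QEMU would issue before any
 * config space access and before FEATURES_OK. Only the VERSION_1 bit
 * is inspected; every other bit is ignored at this stage.
 */
static void record_early_features(struct early_feature_state *st,
				  uint64_t features)
{
	st->early_seen = true;
	st->iface = (features & (1ULL << VIRTIO_F_VERSION_1)) ?
		    GUEST_IFACE_MODERN : GUEST_IFACE_LEGACY;
}

/* Report which interface was signalled, or UNKNOWN if none was. */
static enum guest_iface guest_interface(const struct early_feature_state *st)
{
	return st->early_seen ? st->iface : GUEST_IFACE_UNKNOWN;
}
```

The point of keeping an explicit UNKNOWN state is that a config space
access arriving before any early call no longer has to be treated as
proof of a legacy guest.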

Halil has been working on this, pls take a look and maybe help him out.

>
> >
> > > > > I
> > > > > checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
> > > > > 0.95 support, it seems all the ingredient had been there already dated back
> > > > > to the DPDK days. The only major thing limiting is in the vDPA software that
> > > > > the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> > > > > a few DMA setup ops, which is virtio 1.0 only.
> > > > >
> > > > > > > 2. suppose some form of legacy guest support needs to be there, how do we
> > > > > > > deal with the bogus assumption below in vdpa_get_config() in the short term?
> > > > > > > It looks one of the intuitive fix is to move the vdpa_set_features call out
> > > > > > > of vdpa_get_config() to vdpa_set_config().
> > > > > > >
> > > > > > > /*
> > > > > > > * Config accesses aren't supposed to trigger before features are
> > > > > > > set.
> > > > > > > * If it does happen we assume a legacy guest.
> > > > > > > */
> > > > > > > if (!vdev->features_valid)
> > > > > > > vdpa_set_features(vdev, 0);
> > > > > > > ops->get_config(vdev, offset, buf, len);
> > > > > > >
> > > > > > > I can post a patch to fix 2) if there's consensus already reached.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -Siwei
> > > > > > I'm not sure how important it is to change that.
> > > > > > In any case it only affects transitional devices, right?
> > > > > > Legacy only should not care ...
> > > > > Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> > > > > modern one in a transitional device model rather than being legacy only.
> > > > > That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> > > > > guests without having to reconfigure. Or are you suggesting limit to legacy
> > > > > only at the time of vdpa creation would simplify the implementation a lot?
> > > > >
> > > > > Thanks,
> > > > > -Siwei
> > > > I don't know for sure. Take a look at the work Halil was doing
> > > > to try and support transitional devices with BE guests.
> > > Hmmm, we can have those endianness ioctls defined but the initial QEMU
> > > implementation can be started to support x86 guest/host with little
> > > endian and weak memory ordering first. The real trick is to detect
> > > legacy guest - I am not sure if it's feasible to shift all the legacy
> > > detection work to QEMU, or the kernel has to be part of the detection
> > > (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
> > > effort in QEMU) as well. Let me take a further look and get back.
> > Michael may think differently but I think doing this in Qemu is much easier.
> I think the key is whether we position emulating legacy interfaces in QEMU
> doing translation on top of a v1.0 modern-only device in the kernel, or we
> allow vdpa core (or you can say vhost-vdpa) and vendor driver to support a
> transitional model in the kernel that is able to work for both v0.95 and
> v1.0 drivers, with some slight aid from QEMU for
> detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for the
> former we still rely on vendor for a performant data vqs implementation,
> leaving the question to what it may end up eventually in the kernel is
> effectively the latter).
>
> Thanks,
> -Siwei


My suggestion is to post the kernel patches, and then we can evaluate
how much work they are.

> >
> > Thanks
> >
> >
> >
> > > Meanwhile, I'll check internally to see if a legacy only model would
> > > work. Thanks.
> > >
> > > Thanks,
> > > -Siwei
> > >
> > >
> > > >
> > > > > > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > > > > > > On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > > > > > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > > > > > On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > > > > > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > > > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > > > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > > > > > know what the use
> > > > > > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > > > > > directly? Is there a
> > > > > > > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > > > > > BTW a good API could be
> > > > > > > > > > >
> > > > > > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > > > >
> > > > > > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > > > > > > Actually, I wonder whether it's good time to just not support
> > > > > > > > > > legacy driver
> > > > > > > > > > for vDPA. Consider:
> > > > > > > > > >
> > > > > > > > > > 1) It's definition is no-normative
> > > > > > > > > > 2) A lot of budren of codes
> > > > > > > > > >
> > > > > > > > > > So qemu can still present the legacy device since the config
> > > > > > > > > > space or other
> > > > > > > > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > > > > > > > accessed by
> > > > > > > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > > > > > > in this
> > > > > > > > > > case?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > Overall I would be fine with this approach but we need to avoid breaking
> > > > > > > > > working userspace, qemu releases with vdpa support are out there and
> > > > > > > > > seem to work for people. Any changes need to take that into account
> > > > > > > > > and document compatibility concerns.
> > > > > > > > Agree, let me check.
> > > > > > > >
> > > > > > > >
> > > > > > > > > I note that any hardware
> > > > > > > > > implementation is already broken for legacy except on platforms with
> > > > > > > > > strong ordering which might be helpful in reducing the scope.
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > >


2021-12-16 02:02:15

by Si-Wei Liu

Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)



On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
>>
>> On 12/14/2021 6:06 PM, Jason Wang wrote:
>>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu wrote:
>>>>
>>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
>>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
>>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
>>>>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
>>>>>>>> it ended up with. I have the following questions,
>>>>>>>>
>>>>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
>>>>>>>> support will be completely dropped from the table, is my understanding
>>>>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
>>>>>>>> which is backed by the spec at
>>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
>>>>>>>> if there's request/need to support wilder legacy virtio versions earlier
>>>>>>>> beyond.
>>>>>>> I personally feel it's less work to add in kernel than try to
>>>>>>> work around it in userspace. Jason feels differently.
>>>>>>> Maybe post the patches and this will prove to Jason it's not
>>>>>>> too terrible?
>>>>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
>>>>>> layout level and is limited to x86 only, there should be easy way out.
>>>>> Note a subtle difference: what matters is that guest, not host is x86.
>>>>> Matters for emulators which might reorder memory accesses.
>>>>> I guess this enforcement belongs in QEMU then?
>>>> Right, I mean to get started, the initial guest driver support and the
>>>> corresponding QEMU support for transitional vdpa backend can be limited
>>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
>>>> suppose it's not hard to enforce in QEMU.
>>> It's more than just config space, most devices have headers before the buffer.
>> The ordering in datapath (data VQs) would have to rely on vendor's support.
>> Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w vendor nowadays
>> can/should well support the case when ORDER_PLATFORM is not acked by the
>> driver (actually this feature is filtered out by the QEMU vhost-vdpa driver
>> today), even with v1.0 spec conforming and modern only vDPA device. The
>> control VQ is implemented in software in the kernel, which can be easily
>> accommodated/fixed when needed.
>>
>>>> QEMU can drive GET_LEGACY,
>>>> GET_ENDIAN et al ioctls in advance to get the capability from the
>>>> individual vendor driver. For that, we need another negotiation protocol
>>>> similar to vhost_user's protocol_features between the vdpa kernel and
>>>> QEMU, way before the guest driver is ever probed and its feature
>>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
>>>> from the device, but we can assume weak ordering for legacy at this
>>>> point (x86 only)?
>>> I'm lost here, we have get_features() so:
>> I assume here you refer to get_device_features() that Eli just changed the
>> name.
>>> 1) VERSION_1 means the device uses LE if provided, otherwise native
>>> 2) ORDER_PLATFORM means device requires platform ordering
>>>
>>> Any reason for having a new API for this?
>> Are you going to enforce all vDPA hardware vendors to support the
>> transitional model for legacy guest? meaning guest not acknowledging
>> VERSION_1 would use the legacy interfaces captured in the spec section 7.4
>> (regarding ring layout, native endianness, message framing, vq alignment of
>> 4096, 32bit feature, no features_ok bit in status, IO port interface i.e.
>> all the things) instead? Noted we don't yet have a set_device_features()
>> that allows the vdpa device to tell whether it is operating in transitional
>> or modern-only mode. For software virtio, all support for the legacy part in
>> a transitional model has been built up there already, however, it's not easy
>> for vDPA vendors to implement all the requirements for an all-or-nothing
>> legacy guest support (big endian guest for example). To these vendors, the
>> legacy support within a transitional model is more of feature to them and
>> it's best to leave some flexibility for them to implement partial support
>> for legacy. That in turn calls out the need for a vhost-user protocol
>> feature like negotiation API that can prohibit those unsupported guest
>> setups to as early as backend_init before launching the VM.
> Right. Of note is the fact that it's a spec bug which I
> hope yet to fix, though due to existing guest code the
> fix won't be complete.
I thought at one point you pointed out to me that the spec does allow
config space reads before claiming features_ok, and that only config
writes before features_ok are prohibited. I haven't read through the full
thread of Halil's VERSION_1 for transitional big endian device yet, but
what is the spec bug you hope to fix?

>
> WRT ioctls, One thing we can do though is abuse set_features
> where it's called by QEMU early on with just the VERSION_1
> bit set, to distinguish between legacy and modern
> interface. This before config space accesses and FEATURES_OK.
>
> Halil has been working on this, pls take a look and maybe help him out.
Interesting thread. I am reading it now to see how I can leverage it or
help there.

>>>>>> I
>>>>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
>>>>>> 0.95 support, it seems all the ingredient had been there already dated back
>>>>>> to the DPDK days. The only major thing limiting is in the vDPA software that
>>>>>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
>>>>>> a few DMA setup ops, which is virtio 1.0 only.
>>>>>>
>>>>>>>> 2. suppose some form of legacy guest support needs to be there, how do we
>>>>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
>>>>>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
>>>>>>>> of vdpa_get_config() to vdpa_set_config().
>>>>>>>>
>>>>>>>> /*
>>>>>>>> * Config accesses aren't supposed to trigger before features are
>>>>>>>> set.
>>>>>>>> * If it does happen we assume a legacy guest.
>>>>>>>> */
>>>>>>>> if (!vdev->features_valid)
>>>>>>>> vdpa_set_features(vdev, 0);
>>>>>>>> ops->get_config(vdev, offset, buf, len);
>>>>>>>>
>>>>>>>> I can post a patch to fix 2) if there's consensus already reached.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Siwei
>>>>>>> I'm not sure how important it is to change that.
>>>>>>> In any case it only affects transitional devices, right?
>>>>>>> Legacy only should not care ...
>>>>>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
>>>>>> modern one in a transitional device model rather than being legacy only.
>>>>>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
>>>>>> guests without having to reconfigure. Or are you suggesting limit to legacy
>>>>>> only at the time of vdpa creation would simplify the implementation a lot?
>>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>> I don't know for sure. Take a look at the work Halil was doing
>>>>> to try and support transitional devices with BE guests.
>>>> Hmmm, we can have those endianness ioctls defined but the initial QEMU
>>>> implementation can be started to support x86 guest/host with little
>>>> endian and weak memory ordering first. The real trick is to detect
>>>> legacy guest - I am not sure if it's feasible to shift all the legacy
>>>> detection work to QEMU, or the kernel has to be part of the detection
>>>> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
>>>> effort in QEMU) as well. Let me take a further look and get back.
>>> Michael may think differently but I think doing this in Qemu is much easier.
>> I think the key is whether we position emulating legacy interfaces in QEMU
>> doing translation on top of a v1.0 modern-only device in the kernel, or we
>> allow vdpa core (or you can say vhost-vdpa) and vendor driver to support a
>> transitional model in the kernel that is able to work for both v0.95 and
>> v1.0 drivers, with some slight aid from QEMU for
>> detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for the
>> former we still rely on vendor for a performant data vqs implementation,
>> leaving the question to what it may end up eventually in the kernel is
>> effectively the latter).
>>
>> Thanks,
>> -Siwei
>
> My suggestion is post the kernel patches, and we can evaluate
> how much work they are.
Thanks for the feedback. I will do some reading and then get back,
probably after the winter break. Stay tuned.

Thanks,
-Siwei

>
>>> Thanks
>>>
>>>
>>>
>>>> Meanwhile, I'll check internally to see if a legacy only model would
>>>> work. Thanks.
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>
>>>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
>>>>>>>>> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
>>>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>>>>>>>>>>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
>>>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I
>>>>>>>>>>>>> know what the use
>>>>>>>>>>>>> case there will be for kernel to leverage such info
>>>>>>>>>>>>> directly? Is there a
>>>>>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
>>>>>>>>>>>>> differentiation (legacy v.s. modern) needed?
>>>>>>>>>>>> BTW a good API could be
>>>>>>>>>>>>
>>>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>>>>>
>>>>>>>>>>>> we did it per vring but maybe that was a mistake ...
>>>>>>>>>>> Actually, I wonder whether it's good time to just not support
>>>>>>>>>>> legacy driver
>>>>>>>>>>> for vDPA. Consider:
>>>>>>>>>>>
>>>>>>>>>>> 1) It's definition is no-normative
>>>>>>>>>>> 2) A lot of budren of codes
>>>>>>>>>>>
>>>>>>>>>>> So qemu can still present the legacy device since the config
>>>>>>>>>>> space or other
>>>>>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
>>>>>>>>>>> accessed by
>>>>>>>>>>> guest directly. Qemu can do the endian conversion when necessary
>>>>>>>>>>> in this
>>>>>>>>>>> case?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
>>>>>>>>>> working userspace, qemu releases with vdpa support are out there and
>>>>>>>>>> seem to work for people. Any changes need to take that into account
>>>>>>>>>> and document compatibility concerns.
>>>>>>>>> Agree, let me check.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I note that any hardware
>>>>>>>>>> implementation is already broken for legacy except on platforms with
>>>>>>>>>> strong ordering which might be helpful in reducing the scope.
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>


2021-12-16 02:53:36

by Jason Wang

Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Thu, Dec 16, 2021 at 10:02 AM Si-Wei Liu wrote:
>
>
>
> On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
> >>
> >> On 12/14/2021 6:06 PM, Jason Wang wrote:
> >>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu wrote:
> >>>>
> >>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> >>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> >>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> >>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> >>>>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> >>>>>>>> it ended up with. I have the following questions,
> >>>>>>>>
> >>>>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
> >>>>>>>> support will be completely dropped from the table, is my understanding
> >>>>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> >>>>>>>> which is backed by the spec at
> >>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> >>>>>>>> if there's request/need to support wilder legacy virtio versions earlier
> >>>>>>>> beyond.
> >>>>>>> I personally feel it's less work to add in kernel than try to
> >>>>>>> work around it in userspace. Jason feels differently.
> >>>>>>> Maybe post the patches and this will prove to Jason it's not
> >>>>>>> too terrible?
> >>>>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> >>>>>> layout level and is limited to x86 only, there should be easy way out.
> >>>>> Note a subtle difference: what matters is that guest, not host is x86.
> >>>>> Matters for emulators which might reorder memory accesses.
> >>>>> I guess this enforcement belongs in QEMU then?
> >>>> Right, I mean to get started, the initial guest driver support and the
> >>>> corresponding QEMU support for transitional vdpa backend can be limited
> >>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
> >>>> suppose it's not hard to enforce in QEMU.
> >>> It's more than just config space, most devices have headers before the buffer.
> >> The ordering in datapath (data VQs) would have to rely on vendor's support.
> >> Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w vendor nowadays
> >> can/should well support the case when ORDER_PLATFORM is not acked by the
> >> driver (actually this feature is filtered out by the QEMU vhost-vdpa driver
> >> today), even with v1.0 spec conforming and modern only vDPA device. The
> >> control VQ is implemented in software in the kernel, which can be easily
> >> accommodated/fixed when needed.
> >>
> >>>> QEMU can drive GET_LEGACY,
> >>>> GET_ENDIAN et al ioctls in advance to get the capability from the
> >>>> individual vendor driver. For that, we need another negotiation protocol
> >>>> similar to vhost_user's protocol_features between the vdpa kernel and
> >>>> QEMU, way before the guest driver is ever probed and its feature
> >>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> >>>> from the device, but we can assume weak ordering for legacy at this
> >>>> point (x86 only)?
> >>> I'm lost here, we have get_features() so:
> >> I assume here you refer to get_device_features() that Eli just changed the
> >> name.
> >>> 1) VERSION_1 means the device uses LE if provided, otherwise native
> >>> 2) ORDER_PLATFORM means device requires platform ordering
> >>>
> >>> Any reason for having a new API for this?
> >> Are you going to enforce all vDPA hardware vendors to support the
> >> transitional model for legacy guest?

Do we really have other choices?

I suspect the legacy device is never implemented by any vendor:

1) no virtio way to detect host endian
2) bypass IOMMU with translated requests
3) PIO port

Yes we have eni_vdpa, but it's more like a "transitional device" for
legacy only guests.

> >> meaning guest not acknowledging
> >> VERSION_1 would use the legacy interfaces captured in the spec section 7.4
> >> (regarding ring layout, native endianness, message framing, vq alignment of
> >> 4096, 32bit feature, no features_ok bit in status, IO port interface i.e.
> >> all the things) instead?

Note that we only care about the datapath, control path is mediated anyhow.

So features_ok and the IO port aren't an issue. The rest looks like a must
for the hardware.
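
For illustration only, one of the datapath items quoted above (the vq alignment of 4096 in the legacy interface) can be sketched as layout arithmetic. This is a minimal sketch, assuming the split-ring element sizes from the virtio spec; every sketch_/SKETCH_ name is made up here and does not come from any vDPA driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element sizes of the split ring (assumed from the virtio spec). */
#define SKETCH_DESC_SIZE      16u   /* addr (8) + len (4) + flags (2) + next (2) */
#define SKETCH_USED_ELEM_SIZE 8u    /* id (4) + len (4) */
#define SKETCH_LEGACY_ALIGN   4096u /* fixed alignment of the legacy interface */

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t sketch_align_up(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}

/* Byte offset of the used ring from the start of a legacy virtqueue. */
static size_t sketch_legacy_used_offset(unsigned int num)
{
	size_t desc = (size_t)SKETCH_DESC_SIZE * num;
	/* avail ring: flags, idx, ring[num], used_event */
	size_t avail = sizeof(uint16_t) * (3 + (size_t)num);

	return sketch_align_up(desc + avail, SKETCH_LEGACY_ALIGN);
}

/* Total bytes occupied by a legacy virtqueue with num entries. */
static size_t sketch_legacy_vring_size(unsigned int num)
{
	/* used ring: flags, idx, ring[num], avail_event */
	size_t used = sizeof(uint16_t) * 3 + (size_t)SKETCH_USED_ELEM_SIZE * num;

	return sketch_legacy_used_offset(num) + used;
}
```

A device that only understands the modern layout receives the three ring addresses separately, so it never has to derive the used-ring offset this way; a legacy-capable datapath presumably does.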

> >> Noted we don't yet have a set_device_features()
> >> that allows the vdpa device to tell whether it is operating in transitional
> >> or modern-only mode.

So the device feature should be provisioned via the netlink protocol.
And what we want is not "set_device_feature()" but
"set_device_mandatory_feature()", then the parent can choose to fail
the negotiation when VERSION_1 is not negotiated. Qemu then knows for
sure whether it talks to a transitional device or a modern-only device.
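
A minimal sketch of the mandatory-feature idea described above, assuming a per-device mask that is provisioned before the guest starts. The struct, the field names and sketch_set_driver_features() are hypothetical; no such op exists in the vdpa core as discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32

/* Hypothetical device state; not the layout of any real vdpa structure. */
struct sketch_vdpa_dev {
	uint64_t device_features;    /* features the device offers */
	uint64_t mandatory_features; /* provisioned: driver must ack all of these */
	uint64_t driver_features;    /* result of a successful negotiation */
};

/*
 * Accept the driver's feature set only if it is a subset of what the
 * device offers and contains every mandatory bit.
 */
static int sketch_set_driver_features(struct sketch_vdpa_dev *dev,
				      uint64_t features)
{
	assert(dev);

	if (features & ~dev->device_features)
		return -EINVAL;
	if ((features & dev->mandatory_features) != dev->mandatory_features)
		return -EOPNOTSUPP;

	dev->driver_features = features;
	return 0;
}
```

With VERSION_1 in the mandatory mask the device behaves as modern only and refuses a features value of 0; with an empty mask the same call accepts it, which is the transitional case.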

Thanks

> >> For software virtio, all support for the legacy part in
> >> a transitional model has been built up there already, however, it's not easy
> >> for vDPA vendors to implement all the requirements for an all-or-nothing
> >> legacy guest support (big endian guest for example). To these vendors, the
> >> legacy support within a transitional model is more of feature to them and
> >> it's best to leave some flexibility for them to implement partial support
> >> for legacy. That in turn calls out the need for a vhost-user protocol
> >> feature like negotiation API that can prohibit those unsupported guest
> >> setups to as early as backend_init before launching the VM.
> > Right. Of note is the fact that it's a spec bug which I
> > hope yet to fix, though due to existing guest code the
> > fix won't be complete.
> I thought at one point you pointed out to me that the spec does allow
> config space read before claiming features_ok, and only config write
> before features_ok is prohibited. I haven't read up the full thread of
> Halil's VERSION_1 for transitional big endian device yet, but what is
> the spec bug you hope to fix?
>
> >
> > WRT ioctls, One thing we can do though is abuse set_features
> > where it's called by QEMU early on with just the VERSION_1
> > bit set, to distinguish between legacy and modern
> > interface. This before config space accesses and FEATURES_OK.
> >
> > Halil has been working on this, pls take a look and maybe help him out.
> Interesting thread, am reading now and see how I may leverage or help there.
>
> >>>>>> I
> >>>>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
> >>>>>> 0.95 support, it seems all the ingredient had been there already dated back
> >>>>>> to the DPDK days. The only major thing limiting is in the vDPA software that
> >>>>>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> >>>>>> a few DMA setup ops, which is virtio 1.0 only.
> >>>>>>
> >>>>>>>> 2. suppose some form of legacy guest support needs to be there, how do we
> >>>>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
> >>>>>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
> >>>>>>>> of vdpa_get_config() to vdpa_set_config().
> >>>>>>>>
> >>>>>>>> /*
> >>>>>>>> * Config accesses aren't supposed to trigger before features are
> >>>>>>>> set.
> >>>>>>>> * If it does happen we assume a legacy guest.
> >>>>>>>> */
> >>>>>>>> if (!vdev->features_valid)
> >>>>>>>> vdpa_set_features(vdev, 0);
> >>>>>>>> ops->get_config(vdev, offset, buf, len);
> >>>>>>>>
> >>>>>>>> I can post a patch to fix 2) if there's consensus already reached.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> -Siwei
> >>>>>>> I'm not sure how important it is to change that.
> >>>>>>> In any case it only affects transitional devices, right?
> >>>>>>> Legacy only should not care ...
> >>>>>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> >>>>>> modern one in a transitional device model rather than being legacy only.
> >>>>>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> >>>>>> guests without having to reconfigure. Or are you suggesting limit to legacy
> >>>>>> only at the time of vdpa creation would simplify the implementation a lot?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>> I don't know for sure. Take a look at the work Halil was doing
> >>>>> to try and support transitional devices with BE guests.
> >>>> Hmmm, we can have those endianness ioctls defined but the initial QEMU
> >>>> implementation can be started to support x86 guest/host with little
> >>>> endian and weak memory ordering first. The real trick is to detect
> >>>> legacy guest - I am not sure if it's feasible to shift all the legacy
> >>>> detection work to QEMU, or the kernel has to be part of the detection
> >>>> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
> >>>> effort in QEMU) as well. Let me take a further look and get back.
> >>> Michael may think differently but I think doing this in Qemu is much easier.
> >> I think the key is whether we position emulating legacy interfaces in QEMU
> >> doing translation on top of a v1.0 modern-only device in the kernel, or we
> >> allow vdpa core (or you can say vhost-vdpa) and vendor driver to support a
> >> transitional model in the kernel that is able to work for both v0.95 and
> >> v1.0 drivers, with some slight aid from QEMU for
> >> detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for the
> >> former we still rely on vendor for a performant data vqs implementation,
> >> leaving the question to what it may end up eventually in the kernel is
> >> effectively the latter).
> >>
> >> Thanks,
> >> -Siwei
> >
> > My suggestion is post the kernel patches, and we can evaluate
> > how much work they are.
> Thanks for the feedback. I will take some read then get back, probably
> after the winter break. Stay tuned.
>
> Thanks,
> -Siwei
>
> >
> >>> Thanks
> >>>
> >>>
> >>>
> >>>> Meanwhile, I'll check internally to see if a legacy only model would
> >>>> work. Thanks.
> >>>>
> >>>> Thanks,
> >>>> -Siwei
> >>>>
> >>>>
> >>>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
> >>>>>>>>> On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
> >>>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> >>>>>>>>>>> On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
> >>>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> >>>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
> >>>>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
> >>>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I
> >>>>>>>>>>>>> know what the use
> >>>>>>>>>>>>> case there will be for kernel to leverage such info
> >>>>>>>>>>>>> directly? Is there a
> >>>>>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
> >>>>>>>>>>>>> differentiation (legacy v.s. modern) needed?
> >>>>>>>>>>>> BTW a good API could be
> >>>>>>>>>>>>
> >>>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>>
> >>>>>>>>>>>> we did it per vring but maybe that was a mistake ...
> >>>>>>>>>>> Actually, I wonder whether it's good time to just not support
> >>>>>>>>>>> legacy driver
> >>>>>>>>>>> for vDPA. Consider:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) Its definition is non-normative
> >>>>>>>>>>> 2) A lot of code burden
> >>>>>>>>>>>
> >>>>>>>>>>> So qemu can still present the legacy device since the config
> >>>>>>>>>>> space or other
> >>>>>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
> >>>>>>>>>>> accessed by
> >>>>>>>>>>> guest directly. Qemu can do the endian conversion when necessary
> >>>>>>>>>>> in this
> >>>>>>>>>>> case?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>
> >>>>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
> >>>>>>>>>> working userspace, qemu releases with vdpa support are out there and
> >>>>>>>>>> seem to work for people. Any changes need to take that into account
> >>>>>>>>>> and document compatibility concerns.
> >>>>>>>>> Agree, let me check.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> I note that any hardware
> >>>>>>>>>> implementation is already broken for legacy except on platforms with
> >>>>>>>>>> strong ordering which might be helpful in reducing the scope.
> >>>>>>>>> Yes.
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
>


2021-12-16 03:43:54

by Jason Wang

Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Thu, Dec 16, 2021 at 4:52 AM Si-Wei Liu <[email protected]> wrote:
>
>
>
> On 12/14/2021 6:06 PM, Jason Wang wrote:
> > On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
> >>
> >>
> >> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> >>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> >>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> >>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> >>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> >>>>>> it ended up with. I have the following questions,
> >>>>>>
> >>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
> >>>>>> support will be completely dropped from the table, is my understanding
> >>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> >>>>>> which is backed by the spec at
> >>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> >>>>>> if there's request/need to support wilder legacy virtio versions earlier
> >>>>>> beyond.
> >>>>> I personally feel it's less work to add in kernel than try to
> >>>>> work around it in userspace. Jason feels differently.
> >>>>> Maybe post the patches and this will prove to Jason it's not
> >>>>> too terrible?
> >>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> >>>> layout level and is limited to x86 only, there should be easy way out.
> >>> Note a subtle difference: what matters is that guest, not host is x86.
> >>> Matters for emulators which might reorder memory accesses.
> >>> I guess this enforcement belongs in QEMU then?
> >> Right, I mean to get started, the initial guest driver support and the
> >> corresponding QEMU support for transitional vdpa backend can be limited
> >> to x86 guest/host only. Since the config space is emulated in QEMU, I
> >> suppose it's not hard to enforce in QEMU.
> > It's more than just config space, most devices have headers before the buffer.
> The ordering in datapath (data VQs) would have to rely on vendor's
> support. Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w
> vendor nowadays can/should well support the case when ORDER_PLATFORM is
> not acked by the driver (actually this feature is filtered out by the
> QEMU vhost-vdpa driver today), even with v1.0 spec conforming and modern
> only vDPA device.

That's a bug that needs to be fixed.

> The control VQ is implemented in software in the
> kernel, which can be easily accommodated/fixed when needed.
>
> >
> >> QEMU can drive GET_LEGACY,
> >> GET_ENDIAN et al ioctls in advance to get the capability from the
> >> individual vendor driver. For that, we need another negotiation protocol
> >> similar to vhost_user's protocol_features between the vdpa kernel and
> >> QEMU, way before the guest driver is ever probed and its feature
> >> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> >> from the device, but we can assume weak ordering for legacy at this
> >> point (x86 only)?
> > I'm lost here, we have get_features() so:
> I assume here you refer to get_device_features() that Eli just changed
> the name.
> >
> > 1) VERSION_1 means the device uses LE if provided, otherwise native
> > 2) ORDER_PLATFORM means device requires platform ordering
> >
> > Any reason for having a new API for this?
> Are you going to enforce all vDPA hardware vendors to support the
> transitional model for legacy guest? meaning guest not acknowledging
> VERSION_1 would use the legacy interfaces captured in the spec section
> 7.4 (regarding ring layout, native endianness, message framing, vq
> alignment of 4096, 32bit feature, no features_ok bit in status, IO port
> interface i.e. all the things) instead? Noted we don't yet have a
> set_device_features() that allows the vdpa device to tell whether it is
> operating in transitional or modern-only mode. For software virtio, all
> support for the legacy part in a transitional model has been built up
> there already, however, it's not easy for vDPA vendors to implement all
> the requirements for an all-or-nothing legacy guest support (big endian
> guest for example). To these vendors, the legacy support within a
> transitional model is more of feature to them and it's best to leave
> some flexibility for them to implement partial support for legacy. That
> in turn calls out the need for a vhost-user protocol feature like
> negotiation API that can prohibit those unsupported guest setups to as
> early as backend_init before launching the VM.
>
>
> >
> >>>> I
> >>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
> >>>> 0.95 support, it seems all the ingredient had been there already dated back
> >>>> to the DPDK days. The only major thing limiting is in the vDPA software that
> >>>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> >>>> a few DMA setup ops, which is virtio 1.0 only.
> >>>>
> >>>>>> 2. suppose some form of legacy guest support needs to be there, how do we
> >>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
> >>>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
> >>>>>> of vdpa_get_config() to vdpa_set_config().
> >>>>>>
> >>>>>> /*
> >>>>>> * Config accesses aren't supposed to trigger before features are
> >>>>>> set.
> >>>>>> * If it does happen we assume a legacy guest.
> >>>>>> */
> >>>>>> if (!vdev->features_valid)
> >>>>>> vdpa_set_features(vdev, 0);
> >>>>>> ops->get_config(vdev, offset, buf, len);
> >>>>>>
> >>>>>> I can post a patch to fix 2) if there's consensus already reached.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>> I'm not sure how important it is to change that.
> >>>>> In any case it only affects transitional devices, right?
> >>>>> Legacy only should not care ...
> >>>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> >>>> modern one in a transitional device model rather than being legacy only.
> >>>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> >>>> guests without having to reconfigure. Or are you suggesting limit to legacy
> >>>> only at the time of vdpa creation would simplify the implementation a lot?
> >>>>
> >>>> Thanks,
> >>>> -Siwei
> >>> I don't know for sure. Take a look at the work Halil was doing
> >>> to try and support transitional devices with BE guests.
> >> Hmmm, we can have those endianness ioctls defined but the initial QEMU
> >> implementation can be started to support x86 guest/host with little
> >> endian and weak memory ordering first. The real trick is to detect
> >> legacy guest - I am not sure if it's feasible to shift all the legacy
> >> detection work to QEMU, or the kernel has to be part of the detection
> >> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
> >> effort in QEMU) as well. Let me take a further look and get back.
> > Michael may think differently but I think doing this in Qemu is much easier.
> I think the key is whether we position emulating legacy interfaces in
> QEMU doing translation on top of a v1.0 modern-only device in the
> kernel, or we allow vdpa core (or you can say vhost-vdpa) and vendor
> driver to support a transitional model in the kernel that is able to
> work for both v0.95 and v1.0 drivers, with some slight aid from QEMU for
> detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for
> the former we still rely on vendor for a performant data vqs
> implementation, leaving the question to what it may end up eventually in
> the kernel is effectively the latter).

I think we can do the legacy interface emulation on top of the shadow
VQ. And we know it works for sure. But I agree, it would be much
easier if we depend on the vendor to implement a transitional device.

So assuming we depend on the vendor, I don't see anything that is
strictly needed in the kernel, the kick or config access before
DRIVER_OK can all be handled easily in Qemu unless I miss something.
The only value to do that in the kernel is that it can work for
virtio-vdpa, but modern only virtio-vdpa is sufficient; we don't need
any legacy stuff for that.
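
As a rough sketch of what handling this in userspace could look like, the following tracks the driver's feature and status writes and classifies the guest only once there is enough evidence. The heuristic (no VERSION_1 acked by the time DRIVER_OK is set means legacy) and all sketch_/SKETCH_ names are assumptions made for illustration; this is not how QEMU is known to implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1        32
#define VIRTIO_CONFIG_S_DRIVER_OK 4

enum sketch_guest_kind {
	SKETCH_GUEST_UNKNOWN, /* not enough information yet */
	SKETCH_GUEST_LEGACY,
	SKETCH_GUEST_MODERN,
};

/* Hypothetical per-device tracking state kept by the VMM. */
struct sketch_guest_state {
	uint64_t driver_features;
	bool features_written;
	uint8_t status;
};

static void sketch_on_features_write(struct sketch_guest_state *s,
				     uint64_t features)
{
	assert(s);
	s->driver_features = features;
	s->features_written = true;
}

static void sketch_on_status_write(struct sketch_guest_state *s,
				   uint8_t status)
{
	assert(s);
	s->status = status;
}

/*
 * An early config read or kick leaves the guest UNKNOWN instead of
 * forcing it to be treated as legacy.
 */
static enum sketch_guest_kind
sketch_classify(const struct sketch_guest_state *s)
{
	assert(s);

	if (s->features_written &&
	    (s->driver_features & (1ULL << VIRTIO_F_VERSION_1)))
		return SKETCH_GUEST_MODERN;
	if (s->status & VIRTIO_CONFIG_S_DRIVER_OK)
		return SKETCH_GUEST_LEGACY;
	return SKETCH_GUEST_UNKNOWN;
}
```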

Thanks

>
> Thanks,
> -Siwei
>
> >
> > Thanks
> >
> >
> >
> >> Meanwhile, I'll check internally to see if a legacy only model would
> >> work. Thanks.
> >>
> >> Thanks,
> >> -Siwei
> >>
> >>
> >>>
> >>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
> >>>>>>> On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
> >>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> >>>>>>>>> On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
> >>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> >>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
> >>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
> >>>>>>>>>>> Well, although I think adding an ioctl is doable, may I
> >>>>>>>>>>> know what the use
> >>>>>>>>>>> case there will be for kernel to leverage such info
> >>>>>>>>>>> directly? Is there a
> >>>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
> >>>>>>>>>>> differentiation (legacy v.s. modern) needed?
> >>>>>>>>>> BTW a good API could be
> >>>>>>>>>>
> >>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>
> >>>>>>>>>> we did it per vring but maybe that was a mistake ...
> >>>>>>>>> Actually, I wonder whether it's good time to just not support
> >>>>>>>>> legacy driver
> >>>>>>>>> for vDPA. Consider:
> >>>>>>>>>
> >>>>>>>>> 1) Its definition is non-normative
> >>>>>>>>> 2) A lot of code burden
> >>>>>>>>>
> >>>>>>>>> So qemu can still present the legacy device since the config
> >>>>>>>>> space or other
> >>>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
> >>>>>>>>> accessed by
> >>>>>>>>> guest directly. Qemu can do the endian conversion when necessary
> >>>>>>>>> in this
> >>>>>>>>> case?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
> >>>>>>>> working userspace, qemu releases with vdpa support are out there and
> >>>>>>>> seem to work for people. Any changes need to take that into account
> >>>>>>>> and document compatibility concerns.
> >>>>>>> Agree, let me check.
> >>>>>>>
> >>>>>>>
> >>>>>>>> I note that any hardware
> >>>>>>>> implementation is already broken for legacy except on platforms with
> >>>>>>>> strong ordering which might be helpful in reducing the scope.
> >>>>>>> Yes.
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>
>


2021-12-16 06:35:17

by Michael S. Tsirkin

Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Wed, Dec 15, 2021 at 06:01:55PM -0800, Si-Wei Liu wrote:
>
>
> On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
> > >
> > > On 12/14/2021 6:06 PM, Jason Wang wrote:
> > > > On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
> > > > >
> > > > > On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> > > > > > > On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > > > > > > > Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > > > > > > > > it ended up with. I have the following questions,
> > > > > > > > >
> > > > > > > > > 1. legacy guest support: from the past conversations it doesn't seem the
> > > > > > > > > support will be completely dropped from the table, is my understanding
> > > > > > > > > correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > > > > > > > > which is backed by the spec at
> > > > > > > > > > https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> > > > > > > > > if there's request/need to support wilder legacy virtio versions earlier
> > > > > > > > > beyond.
> > > > > > > > I personally feel it's less work to add in kernel than try to
> > > > > > > > work around it in userspace. Jason feels differently.
> > > > > > > > Maybe post the patches and this will prove to Jason it's not
> > > > > > > > too terrible?
> > > > > > > I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> > > > > > > layout level and is limited to x86 only, there should be easy way out.
> > > > > > Note a subtle difference: what matters is that guest, not host is x86.
> > > > > > Matters for emulators which might reorder memory accesses.
> > > > > > I guess this enforcement belongs in QEMU then?
> > > > > Right, I mean to get started, the initial guest driver support and the
> > > > > corresponding QEMU support for transitional vdpa backend can be limited
> > > > > to x86 guest/host only. Since the config space is emulated in QEMU, I
> > > > > suppose it's not hard to enforce in QEMU.
> > > > It's more than just config space, most devices have headers before the buffer.
> > > The ordering in datapath (data VQs) would have to rely on vendor's support.
> > > Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w vendor nowadays
> > > can/should well support the case when ORDER_PLATFORM is not acked by the
> > > driver (actually this feature is filtered out by the QEMU vhost-vdpa driver
> > > today), even with v1.0 spec conforming and modern only vDPA device. The
> > > control VQ is implemented in software in the kernel, which can be easily
> > > accommodated/fixed when needed.
> > >
> > > > > QEMU can drive GET_LEGACY,
> > > > > GET_ENDIAN et al ioctls in advance to get the capability from the
> > > > > individual vendor driver. For that, we need another negotiation protocol
> > > > > similar to vhost_user's protocol_features between the vdpa kernel and
> > > > > QEMU, way before the guest driver is ever probed and its feature
> > > > > negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> > > > > from the device, but we can assume weak ordering for legacy at this
> > > > > point (x86 only)?
> > > > I'm lost here, we have get_features() so:
> > > I assume here you refer to get_device_features() that Eli just changed the
> > > name.
> > > > 1) VERSION_1 means the device uses LE if provided, otherwise native
> > > > 2) ORDER_PLATFORM means device requires platform ordering
> > > >
> > > > Any reason for having a new API for this?
> > > Are you going to enforce all vDPA hardware vendors to support the
> > > transitional model for legacy guest? meaning guest not acknowledging
> > > VERSION_1 would use the legacy interfaces captured in the spec section 7.4
> > > (regarding ring layout, native endianness, message framing, vq alignment of
> > > 4096, 32bit feature, no features_ok bit in status, IO port interface i.e.
> > > all the things) instead? Noted we don't yet have a set_device_features()
> > > that allows the vdpa device to tell whether it is operating in transitional
> > > or modern-only mode. For software virtio, all support for the legacy part in
> > > a transitional model has been built up there already, however, it's not easy
> > > for vDPA vendors to implement all the requirements for an all-or-nothing
> > > legacy guest support (big endian guest for example). To these vendors, the
> > > legacy support within a transitional model is more of feature to them and
> > > it's best to leave some flexibility for them to implement partial support
> > > for legacy. That in turn calls out the need for a vhost-user protocol
> > > feature like negotiation API that can prohibit those unsupported guest
> > > setups to as early as backend_init before launching the VM.
> > Right. Of note is the fact that it's a spec bug which I
> > hope yet to fix, though due to existing guest code the
> > fix won't be complete.
> I thought at one point you pointed out to me that the spec does allow config
> space read before claiming features_ok, and only config write before
> features_ok is prohibited. I haven't read up the full thread of Halil's
> VERSION_1 for transitional big endian device yet, but what is the spec bug
> you hope to fix?

Allowing config space reads before features_ok seemed useful years ago
but in practice is only causing bugs and complicating device design.
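
One way to see the kind of bug meant here is the byte-order question raised earlier in the thread: until features are set, the device cannot know whether the driver will ack VERSION_1, so it cannot know how the driver will interpret a multi-byte config field. The sketch below assumes a device that always lays the field out little-endian; sketch_driver_view16() is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decode a 16-bit config field the way the driver would interpret it:
 * little-endian once VERSION_1 is negotiated, guest-native otherwise.
 */
static uint16_t sketch_driver_view16(const uint8_t bytes[2], bool version_1,
				     bool guest_big_endian)
{
	assert(bytes);

	if (version_1 || !guest_big_endian)
		return (uint16_t)(bytes[0] | (bytes[1] << 8));
	return (uint16_t)((bytes[0] << 8) | bytes[1]);
}
```

With an MTU of 1500 stored as the bytes DC 05, a little-endian guest and any VERSION_1 driver read 1500, while a big-endian driver that has not acked VERSION_1 reads 0xDC05.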

>
> >
> > WRT ioctls, One thing we can do though is abuse set_features
> > where it's called by QEMU early on with just the VERSION_1
> > bit set, to distinguish between legacy and modern
> > interface. This before config space accesses and FEATURES_OK.
> >
> > Halil has been working on this, pls take a look and maybe help him out.
> Interesting thread, am reading now and see how I may leverage or help there.
>
> > > > > > > I
> > > > > > > checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
> > > > > > > 0.95 support, it seems all the ingredient had been there already dated back
> > > > > > > to the DPDK days. The only major thing limiting is in the vDPA software that
> > > > > > > the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> > > > > > > a few DMA setup ops, which is virtio 1.0 only.
> > > > > > >
> > > > > > > > > 2. suppose some form of legacy guest support needs to be there, how do we
> > > > > > > > > deal with the bogus assumption below in vdpa_get_config() in the short term?
> > > > > > > > > It looks one of the intuitive fix is to move the vdpa_set_features call out
> > > > > > > > > of vdpa_get_config() to vdpa_set_config().
> > > > > > > > >
> > > > > > > > > /*
> > > > > > > > > * Config accesses aren't supposed to trigger before features are
> > > > > > > > > set.
> > > > > > > > > * If it does happen we assume a legacy guest.
> > > > > > > > > */
> > > > > > > > > if (!vdev->features_valid)
> > > > > > > > > vdpa_set_features(vdev, 0);
> > > > > > > > > ops->get_config(vdev, offset, buf, len);
> > > > > > > > >
> > > > > > > > > I can post a patch to fix 2) if there's consensus already reached.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > -Siwei
> > > > > > > > I'm not sure how important it is to change that.
> > > > > > > > In any case it only affects transitional devices, right?
> > > > > > > > Legacy only should not care ...
> > > > > > > Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> > > > > > > modern one in a transitional device model rather than being legacy only.
> > > > > > > That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> > > > > > > guests without having to reconfigure. Or are you suggesting limit to legacy
> > > > > > > only at the time of vdpa creation would simplify the implementation a lot?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -Siwei
> > > > > > I don't know for sure. Take a look at the work Halil was doing
> > > > > > to try and support transitional devices with BE guests.
> > > > > Hmmm, we can have those endianness ioctls defined but the initial QEMU
> > > > > implementation can be started to support x86 guest/host with little
> > > > > endian and weak memory ordering first. The real trick is to detect
> > > > > legacy guest - I am not sure if it's feasible to shift all the legacy
> > > > > detection work to QEMU, or the kernel has to be part of the detection
> > > > > (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
> > > > > effort in QEMU) as well. Let me take a further look and get back.
> > > > Michael may think differently but I think doing this in Qemu is much easier.
> > > I think the key is whether we position emulating legacy interfaces in QEMU
> > > doing translation on top of a v1.0 modern-only device in the kernel, or we
> > > allow vdpa core (or you can say vhost-vdpa) and vendor driver to support a
> > > transitional model in the kernel that is able to work for both v0.95 and
> > > v1.0 drivers, with some slight aid from QEMU for
> > > detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for the
> > > former we still rely on vendor for a performant data vqs implementation,
> > > leaving the question to what it may end up eventually in the kernel is
> > > effectively the latter).
> > >
> > > Thanks,
> > > -Siwei
> >
> > My suggestion is post the kernel patches, and we can evaluate
> > how much work they are.
> Thanks for the feedback. I will take some read then get back, probably after
> the winter break. Stay tuned.
>
> Thanks,
> -Siwei
>
> >
> > > > Thanks
> > > >
> > > >
> > > >
> > > > > Meanwhile, I'll check internally to see if a legacy only model would
> > > > > work. Thanks.
> > > > >
> > > > > Thanks,
> > > > > -Siwei
> > > > >
> > > > >
> > > > > > > > > On 3/2/2021 2:53 AM, Jason Wang wrote:
> > > > > > > > > > > On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > > > On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > > > > > > > > > > > > On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > > > > > > > > > > > > > > Detecting it isn't enough though, we will need a new ioctl to notify
> > > > > > > > > > > > > > > the kernel that it's a legacy guest. Ugh :(
> > > > > > > > > > > > > > Well, although I think adding an ioctl is doable, may I
> > > > > > > > > > > > > > know what the use
> > > > > > > > > > > > > > case there will be for kernel to leverage such info
> > > > > > > > > > > > > > directly? Is there a
> > > > > > > > > > > > > > case QEMU can't do with dedicate ioctls later if there's indeed
> > > > > > > > > > > > > > differentiation (legacy v.s. modern) needed?
> > > > > > > > > > > > > BTW a good API could be
> > > > > > > > > > > > >
> > > > > > > > > > > > > #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > > > > > > #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > > > > > > > > > > > >
> > > > > > > > > > > > > we did it per vring but maybe that was a mistake ...
> > > > > > > > > > > > Actually, I wonder whether it's good time to just not support
> > > > > > > > > > > > legacy driver
> > > > > > > > > > > > for vDPA. Consider:
> > > > > > > > > > > >
> > > > > > > > > > > > > 1) Its definition is non-normative
> > > > > > > > > > > > > 2) A lot of code burden
> > > > > > > > > > > >
> > > > > > > > > > > > So qemu can still present the legacy device since the config
> > > > > > > > > > > > space or other
> > > > > > > > > > > > stuffs that is presented by vhost-vDPA is not expected to be
> > > > > > > > > > > > accessed by
> > > > > > > > > > > > guest directly. Qemu can do the endian conversion when necessary
> > > > > > > > > > > > in this
> > > > > > > > > > > > case?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > Overall I would be fine with this approach but we need to avoid breaking
> > > > > > > > > > > working userspace, qemu releases with vdpa support are out there and
> > > > > > > > > > > seem to work for people. Any changes need to take that into account
> > > > > > > > > > > and document compatibility concerns.
> > > > > > > > > > Agree, let me check.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > I note that any hardware
> > > > > > > > > > > implementation is already broken for legacy except on platforms with
> > > > > > > > > > > strong ordering which might be helpful in reducing the scope.
> > > > > > > > > > Yes.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > >


2021-12-16 22:32:51

by Si-Wei Liu

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)



On 12/15/2021 6:53 PM, Jason Wang wrote:
> On Thu, Dec 16, 2021 at 10:02 AM Si-Wei Liu <[email protected]> wrote:
>>
>>
>> On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
>>> On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
>>>> On 12/14/2021 6:06 PM, Jason Wang wrote:
>>>>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
>>>>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
>>>>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
>>>>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
>>>>>>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
>>>>>>>>>> it ended up with. I have the following questions,
>>>>>>>>>>
>>>>>>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
>>>>>>>>>> support will be completely dropped from the table, is my understanding
>>>>>>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
>>>>>>>>>> which is backed by the spec at
>>>>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
>>>>>>>>>> if there's request/need to support wilder legacy virtio versions earlier
>>>>>>>>>> beyond.
>>>>>>>>> I personally feel it's less work to add in kernel than try to
>>>>>>>>> work around it in userspace. Jason feels differently.
>>>>>>>>> Maybe post the patches and this will prove to Jason it's not
>>>>>>>>> too terrible?
>>>>>>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
>>>>>>>> layout level and is limited to x86 only, there should be easy way out.
>>>>>>> Note a subtle difference: what matters is that guest, not host is x86.
>>>>>>> Matters for emulators which might reorder memory accesses.
>>>>>>> I guess this enforcement belongs in QEMU then?
>>>>>> Right, I mean to get started, the initial guest driver support and the
>>>>>> corresponding QEMU support for transitional vdpa backend can be limited
>>>>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
>>>>>> suppose it's not hard to enforce in QEMU.
>>>>> It's more than just config space, most devices have headers before the buffer.
>>>> The ordering in datapath (data VQs) would have to rely on vendor's support.
>>>> Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w vendor nowadays
>>>> can/should well support the case when ORDER_PLATFORM is not acked by the
>>>> driver (actually this feature is filtered out by the QEMU vhost-vdpa driver
>>>> today), even with v1.0 spec conforming and modern only vDPA device. The
>>>> control VQ is implemented in software in the kernel, which can be easily
>>>> accommodated/fixed when needed.
>>>>
>>>>>> QEMU can drive GET_LEGACY,
>>>>>> GET_ENDIAN et al ioctls in advance to get the capability from the
>>>>>> individual vendor driver. For that, we need another negotiation protocol
>>>>>> similar to vhost_user's protocol_features between the vdpa kernel and
>>>>>> QEMU, way before the guest driver is ever probed and its feature
>>>>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
>>>>>> from the device, but we can assume weak ordering for legacy at this
>>>>>> point (x86 only)?
>>>>> I'm lost here, we have get_features() so:
>>>> I assume here you refer to get_device_features() that Eli just changed the
>>>> name.
>>>>> 1) VERSION_1 means the device uses LE if provided, otherwise native
>>>>> 2) ORDER_PLATFORM means device requires platform ordering
>>>>>
>>>>> Any reason for having a new API for this?
>>>> Are you going to enforce all vDPA hardware vendors to support the
>>>> transitional model for legacy guest?
> Do we really have other choices?
>
> I suspect the legacy device is never implemented by any vendor:
>
> 1) no virtio way to detect host endian
Isn't this true even for a spec-conforming transitional device? The
transport-specific way to detect host endianness is still being
discussed, and as far as I can see the spec revision is not finalized
yet. Why does this suddenly become a requirement/blocker for h/w vendors
to implement the transitional model? Even once the spec is out, it will
be pretty new and I suspect not all vendors will follow right away. I
hope the software framework can tolerate h/w vendors that do not support
host endianness (BE specifically), or do not detect it, if they would
like to support a transitional device for legacy.

> 2) bypass IOMMU with translated requests
> 3) PIO port
>
> Yes we have enp_vdpa, but it's more like a "transitional device" for
> legacy only guests.
>
>> meaning guest not acknowledging
>>>> VERSION_1 would use the legacy interfaces captured in the spec section 7.4
>>>> (regarding ring layout, native endianness, message framing, vq alignment of
>>>> 4096, 32bit feature, no features_ok bit in status, IO port interface i.e.
>>>> all the things) instead?
> Note that we only care about the datapath, control path is mediated anyhow.
>
> So feature_ok and IO port isn't an issue. The rest looks like a must
> for the hardware.
H/W vendors can opt out of implementing the transitional interfaces
altogether, which limits them to a modern-only device. Setting
endianness detection (via transport-specific means) aside, for vendors
that wish to support a transitional device with the legacy interface, is
dropping BE host support a hard stop if everything else is there? The
spec today doesn't define a virtio-specific means to detect host memory
ordering or device memory coherency; will that become yet another
stopper some day for h/w vendors to support more platforms?

>
>> Noted we don't yet have a set_device_features()
>>>> that allows the vdpa device to tell whether it is operating in transitional
>>>> or modern-only mode.
> So the device feature should be provisioned via the netlink protocol.
Such a netlink interface will only be used to limit feature exposure,
right? I.e. you can limit a vendor driver that supports transitional to
offering a modern-only interface, but you never want to make a
modern-only vendor driver support transitional (I'm not sure it's a good
idea to do all the translation in software, esp. for the datapath).
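
A rough sketch of the "limit only" semantics described above. The helper
name and signature are hypothetical and do not correspond to any existing
vdpa or netlink API; the only point is that provisioning can clear
feature bits the parent offers but can never add bits it does not
implement.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the feature set a provisioned vdpa
 * device exposes. Provisioning may only remove bits from what the
 * parent (vendor driver) offers.
 */
static int sketch_provision_features(uint64_t parent_features,
				     uint64_t requested,
				     uint64_t *exposed)
{
	assert(exposed != NULL);

	/* Asking for a bit the parent does not implement is an error. */
	if (requested & ~parent_features)
		return -EINVAL;

	*exposed = requested;
	return 0;
}
```

With this shape a transitional-capable parent can be narrowed down to
modern-only, while a modern-only parent can never be widened.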
> And what we want is not "set_device_feature()" but
> "set_device_mandatory_feature()", then the parent can choose to fail
> the negotiation when VERSION_1 is not negotiated.
This assumes the transport-specific detection of a BE host is in place,
right? It is not clear to me who initiates the
set_device_mandatory_feature() call: QEMU during guest feature
negotiation, or an admin user setting it ahead of time via netlink?
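
To make the question concrete, here is a minimal sketch of how a parent
could fail negotiation when a mandatory feature is not acked. The struct
and function are hypothetical (they are not existing vdpa core code);
only the VIRTIO_F_VERSION_1 bit number comes from the virtio spec.
Whoever fills in mandatory_features (QEMU or the admin via netlink) is
exactly the open question.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32	/* feature bit number per the virtio spec */

/* Hypothetical per-device provisioning state, not an existing struct. */
struct sketch_vdpa_dev {
	uint64_t device_features;	/* what the parent offers */
	uint64_t mandatory_features;	/* subset the driver must ack */
};

/* Returns 0 on success or a negative errno if negotiation must fail. */
static int sketch_set_driver_features(const struct sketch_vdpa_dev *dev,
				      uint64_t driver_features)
{
	/* Mandatory bits only make sense if the device offers them. */
	assert((dev->mandatory_features & ~dev->device_features) == 0);

	/* The driver may not ack bits that were never offered. */
	if (driver_features & ~dev->device_features)
		return -EINVAL;

	/* A modern-only parent rejects a driver that skips VERSION_1. */
	if ((driver_features & dev->mandatory_features) !=
	    dev->mandatory_features)
		return -EOPNOTSUPP;

	return 0;
}
```

A transitional parent would simply leave VERSION_1 out of
mandatory_features, so a legacy driver acking 0 still succeeds.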

Thanks,
-Siwei

> Qemu then knows for
> sure it talks to a transitional device or modern only device.
>
> Thanks
>
>> For software virtio, all support for the legacy part in
>>>> a transitional model has been built up there already, however, it's not easy
>>>> for vDPA vendors to implement all the requirements for an all-or-nothing
>>>> legacy guest support (big endian guest for example). To these vendors, the
>>>> legacy support within a transitional model is more of feature to them and
>>>> it's best to leave some flexibility for them to implement partial support
>>>> for legacy. That in turn calls out the need for a vhost-user protocol
>>>> feature like negotiation API that can prohibit those unsupported guest
>>>> setups to as early as backend_init before launching the VM.
>>> Right. Of note is the fact that it's a spec bug which I
>>> hope yet to fix, though due to existing guest code the
>>> fix won't be complete.
>> I thought at one point you pointed out to me that the spec does allow
>> config space read before claiming features_ok, and only config write
>> before features_ok is prohibited. I haven't read up the full thread of
>> Halil's VERSION_1 for transitional big endian device yet, but what is
>> the spec bug you hope to fix?
>>
>>> WRT ioctls, One thing we can do though is abuse set_features
>>> where it's called by QEMU early on with just the VERSION_1
>>> bit set, to distinguish between legacy and modern
>>> interface. This before config space accesses and FEATURES_OK.
>>>
>>> Halil has been working on this, pls take a look and maybe help him out.
>> Interesting thread, am reading now and see how I may leverage or help there.
>>
>>>>>>>> I
>>>>>>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
>>>>>>>> 0.95 support, it seems all the ingredient had been there already dated back
>>>>>>>> to the DPDK days. The only major thing limiting is in the vDPA software that
>>>>>>>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
>>>>>>>> a few DMA setup ops, which is virtio 1.0 only.
>>>>>>>>
>>>>>>>>>> 2. suppose some form of legacy guest support needs to be there, how do we
>>>>>>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
>>>>>>>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
>>>>>>>>>> of vdpa_get_config() to vdpa_set_config().
>>>>>>>>>>
>>>>>>>>>>         /*
>>>>>>>>>>          * Config accesses aren't supposed to trigger before features are set.
>>>>>>>>>>          * If it does happen we assume a legacy guest.
>>>>>>>>>>          */
>>>>>>>>>>         if (!vdev->features_valid)
>>>>>>>>>>                 vdpa_set_features(vdev, 0);
>>>>>>>>>>         ops->get_config(vdev, offset, buf, len);
>>>>>>>>>>
>>>>>>>>>> I can post a patch to fix 2) if there's consensus already reached.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -Siwei
>>>>>>>>> I'm not sure how important it is to change that.
>>>>>>>>> In any case it only affects transitional devices, right?
>>>>>>>>> Legacy only should not care ...
>>>>>>>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
>>>>>>>> modern one in a transitional device model rather than being legacy only.
>>>>>>>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
>>>>>>>> guests without having to reconfigure. Or are you suggesting limit to legacy
>>>>>>>> only at the time of vdpa creation would simplify the implementation a lot?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Siwei
>>>>>>> I don't know for sure. Take a look at the work Halil was doing
>>>>>>> to try and support transitional devices with BE guests.
>>>>>> Hmmm, we can have those endianness ioctls defined but the initial QEMU
>>>>>> implementation can be started to support x86 guest/host with little
>>>>>> endian and weak memory ordering first. The real trick is to detect
>>>>>> legacy guest - I am not sure if it's feasible to shift all the legacy
>>>>>> detection work to QEMU, or the kernel has to be part of the detection
>>>>>> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
>>>>>> effort in QEMU) as well. Let me take a further look and get back.
>>>>> Michael may think differently but I think doing this in Qemu is much easier.
>>>> I think the key is whether we position emulating legacy interfaces in QEMU
>>>> doing translation on top of a v1.0 modern-only device in the kernel, or we
>>>> allow vdpa core (or you can say vhost-vdpa) and vendor driver to support a
>>>> transitional model in the kernel that is able to work for both v0.95 and
>>>> v1.0 drivers, with some slight aid from QEMU for
>>>> detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for the
>>>> former we still rely on vendor for a performant data vqs implementation,
>>>> leaving the question to what it may end up eventually in the kernel is
>>>> effectively the latter).
>>>>
>>>> Thanks,
>>>> -Siwei
>>> My suggestion is post the kernel patches, and we can evaluate
>>> how much work they are.
>> Thanks for the feedback. I will take some read then get back, probably
>> after the winter break. Stay tuned.
>>
>> Thanks,
>> -Siwei
>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>> Meanwhile, I'll check internally to see if a legacy only model would
>>>>>> work. Thanks.
>>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>>>
>>>>>>
>>>>>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
>>>>>>>>>>> On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>>>>>>>>>>>>> On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>>>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
>>>>>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I
>>>>>>>>>>>>>>> know what the use
>>>>>>>>>>>>>>> case there will be for kernel to leverage such info
>>>>>>>>>>>>>>> directly? Is there a
>>>>>>>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
>>>>>>>>>>>>>>> differentiation (legacy v.s. modern) needed?
>>>>>>>>>>>>>> BTW a good API could be
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we did it per vring but maybe that was a mistake ...
>>>>>>>>>>>>> Actually, I wonder whether it's good time to just not support
>>>>>>>>>>>>> legacy driver
>>>>>>>>>>>>> for vDPA. Consider:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Its definition is non-normative
>>>>>>>>>>>>> 2) A lot of code burden
>>>>>>>>>>>>>
>>>>>>>>>>>>> So qemu can still present the legacy device since the config
>>>>>>>>>>>>> space or other
>>>>>>>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
>>>>>>>>>>>>> accessed by
>>>>>>>>>>>>> guest directly. Qemu can do the endian conversion when necessary
>>>>>>>>>>>>> in this
>>>>>>>>>>>>> case?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
>>>>>>>>>>>> working userspace, qemu releases with vdpa support are out there and
>>>>>>>>>>>> seem to work for people. Any changes need to take that into account
>>>>>>>>>>>> and document compatibility concerns.
>>>>>>>>>>> Agree, let me check.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I note that any hardware
>>>>>>>>>>>> implementation is already broken for legacy except on platforms with
>>>>>>>>>>>> strong ordering which might be helpful in reducing the scope.
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>


2021-12-17 01:08:44

by Si-Wei Liu

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)



On 12/15/2021 7:43 PM, Jason Wang wrote:
> On Thu, Dec 16, 2021 at 4:52 AM Si-Wei Liu <[email protected]> wrote:
>>
>>
>> On 12/14/2021 6:06 PM, Jason Wang wrote:
>>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
>>>>
>>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
>>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
>>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
>>>>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
>>>>>>>> it ended up with. I have the following questions,
>>>>>>>>
>>>>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
>>>>>>>> support will be completely dropped from the table, is my understanding
>>>>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
>>>>>>>> which is backed by the spec at
>>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
>>>>>>>> if there's request/need to support wilder legacy virtio versions earlier
>>>>>>>> beyond.
>>>>>>> I personally feel it's less work to add in kernel than try to
>>>>>>> work around it in userspace. Jason feels differently.
>>>>>>> Maybe post the patches and this will prove to Jason it's not
>>>>>>> too terrible?
>>>>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
>>>>>> layout level and is limited to x86 only, there should be easy way out.
>>>>> Note a subtle difference: what matters is that guest, not host is x86.
>>>>> Matters for emulators which might reorder memory accesses.
>>>>> I guess this enforcement belongs in QEMU then?
>>>> Right, I mean to get started, the initial guest driver support and the
>>>> corresponding QEMU support for transitional vdpa backend can be limited
>>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
>>>> suppose it's not hard to enforce in QEMU.
>>> It's more than just config space, most devices have headers before the buffer.
>> The ordering in datapath (data VQs) would have to rely on vendor's
>> support. Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w
>> vendor nowadays can/should well support the case when ORDER_PLATFORM is
>> not acked by the driver (actually this feature is filtered out by the
>> QEMU vhost-vdpa driver today), even with v1.0 spec conforming and modern
>> only vDPA device.
> That's a bug that needs to be fixed.
>
>> The control VQ is implemented in software in the
>> kernel, which can be easily accommodated/fixed when needed.
>>
>>>> QEMU can drive GET_LEGACY,
>>>> GET_ENDIAN et al ioctls in advance to get the capability from the
>>>> individual vendor driver. For that, we need another negotiation protocol
>>>> similar to vhost_user's protocol_features between the vdpa kernel and
>>>> QEMU, way before the guest driver is ever probed and its feature
>>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
>>>> from the device, but we can assume weak ordering for legacy at this
>>>> point (x86 only)?
>>> I'm lost here, we have get_features() so:
>> I assume here you refer to get_device_features() that Eli just changed
>> the name.
>>> 1) VERSION_1 means the device uses LE if provided, otherwise native
>>> 2) ORDER_PLATFORM means device requires platform ordering
>>>
>>> Any reason for having a new API for this?
>> Are you going to enforce all vDPA hardware vendors to support the
>> transitional model for legacy guest? meaning guest not acknowledging
>> VERSION_1 would use the legacy interfaces captured in the spec section
>> 7.4 (regarding ring layout, native endianness, message framing, vq
>> alignment of 4096, 32bit feature, no features_ok bit in status, IO port
>> interface i.e. all the things) instead? Noted we don't yet have a
>> set_device_features() that allows the vdpa device to tell whether it is
>> operating in transitional or modern-only mode. For software virtio, all
>> support for the legacy part in a transitional model has been built up
>> there already, however, it's not easy for vDPA vendors to implement all
>> the requirements for an all-or-nothing legacy guest support (big endian
>> guest for example). To these vendors, the legacy support within a
>> transitional model is more of feature to them and it's best to leave
>> some flexibility for them to implement partial support for legacy. That
>> in turn calls out the need for a vhost-user protocol feature like
>> negotiation API that can prohibit those unsupported guest setups to as
>> early as backend_init before launching the VM.
>>
>>
>>>>>> I
>>>>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
>>>>>> 0.95 support, it seems all the ingredient had been there already dated back
>>>>>> to the DPDK days. The only major thing limiting is in the vDPA software that
>>>>>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
>>>>>> a few DMA setup ops, which is virtio 1.0 only.
>>>>>>
>>>>>>>> 2. suppose some form of legacy guest support needs to be there, how do we
>>>>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
>>>>>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
>>>>>>>> of vdpa_get_config() to vdpa_set_config().
>>>>>>>>
>>>>>>>>         /*
>>>>>>>>          * Config accesses aren't supposed to trigger before features are set.
>>>>>>>>          * If it does happen we assume a legacy guest.
>>>>>>>>          */
>>>>>>>>         if (!vdev->features_valid)
>>>>>>>>                 vdpa_set_features(vdev, 0);
>>>>>>>>         ops->get_config(vdev, offset, buf, len);
>>>>>>>>
>>>>>>>> I can post a patch to fix 2) if there's consensus already reached.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Siwei
>>>>>>> I'm not sure how important it is to change that.
>>>>>>> In any case it only affects transitional devices, right?
>>>>>>> Legacy only should not care ...
>>>>>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
>>>>>> modern one in a transitional device model rather than being legacy only.
>>>>>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
>>>>>> guests without having to reconfigure. Or are you suggesting limit to legacy
>>>>>> only at the time of vdpa creation would simplify the implementation a lot?
>>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>> I don't know for sure. Take a look at the work Halil was doing
>>>>> to try and support transitional devices with BE guests.
>>>> Hmmm, we can have those endianness ioctls defined but the initial QEMU
>>>> implementation can be started to support x86 guest/host with little
>>>> endian and weak memory ordering first. The real trick is to detect
>>>> legacy guest - I am not sure if it's feasible to shift all the legacy
>>>> detection work to QEMU, or the kernel has to be part of the detection
>>>> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
>>>> effort in QEMU) as well. Let me take a further look and get back.
>>> Michael may think differently but I think doing this in Qemu is much easier.
>> I think the key is whether we position emulating legacy interfaces in
>> QEMU doing translation on top of a v1.0 modern-only device in the
>> kernel, or we allow vdpa core (or you can say vhost-vdpa) and vendor
>> driver to support a transitional model in the kernel that is able to
>> work for both v0.95 and v1.0 drivers, with some slight aid from QEMU for
>> detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for
>> the former we still rely on vendor for a performant data vqs
>> implementation, leaving the question to what it may end up eventually in
>> the kernel is effectively the latter).
> I think we can do the legacy interface emulation on top of the shadow
> VQ. And we know it works for sure. But I agree, it would be much
> easier if we depend on the vendor to implement a transitional device.
First, I am not sure there's a convincing case for users to deploy vDPA
with a shadow (data) VQ over a purely software-based backend. Please
enlighten me if there is.

For us, the point of deploying vDPA for legacy guests is the
acceleration part (what the "A" in "vDPA" stands for), so that we can
leverage the hardware potential wherever possible. I'm not sure how a
shadow VQ implementation can deal with datapath acceleration without
losing too much performance.

> So assuming we depend on the vendor, I don't see anything that is
> strictly needed in the kernel, the kick or config access before
> DRIVER_OK can all be handled easily in Qemu unless I miss something.
Right, that's what I think too: it's not a lot of work in the kernel if
the vendor device offers the aid/support for transitional. The kernel
only provides the abstraction of the device model (transitional or
modern-only), while the vendor driver may implement early platform
feature discovery and apply legacy-specific quirks for setups the device
can't adapt to (unsupported endianness, mismatched page size,
unsupported host memory ordering model). I'm not saying we have to
depend on the vendor, but the point is that we must assume fully
spec-compliant transitional support (the datapath in particular) from
the vendor to get started, as I guess that's probably the main
motivation for users to deploy it: acceleration of legacy guest
workloads without exhausting host computing resources. Even if we get
started with a shadow VQ to mediate and translate the datapath,
eventually it may evolve towards offloading the datapath to hardware, if
acceleration is the only convincing use case for legacy support.
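
As an illustration of the early platform check meant here, a sketch
under stated assumptions: both structs and the helper are hypothetical
(no such capability interface exists in vdpa today), and the fields just
mirror the quirks listed above. The 4096 value is the legacy vring
alignment mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: what a vendor driver reports about its legacy support. */
struct sketch_legacy_caps {
	bool legacy_supported;		/* implements the v0.95 datapath at all */
	bool be_guest_supported;	/* can serve a big-endian legacy guest */
	bool needs_strong_order;	/* only correct for strongly ordered guests */
	uint32_t vring_align;		/* legacy vring alignment it implements */
};

/* Hypothetical: what QEMU knows about the guest before launching it. */
struct sketch_guest_setup {
	bool big_endian;
	bool strongly_ordered;		/* e.g. an x86 guest */
	uint32_t vring_align;		/* 4096 for the legacy interface */
};

/* True if a legacy guest with this setup can be offered the device. */
static bool sketch_legacy_guest_ok(const struct sketch_legacy_caps *caps,
				   const struct sketch_guest_setup *guest)
{
	assert(caps && guest);

	if (!caps->legacy_supported)
		return false;
	if (guest->big_endian && !caps->be_guest_supported)
		return false;
	if (caps->needs_strong_order && !guest->strongly_ordered)
		return false;
	return caps->vring_align == guest->vring_align;
}
```

QEMU could run such a check at backend init time and refuse to start the
VM (or hide the legacy interface) instead of failing later in the
datapath.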

Thanks,
-Siwei
> The only value to do that in the kernel is that it can work for
> virtio-vdpa, but modern only virtio-vdpa is sufficient; we don't need
> any legacy stuff for that.
>
> Thanks
>
>> Thanks,
>> -Siwei
>>
>>> Thanks
>>>
>>>
>>>
>>>> Meanwhile, I'll check internally to see if a legacy only model would
>>>> work. Thanks.
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>
>>>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
>>>>>>>>> On 2021/3/2 5:47 PM, Michael S. Tsirkin wrote:
>>>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
>>>>>>>>>>> On 2021/3/1 5:34 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
>>>>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
>>>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I
>>>>>>>>>>>>> know what the use
>>>>>>>>>>>>> case there will be for kernel to leverage such info
>>>>>>>>>>>>> directly? Is there a
>>>>>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
>>>>>>>>>>>>> differentiation (legacy v.s. modern) needed?
>>>>>>>>>>>> BTW a good API could be
>>>>>>>>>>>>
>>>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>>>>>>>>>>>>
>>>>>>>>>>>> we did it per vring but maybe that was a mistake ...
>>>>>>>>>>> Actually, I wonder whether it's good time to just not support
>>>>>>>>>>> legacy driver
>>>>>>>>>>> for vDPA. Consider:
>>>>>>>>>>>
>>>>>>>>>>> 1) Its definition is non-normative
>>>>>>>>>>> 2) A lot of code burden
>>>>>>>>>>>
>>>>>>>>>>> So qemu can still present the legacy device since the config
>>>>>>>>>>> space or other
>>>>>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
>>>>>>>>>>> accessed by
>>>>>>>>>>> guest directly. Qemu can do the endian conversion when necessary
>>>>>>>>>>> in this
>>>>>>>>>>> case?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
>>>>>>>>>> working userspace, qemu releases with vdpa support are out there and
>>>>>>>>>> seem to work for people. Any changes need to take that into account
>>>>>>>>>> and document compatibility concerns.
>>>>>>>>> Agree, let me check.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I note that any hardware
>>>>>>>>>> implementation is already broken for legacy except on platforms with
>>>>>>>>>> strong ordering which might be helpful in reducing the scope.
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>


2021-12-17 01:57:57

by Jason Wang

[permalink] [raw]
Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Fri, Dec 17, 2021 at 6:32 AM Si-Wei Liu <[email protected]> wrote:
>
>
>
> On 12/15/2021 6:53 PM, Jason Wang wrote:
> > On Thu, Dec 16, 2021 at 10:02 AM Si-Wei Liu <[email protected]> wrote:
> >>
> >>
> >> On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
> >>>> On 12/14/2021 6:06 PM, Jason Wang wrote:
> >>>>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
> >>>>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> >>>>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> >>>>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> >>>>>>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> >>>>>>>>>> it ended up with. I have the following questions,
> >>>>>>>>>>
> >>>>>>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
> >>>>>>>>>> support will be completely dropped from the table, is my understanding
> >>>>>>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> >>>>>>>>>> which is backed by the spec at
> >>>>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> >>>>>>>>>> if there's request/need to support wilder legacy virtio versions earlier
> >>>>>>>>>> beyond.
> >>>>>>>>> I personally feel it's less work to add in kernel than try to
> >>>>>>>>> work around it in userspace. Jason feels differently.
> >>>>>>>>> Maybe post the patches and this will prove to Jason it's not
> >>>>>>>>> too terrible?
> >>>>>>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> >>>>>>>> layout level and is limited to x86 only, there should be easy way out.
> >>>>>>> Note a subtle difference: what matters is that guest, not host is x86.
> >>>>>>> Matters for emulators which might reorder memory accesses.
> >>>>>>> I guess this enforcement belongs in QEMU then?
> >>>>>> Right, I mean to get started, the initial guest driver support and the
> >>>>>> corresponding QEMU support for transitional vdpa backend can be limited
> >>>>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
> >>>>>> suppose it's not hard to enforce in QEMU.
> >>>>> It's more than just config space, most devices have headers before the buffer.
> >>>> The ordering in datapath (data VQs) would have to rely on vendor's support.
> >>>> Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w vendor nowadays
> >>>> can/should well support the case when ORDER_PLATFORM is not acked by the
> >>>> driver (actually this feature is filtered out by the QEMU vhost-vdpa driver
> >>>> today), even with v1.0 spec conforming and modern only vDPA device. The
> >>>> control VQ is implemented in software in the kernel, which can be easily
> >>>> accommodated/fixed when needed.
> >>>>
> >>>>>> QEMU can drive GET_LEGACY,
> >>>>>> GET_ENDIAN et al ioctls in advance to get the capability from the
> >>>>>> individual vendor driver. For that, we need another negotiation protocol
> >>>>>> similar to vhost_user's protocol_features between the vdpa kernel and
> >>>>>> QEMU, way before the guest driver is ever probed and its feature
> >>>>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> >>>>>> from the device, but we can assume weak ordering for legacy at this
> >>>>>> point (x86 only)?
> >>>>> I'm lost here, we have get_features() so:
> >>>> I assume here you refer to get_device_features() that Eli just changed the
> >>>> name.
> >>>>> 1) VERSION_1 means the device uses LE if provided, otherwise native
> >>>>> 2) ORDER_PLATFORM means device requires platform ordering
> >>>>>
> >>>>> Any reason for having a new API for this?
> >>>> Are you going to enforce all vDPA hardware vendors to support the
> >>>> transitional model for legacy guest?
> > Do we really have other choices?
> >
> > I suspect the legacy device is never implemented by any vendor:
> >
> > 1) no virtio way to detect host endian
> This is even true for transitional device that is conforming to the
> spec, right?

For hardware, yes.

> The transport specific way to detect host endian is still
> being discussed and the spec revision is not finalized yet so far as I
> see. Why this suddenly becomes a requirement/blocker for h/w vendors to
> implement the transitional model?

It's not a sudden blocker, the problem has existed since day 0 if I
was not wrong. That's why the problem looks a little bit complicated
and why it would be much simpler if we stick to modern devices.

> Even if the spec is out, this is
> pretty new and I suspect not all vendor would follow right away. I hope
> the software framework can be tolerant with h/w vendors not supporting
> host endianness (BE specifically) or not detecting it if they would like
> to support a transitional device for legacy.

Well, if we know we don't want to support the BE host it would be fine.
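
To make the endianness point concrete, here is a minimal sketch of the rule
quoted above (VERSION_1 negotiated means little-endian fields, otherwise the
guest's native byte order). The helper names toy_fields_are_le() and
toy_load16() are hypothetical and are not taken from the kernel or QEMU; only
the VIRTIO_F_VERSION_1 bit number comes from the virtio spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number defined by the virtio spec. */
#define VIRTIO_F_VERSION_1 32

/* Hypothetical helper: are multi-byte fields little-endian in memory? */
static bool toy_fields_are_le(uint64_t features, bool guest_is_le)
{
	if (features & (1ULL << VIRTIO_F_VERSION_1))
		return true;	/* modern interface: always little-endian */
	return guest_is_le;	/* legacy interface: guest-native order */
}

/* Hypothetical helper: decode a 16-bit field from its in-memory bytes. */
static uint16_t toy_load16(const uint8_t *p, uint64_t features,
			   bool guest_is_le)
{
	assert(p);
	if (toy_fields_are_le(features, guest_is_le))
		return (uint16_t)(p[0] | (p[1] << 8));
	return (uint16_t)((p[0] << 8) | p[1]);
}
```

With the bytes 0x34 0x12 in memory, a driver that negotiated VERSION_1 and a
legacy little-endian guest both decode 0x1234, while a legacy big-endian guest
decodes 0x3412. The last case is the one a device cannot get right unless
something tells it the guest's byte order.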

>
> > 2) bypass IOMMU with translated requests
> > 3) PIO port
> >
> > Yes we have enp_vdpa, but it's more like a "transitional device" for
> > legacy only guests.
> >
> >> meaning guest not acknowledging
> >>>> VERSION_1 would use the legacy interfaces captured in the spec section 7.4
> >>>> (regarding ring layout, native endianness, message framing, vq alignment of
> >>>> 4096, 32bit feature, no features_ok bit in status, IO port interface i.e.
> >>>> all the things) instead?
> > Note that we only care about the datapath, control path is mediated anyhow.
> >
> > So feature_ok and IO port isn't an issue. The rest looks like a must
> > for the hardware.
> H/W vendors can opt out of implementing transitional interfaces at all,
> which limits them to a modern-only device. Setting endianness detection (via
> transport specific means) aside, for vendors that wish to support a
> transitional device with a legacy interface, is it a hard stop to drop
> support for BE hosts if everything else is there? The spec today doesn't
> define virtio specific means to detect host memory ordering or device
> memory coherency,

Any reason that we need to care about memory coherency at the virtio
level? I'd expect it's the task of the transport.

> will it yet become a stopper another day for h/w
> vendor to support more platforms?

Let's differentiate virtio from vdpa here. For virtio, there's no way
to add any feature for legacy devices. We can only add detection of
memory features for modern devices.

But for vDPA, we can introduce any API that can help vendors to
present a transitional device. But we can't force those APIs since it's
too late to do that. So transitional device support is optional for
sure.

>
> >
> >> Noted we don't yet have a set_device_features()
> >>>> that allows the vdpa device to tell whether it is operating in transitional
> >>>> or modern-only mode.
> > So the device feature should be provisioned via the netlink protocol.
> Such netlink interface will only be used to limit feature exposure,
> right? i.e. you can limit a transitional supporting vendor driver to
> offering modern-only interface,

There's no way for the management to force a feature, like VERSION_1
via the current protocol.

> but you never want to make a modern-only
> vendor driver to support transitional (I'm not sure if it's a good idea
> to support all the translation in software, esp. for datapath).

You may hit this problem for sure, you can't force all vendors to
support transitional devices especially considering spec said legacy
is optional. We don't want to end up with a userspace code that can
only work for some specific vendors.

> > And what we want is not "set_device_feature()" but
> > "set_device_mandatory_feature()", then the parent can choose to fail
> > the negotiation when VERSION_1 is not negotiated.
> This assumes the transport specific detection of BE host is in place,
> right?

Again, the point is, we can not assume such detection works for all of
the vendors. And even assuming BE detection is ready, we still need this
for modern devices, don't we?

> I am not clear who initiates the set_device_mandatory_feature()
> call, QEMU during guest feature negotiation, or admin user setting it
> ahead via netlink?

Netlink. Actually, the spec needs to be extended as well; we saw
similar requests in the past. E.g. there could be a device that works
in a packed layout only.
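
As a rough illustration of the "mandatory feature" idea, the check a parent
driver might run at feature negotiation could look like the sketch below. The
struct and function names are hypothetical (no such vdpa op or netlink
attribute is claimed to exist); the two feature bit numbers are the ones from
the virtio spec. The sketch assumes mandatory_features is a subset of
device_features.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Feature bit numbers defined by the virtio spec. */
#define VIRTIO_F_VERSION_1	32
#define VIRTIO_F_RING_PACKED	34

/* Hypothetical per-device provisioning, e.g. filled in via netlink. */
struct toy_provision {
	uint64_t device_features;	/* what the device offers */
	uint64_t mandatory_features;	/* what the driver must accept */
};

/* Hypothetical check run when the driver writes its feature set. */
static int toy_check_features(const struct toy_provision *p,
			      uint64_t driver_features)
{
	assert(p);
	/* The driver may not ack anything the device did not offer. */
	if (driver_features & ~p->device_features)
		return -EINVAL;
	/* Fail negotiation if any mandatory bit was left out. */
	if ((driver_features & p->mandatory_features) != p->mandatory_features)
		return -EOPNOTSUPP;
	return 0;
}
```

For a packed-only device, both VERSION_1 and RING_PACKED would be marked
mandatory, so a driver acking only VERSION_1 is rejected. Note that a feature
set of 0 is rejected as well under this sketch, so how it should interact with
the implicit reset to 0 discussed in this thread is left open.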

Thanks

>
> Thanks,
> -Siwei
>
> > Qemu then knows for
> > sure it talks to a transitional device or modern only device.
> >
> > Thanks
> >
> >> For software virtio, all support for the legacy part in
> >>>> a transitional model has been built up there already, however, it's not easy
> >>>> for vDPA vendors to implement all the requirements for an all-or-nothing
> >>>> legacy guest support (big endian guest for example). To these vendors, the
> >>>> legacy support within a transitional model is more of feature to them and
> >>>> it's best to leave some flexibility for them to implement partial support
> >>>> for legacy. That in turn calls out the need for a vhost-user protocol
> >>>> feature like negotiation API that can prohibit those unsupported guest
> >>>> setups to as early as backend_init before launching the VM.
> >>> Right. Of note is the fact that it's a spec bug which I
> >>> hope yet to fix, though due to existing guest code the
> >>> fix won't be complete.
> >> I thought at one point you pointed out to me that the spec does allow
> >> config space read before claiming features_ok, and only config write
> >> before features_ok is prohibited. I haven't read up the full thread of
> >> Halil's VERSION_1 for transitional big endian device yet, but what is
> >> the spec bug you hope to fix?
> >>
> >>> WRT ioctls, One thing we can do though is abuse set_features
> >>> where it's called by QEMU early on with just the VERSION_1
> >>> bit set, to distinguish between legacy and modern
> >>> interface. This before config space accesses and FEATURES_OK.
> >>>
> >>> Halil has been working on this, pls take a look and maybe help him out.
> >> Interesting thread, am reading now and see how I may leverage or help there.
> >>
> >>>>>>>> I
> >>>>>>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
> >>>>>>>> 0.95 support, it seems all the ingredient had been there already dated back
> >>>>>>>> to the DPDK days. The only major thing limiting is in the vDPA software that
> >>>>>>>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> >>>>>>>> a few DMA setup ops, which is virtio 1.0 only.
> >>>>>>>>
> >>>>>>>>>> 2. suppose some form of legacy guest support needs to be there, how do we
> >>>>>>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
> >>>>>>>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
> >>>>>>>>>> of vdpa_get_config() to vdpa_set_config().
> >>>>>>>>>>
> >>>>>>>>>> /*
> >>>>>>>>>> * Config accesses aren't supposed to trigger before features are
> >>>>>>>>>> set.
> >>>>>>>>>> * If it does happen we assume a legacy guest.
> >>>>>>>>>> */
> >>>>>>>>>> if (!vdev->features_valid)
> >>>>>>>>>> vdpa_set_features(vdev, 0);
> >>>>>>>>>> ops->get_config(vdev, offset, buf, len);
> >>>>>>>>>>
> >>>>>>>>>> I can post a patch to fix 2) if there's consensus already reached.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> -Siwei
> >>>>>>>>> I'm not sure how important it is to change that.
> >>>>>>>>> In any case it only affects transitional devices, right?
> >>>>>>>>> Legacy only should not care ...
> >>>>>>>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> >>>>>>>> modern one in a transitional device model rather than being legacy only.
> >>>>>>>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> >>>>>>>> guests without having to reconfigure. Or are you suggesting limit to legacy
> >>>>>>>> only at the time of vdpa creation would simplify the implementation a lot?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> -Siwei
> >>>>>>> I don't know for sure. Take a look at the work Halil was doing
> >>>>>>> to try and support transitional devices with BE guests.
> >>>>>> Hmmm, we can have those endianness ioctls defined but the initial QEMU
> >>>>>> implementation can be started to support x86 guest/host with little
> >>>>>> endian and weak memory ordering first. The real trick is to detect
> >>>>>> legacy guest - I am not sure if it's feasible to shift all the legacy
> >>>>>> detection work to QEMU, or the kernel has to be part of the detection
> >>>>>> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
> >>>>>> effort in QEMU) as well. Let me take a further look and get back.
> >>>>> Michael may think differently but I think doing this in Qemu is much easier.
> >>>> I think the key is whether we position emulating legacy interfaces in QEMU
> >>>> doing translation on top of a v1.0 modern-only device in the kernel, or we
> >>>> allow vdpa core (or you can say vhost-vdpa) and vendor driver to support a
> >>>> transitional model in the kernel that is able to work for both v0.95 and
> >>>> v1.0 drivers, with some slight aid from QEMU for
> >>>> detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for the
> >>>> former we still rely on vendor for a performant data vqs implementation,
> >>>> leaving the question to what it may end up eventually in the kernel is
> >>>> effectively the latter).
> >>>>
> >>>> Thanks,
> >>>> -Siwei
> >>> My suggestion is post the kernel patches, and we can evaluate
> >>> how much work they are.
> >> Thanks for the feedback. I will take some read then get back, probably
> >> after the winter break. Stay tuned.
> >>
> >> Thanks,
> >> -Siwei
> >>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Meanwhile, I'll check internally to see if a legacy only model would
> >>>>>> work. Thanks.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>>>
> >>>>>>
> >>>>>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
> >>>>>>>>>>> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> >>>>>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> >>>>>>>>>>>>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> >>>>>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> >>>>>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
> >>>>>>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
> >>>>>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I
> >>>>>>>>>>>>>>> know what the use
> >>>>>>>>>>>>>>> case there will be for kernel to leverage such info
> >>>>>>>>>>>>>>> directly? Is there a
> >>>>>>>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
> >>>>>>>>>>>>>>> differentiation (legacy v.s. modern) needed?
> >>>>>>>>>>>>>> BTW a good API could be
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> we did it per vring but maybe that was a mistake ...
> >>>>>>>>>>>>> Actually, I wonder whether it's good time to just not support
> >>>>>>>>>>>>> legacy driver
> >>>>>>>>>>>>> for vDPA. Consider:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) Its definition is non-normative
> >>>>>>>>>>>>> 2) A lot of code burden
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So qemu can still present the legacy device since the config
> >>>>>>>>>>>>> space or other
> >>>>>>>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
> >>>>>>>>>>>>> accessed by
> >>>>>>>>>>>>> guest directly. Qemu can do the endian conversion when necessary
> >>>>>>>>>>>>> in this
> >>>>>>>>>>>>> case?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
> >>>>>>>>>>>> working userspace, qemu releases with vdpa support are out there and
> >>>>>>>>>>>> seem to work for people. Any changes need to take that into account
> >>>>>>>>>>>> and document compatibility concerns.
> >>>>>>>>>>> Agree, let me check.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> I note that any hardware
> >>>>>>>>>>>> implementation is already broken for legacy except on platforms with
> >>>>>>>>>>>> strong ordering which might be helpful in reducing the scope.
> >>>>>>>>>>> Yes.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>
> >>>>>>>>>>>
>
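
For readers following along, the vdpa_get_config() fragment quoted in this
message can be modelled in user space roughly as below. This is only a toy
model: struct toy_vdpa_dev and the toy_* functions are hypothetical stand-ins,
not the kernel's struct vdpa_device or its ops. It shows why a config read
before feature negotiation ends up calling set_features with 0, and why the
parent driver has to accept that value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a vdpa device; not the kernel definition. */
struct toy_vdpa_dev {
	bool features_valid;	/* has set_features been called yet? */
	uint64_t features;	/* last feature set written by the driver */
	uint8_t config[8];	/* toy device config space */
};

static int toy_set_features(struct toy_vdpa_dev *d, uint64_t features)
{
	/* A parent that rejects 0 here would break the early config read. */
	d->features = features;
	d->features_valid = true;
	return 0;
}

static int toy_get_config(struct toy_vdpa_dev *d, unsigned int offset,
			  void *buf, unsigned int len)
{
	assert(d && buf);
	if (offset > sizeof(d->config) || len > sizeof(d->config) - offset)
		return -1;
	/* Config read before features are set: assume a legacy guest. */
	if (!d->features_valid)
		toy_set_features(d, 0);
	memcpy(buf, d->config + offset, len);
	return 0;
}
```

A modern driver that reads the MTU early would hit the implicit
toy_set_features(d, 0) first and then negotiate its real feature set
afterwards, which is the sequence the original patch description mentions.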


2021-12-17 02:01:09

by Michael S. Tsirkin

Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Fri, Dec 17, 2021 at 09:57:38AM +0800, Jason Wang wrote:
> On Fri, Dec 17, 2021 at 6:32 AM Si-Wei Liu <[email protected]> wrote:
> >
> >
> >
> > On 12/15/2021 6:53 PM, Jason Wang wrote:
> > > On Thu, Dec 16, 2021 at 10:02 AM Si-Wei Liu <[email protected]> wrote:
> > >>
> > >>
> > >> On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
> > >>> On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
> > >>>> On 12/14/2021 6:06 PM, Jason Wang wrote:
> > >>>>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
> > >>>>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> > >>>>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> > >>>>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> > >>>>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > >>>>>>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > >>>>>>>>>> it ended up with. I have the following questions,
> > >>>>>>>>>>
> > >>>>>>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
> > >>>>>>>>>> support will be completely dropped from the table, is my understanding
> > >>>>>>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > >>>>>>>>>> which is backed by the spec at
> > >>>>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> > >>>>>>>>>> if there's request/need to support wilder legacy virtio versions earlier
> > >>>>>>>>>> beyond.
> > >>>>>>>>> I personally feel it's less work to add in kernel than try to
> > >>>>>>>>> work around it in userspace. Jason feels differently.
> > >>>>>>>>> Maybe post the patches and this will prove to Jason it's not
> > >>>>>>>>> too terrible?
> > >>>>>>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> > >>>>>>>> layout level and is limited to x86 only, there should be easy way out.
> > >>>>>>> Note a subtle difference: what matters is that guest, not host is x86.
> > >>>>>>> Matters for emulators which might reorder memory accesses.
> > >>>>>>> I guess this enforcement belongs in QEMU then?
> > >>>>>> Right, I mean to get started, the initial guest driver support and the
> > >>>>>> corresponding QEMU support for transitional vdpa backend can be limited
> > >>>>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
> > >>>>>> suppose it's not hard to enforce in QEMU.
> > >>>>> It's more than just config space, most devices have headers before the buffer.
> > >>>> The ordering in datapath (data VQs) would have to rely on vendor's support.
> > >>>> Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w vendor nowadays
> > >>>> can/should well support the case when ORDER_PLATFORM is not acked by the
> > >>>> driver (actually this feature is filtered out by the QEMU vhost-vdpa driver
> > >>>> today), even with v1.0 spec conforming and modern only vDPA device. The
> > >>>> control VQ is implemented in software in the kernel, which can be easily
> > >>>> accommodated/fixed when needed.
> > >>>>
> > >>>>>> QEMU can drive GET_LEGACY,
> > >>>>>> GET_ENDIAN et al ioctls in advance to get the capability from the
> > >>>>>> individual vendor driver. For that, we need another negotiation protocol
> > >>>>>> similar to vhost_user's protocol_features between the vdpa kernel and
> > >>>>>> QEMU, way before the guest driver is ever probed and its feature
> > >>>>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> > >>>>>> from the device, but we can assume weak ordering for legacy at this
> > >>>>>> point (x86 only)?
> > >>>>> I'm lost here, we have get_features() so:
> > >>>> I assume here you refer to get_device_features() that Eli just changed the
> > >>>> name.
> > >>>>> 1) VERSION_1 means the device uses LE if provided, otherwise native
> > >>>>> 2) ORDER_PLATFORM means device requires platform ordering
> > >>>>>
> > >>>>> Any reason for having a new API for this?
> > >>>> Are you going to enforce all vDPA hardware vendors to support the
> > >>>> transitional model for legacy guest?
> > > Do we really have other choices?
> > >
> > > I suspect the legacy device is never implemented by any vendor:
> > >
> > > 1) no virtio way to detect host endian
> > This is even true for transitional device that is conforming to the
> > spec, right?
>
> For hardware, yes.
>
> > The transport specific way to detect host endian is still
> > being discussed and the spec revision is not finalized yet so far as I
> > see. Why this suddenly becomes a requirement/blocker for h/w vendors to
> > implement the transitional model?
>
> It's not a sudden blocker, the problem has existed since day 0 if I
> was not wrong. That's why the problem looks a little bit complicated
> and why it would be much simpler if we stick to modern devices.
>
> > Even if the spec is out, this is
> > pretty new and I suspect not all vendor would follow right away. I hope
> > the software framework can be tolerant with h/w vendors not supporting
> > host endianness (BE specifically) or not detecting it if they would like
> > to support a transitional device for legacy.
>
> Well, if we know we don't want to support the BE host it would be fine.

I think you guys mean guest not host here. Same for memory ordering etc.
What matters is whether guest has barriers etc.
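
To illustrate what "guest has barriers" means in practice, below is a generic
sketch of publishing a ring entry with release/acquire ordering using C11
atomics. It is not virtio's actual ring layout; struct toy_ring and the toy_*
functions are hypothetical. On a strongly ordered guest such as x86 the
release store needs no extra fence instruction, while a weakly ordered guest
only gets the same guarantee if its driver issues the barrier explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u

/* Toy single-producer, single-consumer ring; not the virtio layout. */
struct toy_ring {
	uint32_t slots[TOY_RING_SIZE];
	_Atomic uint16_t avail_idx;	/* written by driver, read by device */
};

/* Driver side: write the slot first, then publish the new index. */
static void toy_publish(struct toy_ring *r, uint32_t val)
{
	uint16_t idx = atomic_load_explicit(&r->avail_idx,
					    memory_order_relaxed);

	r->slots[idx % TOY_RING_SIZE] = val;
	/* Release: the slot write must be visible before the index. */
	atomic_store_explicit(&r->avail_idx, (uint16_t)(idx + 1),
			      memory_order_release);
}

/* Device side: returns 1 and fills *out if a new entry is available. */
static int toy_consume(struct toy_ring *r, uint16_t *last_seen, uint32_t *out)
{
	/* Acquire: pairs with the release store in toy_publish(). */
	uint16_t idx = atomic_load_explicit(&r->avail_idx,
					    memory_order_acquire);

	assert(last_seen && out);
	if (idx == *last_seen)
		return 0;
	*out = r->slots[*last_seen % TOY_RING_SIZE];
	(*last_seen)++;
	return 1;
}
```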

> >
> > > 2) bypass IOMMU with translated requests
> > > 3) PIO port
> > >
> > > Yes we have enp_vdpa, but it's more like a "transitional device" for
> > > legacy only guests.
> > >
> > >> meaning guest not acknowledging
> > >>>> VERSION_1 would use the legacy interfaces captured in the spec section 7.4
> > >>>> (regarding ring layout, native endianness, message framing, vq alignment of
> > >>>> 4096, 32bit feature, no features_ok bit in status, IO port interface i.e.
> > >>>> all the things) instead?
> > > Note that we only care about the datapath, control path is mediated anyhow.
> > >
> > > So feature_ok and IO port isn't an issue. The rest looks like a must
> > > for the hardware.
> > H/W vendors can opt out of implementing transitional interfaces at all,
> > which limits them to a modern-only device. Setting endianness detection (via
> > transport specific means) aside, for vendors that wish to support a
> > transitional device with a legacy interface, is it a hard stop to drop
> > support for BE hosts if everything else is there? The spec today doesn't
> > define virtio specific means to detect host memory ordering or device
> > memory coherency,
>
> Any reason that we need to care about memory coherency at the virtio
> level? I'd expect it's the task of the transport.
>
> > will it yet become a stopper another day for h/w
> > vendor to support more platforms?
>
> Let's differentiate virtio from vdpa here. For virtio, there's no way
> to add any feature for legacy devices. We can only add detection of
> memory features for modern devices.
>
> But for vDPA, we can introduce any API that can help vendors to
> present a transitional device. But we can't force those APIs since it's
> too late to do that. So transitional device support is optional for
> sure.
>
> >
> > >
> > >> Noted we don't yet have a set_device_features()
> > >>>> that allows the vdpa device to tell whether it is operating in transitional
> > >>>> or modern-only mode.
> > > So the device feature should be provisioned via the netlink protocol.
> > Such netlink interface will only be used to limit feature exposure,
> > right? i.e. you can limit a transitional supporting vendor driver to
> > offering modern-only interface,
>
> There's no way for the management to force a feature, like VERSION_1
> via the current protocol.
>
> > but you never want to make a modern-only
> > vendor driver to support transitional (I'm not sure if it's a good idea
> > to support all the translation in software, esp. for datapath).
>
> You may hit this problem for sure, you can't force all vendors to
> support transitional devices especially considering spec said legacy
> is optional. We don't want to end up with a userspace code that can
> only work for some specific vendors.
>
> > > And what we want is not "set_device_feature()" but
> > > "set_device_mandatory_feature()", then the parent can choose to fail
> > > the negotiation when VERSION_1 is not negotiated.
> > This assumes the transport specific detection of BE host is in place,
> > right?
>
> Again, the point is, we can not assume such detection works for all of
> the vendors. And even assuming BE detection is ready, we still need this
> for modern devices, don't we?
>
> > I am not clear who initiates the set_device_mandatory_feature()
> > call, QEMU during guest feature negotiation, or admin user setting it
> > ahead via netlink?
>
> Netlink. Actually, the spec needs to be extended as well; we saw
> similar requests in the past. E.g. there could be a device that works
> in a packed layout only.
>
> Thanks
>
> >
> > Thanks,
> > -Siwei
> >
> > > Qemu then knows for
> > > sure it talks to a transitional device or modern only device.
> > >
> > > Thanks
> > >
> > >> For software virtio, all support for the legacy part in
> > >>>> a transitional model has been built up there already, however, it's not easy
> > >>>> for vDPA vendors to implement all the requirements for an all-or-nothing
> > >>>> legacy guest support (big endian guest for example). To these vendors, the
> > >>>> legacy support within a transitional model is more of feature to them and
> > >>>> it's best to leave some flexibility for them to implement partial support
> > >>>> for legacy. That in turn calls out the need for a vhost-user protocol
> > >>>> feature like negotiation API that can prohibit those unsupported guest
> > >>>> setups to as early as backend_init before launching the VM.
> > >>> Right. Of note is the fact that it's a spec bug which I
> > >>> hope yet to fix, though due to existing guest code the
> > >>> fix won't be complete.
> > >> I thought at one point you pointed out to me that the spec does allow
> > >> config space read before claiming features_ok, and only config write
> > >> before features_ok is prohibited. I haven't read up the full thread of
> > >> Halil's VERSION_1 for transitional big endian device yet, but what is
> > >> the spec bug you hope to fix?
> > >>
> > >>> WRT ioctls, One thing we can do though is abuse set_features
> > >>> where it's called by QEMU early on with just the VERSION_1
> > >>> bit set, to distinguish between legacy and modern
> > >>> interface. This before config space accesses and FEATURES_OK.
> > >>>
> > >>> Halil has been working on this, pls take a look and maybe help him out.
> > >> Interesting thread, am reading now and see how I may leverage or help there.
> > >>
> > >>>>>>>> I
> > >>>>>>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
> > >>>>>>>> 0.95 support, it seems all the ingredient had been there already dated back
> > >>>>>>>> to the DPDK days. The only major thing limiting is in the vDPA software that
> > >>>>>>>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> > >>>>>>>> a few DMA setup ops, which is virtio 1.0 only.
> > >>>>>>>>
> > >>>>>>>>>> 2. suppose some form of legacy guest support needs to be there, how do we
> > >>>>>>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
> > >>>>>>>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
> > >>>>>>>>>> of vdpa_get_config() to vdpa_set_config().
> > >>>>>>>>>>
> > >>>>>>>>>> /*
> > >>>>>>>>>> * Config accesses aren't supposed to trigger before features are
> > >>>>>>>>>> set.
> > >>>>>>>>>> * If it does happen we assume a legacy guest.
> > >>>>>>>>>> */
> > >>>>>>>>>> if (!vdev->features_valid)
> > >>>>>>>>>> vdpa_set_features(vdev, 0);
> > >>>>>>>>>> ops->get_config(vdev, offset, buf, len);
> > >>>>>>>>>>
> > >>>>>>>>>> I can post a patch to fix 2) if there's consensus already reached.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> -Siwei
> > >>>>>>>>> I'm not sure how important it is to change that.
> > >>>>>>>>> In any case it only affects transitional devices, right?
> > >>>>>>>>> Legacy only should not care ...
> > >>>>>>>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> > >>>>>>>> modern one in a transitional device model rather than being legacy only.
> > >>>>>>>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> > >>>>>>>> guests without having to reconfigure. Or are you suggesting limit to legacy
> > >>>>>>>> only at the time of vdpa creation would simplify the implementation a lot?
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> -Siwei
> > >>>>>>> I don't know for sure. Take a look at the work Halil was doing
> > >>>>>>> to try and support transitional devices with BE guests.
> > >>>>>> Hmmm, we can have those endianness ioctls defined but the initial QEMU
> > >>>>>> implementation can be started to support x86 guest/host with little
> > >>>>>> endian and weak memory ordering first. The real trick is to detect
> > >>>>>> legacy guest - I am not sure if it's feasible to shift all the legacy
> > >>>>>> detection work to QEMU, or the kernel has to be part of the detection
> > >>>>>> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
> > >>>>>> effort in QEMU) as well. Let me take a further look and get back.
> > >>>>> Michael may think differently but I think doing this in Qemu is much easier.
> > >>>> I think the key is whether we position emulating legacy interfaces in QEMU
> > >>>> doing translation on top of a v1.0 modern-only device in the kernel, or we
> > >>>> allow vdpa core (or you can say vhost-vdpa) and vendor driver to support a
> > >>>> transitional model in the kernel that is able to work for both v0.95 and
> > >>>> v1.0 drivers, with some slight aid from QEMU for
> > >>>> detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for the
> > >>>> former we still rely on vendor for a performant data vqs implementation,
> > >>>> leaving the question to what it may end up eventually in the kernel is
> > >>>> effectively the latter).
> > >>>>
> > >>>> Thanks,
> > >>>> -Siwei
> > >>> My suggestion is post the kernel patches, and we can evaluate
> > >>> how much work they are.
> > >> Thanks for the feedback. I will take some read then get back, probably
> > >> after the winter break. Stay tuned.
> > >>
> > >> Thanks,
> > >> -Siwei
> > >>
> > >>>>> Thanks
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> Meanwhile, I'll check internally to see if a legacy only model would
> > >>>>>> work. Thanks.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> -Siwei
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
> > >>>>>>>>>>> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> > >>>>>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> > >>>>>>>>>>>>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> > >>>>>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> > >>>>>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
> > >>>>>>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
> > >>>>>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I
> > >>>>>>>>>>>>>>> know what the use
> > >>>>>>>>>>>>>>> case there will be for kernel to leverage such info
> > >>>>>>>>>>>>>>> directly? Is there a
> > >>>>>>>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
> > >>>>>>>>>>>>>>> differentiation (legacy v.s. modern) needed?
> > >>>>>>>>>>>>>> BTW a good API could be
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > >>>>>>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> we did it per vring but maybe that was a mistake ...
> > >>>>>>>>>>>>> Actually, I wonder whether it's good time to just not support
> > >>>>>>>>>>>>> legacy driver
> > >>>>>>>>>>>>> for vDPA. Consider:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 1) Its definition is non-normative
> > >>>>>>>>>>>>> 2) A lot of code burden
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> So qemu can still present the legacy device since the config
> > >>>>>>>>>>>>> space or other
> > >>>>>>>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
> > >>>>>>>>>>>>> accessed by
> > >>>>>>>>>>>>> guest directly. Qemu can do the endian conversion when necessary
> > >>>>>>>>>>>>> in this
> > >>>>>>>>>>>>> case?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
> > >>>>>>>>>>>> working userspace, qemu releases with vdpa support are out there and
> > >>>>>>>>>>>> seem to work for people. Any changes need to take that into account
> > >>>>>>>>>>>> and document compatibility concerns.
> > >>>>>>>>>>> Agree, let me check.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I note that any hardware
> > >>>>>>>>>>>> implementation is already broken for legacy except on platforms with
> > >>>>>>>>>>>> strong ordering which might be helpful in reducing the scope.
> > >>>>>>>>>>> Yes.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> >


2021-12-17 02:01:41

by Jason Wang

Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Fri, Dec 17, 2021 at 9:08 AM Si-Wei Liu <[email protected]> wrote:
>
>
>
> On 12/15/2021 7:43 PM, Jason Wang wrote:
> > On Thu, Dec 16, 2021 at 4:52 AM Si-Wei Liu <[email protected]> wrote:
> >>
> >>
> >> On 12/14/2021 6:06 PM, Jason Wang wrote:
> >>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
> >>>>
> >>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> >>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> >>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> >>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> >>>>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> >>>>>>>> it ended up with. I have the following questions,
> >>>>>>>>
> >>>>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
> >>>>>>>> support will be completely dropped from the table, is my understanding
> >>>>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> >>>>>>>> which is backed by the spec at
> >>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> >>>>>>>> if there's request/need to support wilder legacy virtio versions earlier
> >>>>>>>> beyond.
> >>>>>>> I personally feel it's less work to add in kernel than try to
> >>>>>>> work around it in userspace. Jason feels differently.
> >>>>>>> Maybe post the patches and this will prove to Jason it's not
> >>>>>>> too terrible?
> >>>>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> >>>>>> layout level and is limited to x86 only, there should be easy way out.
> >>>>> Note a subtle difference: what matters is that guest, not host is x86.
> >>>>> Matters for emulators which might reorder memory accesses.
> >>>>> I guess this enforcement belongs in QEMU then?
> >>>> Right, I mean to get started, the initial guest driver support and the
> >>>> corresponding QEMU support for transitional vdpa backend can be limited
> >>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
> >>>> suppose it's not hard to enforce in QEMU.
> >>> It's more than just config space, most devices have headers before the buffer.
> >> The ordering in datapath (data VQs) would have to rely on vendor's
> >> support. Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w
> >> vendor nowadays can/should well support the case when ORDER_PLATFORM is
> >> not acked by the driver (actually this feature is filtered out by the
> >> QEMU vhost-vdpa driver today), even with v1.0 spec conforming and modern
> >> only vDPA device.
> > That's a bug that needs to be fixed.
> >
> >> The control VQ is implemented in software in the
> >> kernel, which can be easily accommodated/fixed when needed.
> >>
> >>>> QEMU can drive GET_LEGACY,
> >>>> GET_ENDIAN et al ioctls in advance to get the capability from the
> >>>> individual vendor driver. For that, we need another negotiation protocol
> >>>> similar to vhost_user's protocol_features between the vdpa kernel and
> >>>> QEMU, way before the guest driver is ever probed and its feature
> >>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> >>>> from the device, but we can assume weak ordering for legacy at this
> >>>> point (x86 only)?
> >>> I'm lost here, we have get_features() so:
> >> I assume here you refer to get_device_features() that Eli just changed
> >> the name.
> >>> 1) VERSION_1 means the device uses LE if provided, otherwise native
> >>> 2) ORDER_PLATFORM means device requires platform ordering
> >>>
> >>> Any reason for having a new API for this?
> >> Are you going to enforce all vDPA hardware vendors to support the
> >> transitional model for legacy guest? meaning guest not acknowledging
> >> VERSION_1 would use the legacy interfaces captured in the spec section
> >> 7.4 (regarding ring layout, native endianness, message framing, vq
> >> alignment of 4096, 32bit feature, no features_ok bit in status, IO port
> >> interface i.e. all the things) instead? Noted we don't yet have a
> >> set_device_features() that allows the vdpa device to tell whether it is
> >> operating in transitional or modern-only mode. For software virtio, all
> >> support for the legacy part in a transitional model has been built up
> >> there already, however, it's not easy for vDPA vendors to implement all
> >> the requirements for an all-or-nothing legacy guest support (big endian
> >> guest for example). To these vendors, the legacy support within a
> >> transitional model is more of feature to them and it's best to leave
> >> some flexibility for them to implement partial support for legacy. That
> >> in turn calls out the need for a vhost-user protocol feature like
> >> negotiation API that can prohibit those unsupported guest setups to as
> >> early as backend_init before launching the VM.
> >>
> >>
> >>>>>> I
> >>>>>> checked with Eli and other Mellanox/NVIDIA folks for hardware/firmware level
> >>>>>> 0.95 support, it seems all the ingredient had been there already dated back
> >>>>>> to the DPDK days. The only major thing limiting is in the vDPA software that
> >>>>>> the current vdpa core has the assumption around VIRTIO_F_ACCESS_PLATFORM for
> >>>>>> a few DMA setup ops, which is virtio 1.0 only.
> >>>>>>
> >>>>>>>> 2. suppose some form of legacy guest support needs to be there, how do we
> >>>>>>>> deal with the bogus assumption below in vdpa_get_config() in the short term?
> >>>>>>>> It looks one of the intuitive fix is to move the vdpa_set_features call out
> >>>>>>>> of vdpa_get_config() to vdpa_set_config().
> >>>>>>>>
> >>>>>>>> /*
> >>>>>>>> * Config accesses aren't supposed to trigger before features are
> >>>>>>>> set.
> >>>>>>>> * If it does happen we assume a legacy guest.
> >>>>>>>> */
> >>>>>>>> if (!vdev->features_valid)
> >>>>>>>> vdpa_set_features(vdev, 0);
> >>>>>>>> ops->get_config(vdev, offset, buf, len);
> >>>>>>>>
> >>>>>>>> I can post a patch to fix 2) if there's consensus already reached.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> -Siwei
> >>>>>>> I'm not sure how important it is to change that.
> >>>>>>> In any case it only affects transitional devices, right?
> >>>>>>> Legacy only should not care ...
> >>>>>> Yes I'd like to distinguish legacy driver (suppose it is 0.95) against the
> >>>>>> modern one in a transitional device model rather than being legacy only.
> >>>>>> That way a v0.95 and v1.0 supporting vdpa parent can support both types of
> >>>>>> guests without having to reconfigure. Or are you suggesting limit to legacy
> >>>>>> only at the time of vdpa creation would simplify the implementation a lot?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>> I don't know for sure. Take a look at the work Halil was doing
> >>>>> to try and support transitional devices with BE guests.
> >>>> Hmmm, we can have those endianness ioctls defined but the initial QEMU
> >>>> implementation can be started to support x86 guest/host with little
> >>>> endian and weak memory ordering first. The real trick is to detect
> >>>> legacy guest - I am not sure if it's feasible to shift all the legacy
> >>>> detection work to QEMU, or the kernel has to be part of the detection
> >>>> (e.g. the kick before DRIVER_OK thing we have to duplicate the tracking
> >>>> effort in QEMU) as well. Let me take a further look and get back.
> >>> Michael may think differently but I think doing this in Qemu is much easier.
> >> I think the key is whether we position emulating legacy interfaces in
> >> QEMU doing translation on top of a v1.0 modern-only device in the
> >> kernel, or we allow vdpa core (or you can say vhost-vdpa) and vendor
> >> driver to support a transitional model in the kernel that is able to
> >> work for both v0.95 and v1.0 drivers, with some slight aid from QEMU for
> >> detecting/emulation/shadowing (for e.g CVQ, I/O port relay). I guess for
> >> the former we still rely on vendor for a performant data vqs
> >> implementation, leaving the question to what it may end up eventually in
> >> the kernel is effectively the latter).
> > I think we can do the legacy interface emulation on top of the shadow
> > VQ. And we know it works for sure. But I agree, it would be much
> > easier if we depend on the vendor to implement a transitional device.
> First I am not sure if there's a convincing case for users to deploy
> vDPA with shadow (data) VQ against the pure software based backend.
> Please enlighten me if there is.

The problem is that shadow VQ is the only solution that can work for all the cases.

>
> For us, the point to deploy vDPA for legacy guest is the acceleration
> (what "A" stands for in "vDPA") part of it so that we can leverage the
> hardware potential if at all possible. Not sure how the shadow VQ
> implementation can easily deal with datapath acceleration without losing
> too much performance?

It's not easy, shadow VQ will lose performance for sure.

>
> > So assuming we depend on the vendor, I don't see anything that is
> > strictly needed in the kernel, the kick or config access before
> > DRIVER_OK can all be handled easily in Qemu unless I miss something.
> Right, that's what I think too it's not quite a lot of work in the
> kernel if vendor device offers the aid/support for transitional. The
> kernel only provides the abstraction of device model (transitional or
> modern-only), while vendor driver may implement early platform feature
> discovery and apply legacy specific quirks (unsupported endianness,
> mismatched page size, unsupported host memory ordering model) that the
> device can't adapt to. I don't say we have to depend on the vendor, but
> the point is that we must assume fully spec compliant transitional
> support (the datapath in particular) from the vendor to get started, as
> I guess it's probably the main motivation for users to deploy it -
> acceleration of legacy guest workload without exhausting host computing
> resource. Even if we get started with shadow VQ to mediate and translate
> the datapath, eventually it may evolve towards leveraging datapath
> offload to hardware if acceleration is the only convincing use case for
> legacy support.

Yes, so as discussed, I don't object to the idea; kernel patches are more
than welcome.

Thanks

>
> Thanks,
> -Siwei
> > The only value to do that in the kernel is that it can work for
> > virtio-vdpa, but modern only virtio-vdpa is sufficient; we don't need
> > any legacy stuff for that.
> >
> > Thanks
> >
> >> Thanks,
> >> -Siwei
> >>
> >>> Thanks
> >>>
> >>>
> >>>
> >>>> Meanwhile, I'll check internally to see if a legacy only model would
> >>>> work. Thanks.
> >>>>
> >>>> Thanks,
> >>>> -Siwei
> >>>>
> >>>>
> >>>>>>>> On 3/2/2021 2:53 AM, Jason Wang wrote:
> >>>>>>>>> On 2021/3/2 5:47 下午, Michael S. Tsirkin wrote:
> >>>>>>>>>> On Mon, Mar 01, 2021 at 11:56:50AM +0800, Jason Wang wrote:
> >>>>>>>>>>> On 2021/3/1 5:34 上午, Michael S. Tsirkin wrote:
> >>>>>>>>>>>> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
> >>>>>>>>>>>>>> Detecting it isn't enough though, we will need a new ioctl to notify
> >>>>>>>>>>>>>> the kernel that it's a legacy guest. Ugh :(
> >>>>>>>>>>>>> Well, although I think adding an ioctl is doable, may I
> >>>>>>>>>>>>> know what the use
> >>>>>>>>>>>>> case there will be for kernel to leverage such info
> >>>>>>>>>>>>> directly? Is there a
> >>>>>>>>>>>>> case QEMU can't do with dedicate ioctls later if there's indeed
> >>>>>>>>>>>>> differentiation (legacy v.s. modern) needed?
> >>>>>>>>>>>> BTW a good API could be
> >>>>>>>>>>>>
> >>>>>>>>>>>> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> >>>>>>>>>>>>
> >>>>>>>>>>>> we did it per vring but maybe that was a mistake ...
> >>>>>>>>>>> Actually, I wonder whether it's good time to just not support
> >>>>>>>>>>> legacy driver
> >>>>>>>>>>> for vDPA. Consider:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) Its definition is non-normative
> >>>>>>>>>>> 2) A lot of code burden
> >>>>>>>>>>>
> >>>>>>>>>>> So qemu can still present the legacy device since the config
> >>>>>>>>>>> space or other
> >>>>>>>>>>> stuffs that is presented by vhost-vDPA is not expected to be
> >>>>>>>>>>> accessed by
> >>>>>>>>>>> guest directly. Qemu can do the endian conversion when necessary
> >>>>>>>>>>> in this
> >>>>>>>>>>> case?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>
> >>>>>>>>>> Overall I would be fine with this approach but we need to avoid breaking
> >>>>>>>>>> working userspace, qemu releases with vdpa support are out there and
> >>>>>>>>>> seem to work for people. Any changes need to take that into account
> >>>>>>>>>> and document compatibility concerns.
> >>>>>>>>> Agree, let me check.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> I note that any hardware
> >>>>>>>>>> implementation is already broken for legacy except on platforms with
> >>>>>>>>>> strong ordering which might be helpful in reducing the scope.
> >>>>>>>>> Yes.
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>
>
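
The vdpa_get_config() fragment quoted in this thread, together with the verify_min_features() check that the original patch removes, can be modelled in a few lines of plain C. This is only a sketch under stated assumptions: `struct model_vdpa_dev`, `model_set_features()` and `model_get_config()` are hypothetical stand-ins, not kernel types or functions, and the model assumes (as the patch description suggests) that config fields such as the MTU only become visible once set_features has succeeded. Unlike the real vdpa_get_config(), the model returns the error so the effect is observable.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number as defined by the virtio spec. */
#define VIRTIO_F_ACCESS_PLATFORM 33

/* Hypothetical stand-in for a vDPA device; NOT the kernel's struct. */
struct model_vdpa_dev {
    bool features_valid;    /* has the driver (or the core) set features? */
    uint64_t features;      /* last accepted feature set */
    bool strict_min_check;  /* true: mimic the removed verify_min_features() */
    uint16_t hw_mtu;        /* MTU the device would report */
    uint16_t config_mtu;    /* MTU as currently visible in config space */
};

/* Assumption: config space is populated only when set_features succeeds. */
static int model_set_features(struct model_vdpa_dev *d, uint64_t features)
{
    if (d->strict_min_check &&
        !(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
        return -EOPNOTSUPP;     /* a reset to 0 is rejected here */

    d->features = features;
    d->features_valid = true;
    d->config_mtu = d->hw_mtu;
    return 0;
}

/*
 * Mirrors the quoted logic: a config read before features are set is
 * treated as coming from a legacy guest, so features are reset to 0 first.
 */
static int model_get_config(struct model_vdpa_dev *d, uint16_t *mtu)
{
    int err = 0;

    if (!d->features_valid)
        err = model_set_features(d, 0);

    *mtu = d->config_mtu;
    return err;
}
```

With `strict_min_check` set, the first config read returns -EOPNOTSUPP and a stale MTU of 0; with the check relaxed, the same read succeeds and reports `hw_mtu`. That is the behaviour difference the subject line of the thread refers to.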


2021-12-17 02:15:32

by Jason Wang

Subject: Re: vdpa legacy guest support (was Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero)

On Fri, Dec 17, 2021 at 10:01 AM Michael S. Tsirkin <[email protected]> wrote:
>
> On Fri, Dec 17, 2021 at 09:57:38AM +0800, Jason Wang wrote:
> > On Fri, Dec 17, 2021 at 6:32 AM Si-Wei Liu <[email protected]> wrote:
> > >
> > >
> > >
> > > On 12/15/2021 6:53 PM, Jason Wang wrote:
> > > > On Thu, Dec 16, 2021 at 10:02 AM Si-Wei Liu <[email protected]> wrote:
> > > >>
> > > >>
> > > >> On 12/15/2021 1:33 PM, Michael S. Tsirkin wrote:
> > > >>> On Wed, Dec 15, 2021 at 12:52:20PM -0800, Si-Wei Liu wrote:
> > > >>>> On 12/14/2021 6:06 PM, Jason Wang wrote:
> > > >>>>> On Wed, Dec 15, 2021 at 9:05 AM Si-Wei Liu <[email protected]> wrote:
> > > >>>>>> On 12/13/2021 9:06 PM, Michael S. Tsirkin wrote:
> > > >>>>>>> On Mon, Dec 13, 2021 at 05:59:45PM -0800, Si-Wei Liu wrote:
> > > >>>>>>>> On 12/12/2021 1:26 AM, Michael S. Tsirkin wrote:
> > > >>>>>>>>> On Fri, Dec 10, 2021 at 05:44:15PM -0800, Si-Wei Liu wrote:
> > > >>>>>>>>>> Sorry for reviving this ancient thread. I was kinda lost for the conclusion
> > > >>>>>>>>>> it ended up with. I have the following questions,
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1. legacy guest support: from the past conversations it doesn't seem the
> > > >>>>>>>>>> support will be completely dropped from the table, is my understanding
> > > >>>>>>>>>> correct? Actually we're interested in supporting virtio v0.95 guest for x86,
> > > >>>>>>>>>> which is backed by the spec at
> > > >>>>>>>>>> https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf . Though I'm not sure
> > > >>>>>>>>>> if there's request/need to support wilder legacy virtio versions earlier
> > > >>>>>>>>>> beyond.
> > > >>>>>>>>> I personally feel it's less work to add in kernel than try to
> > > >>>>>>>>> work around it in userspace. Jason feels differently.
> > > >>>>>>>>> Maybe post the patches and this will prove to Jason it's not
> > > >>>>>>>>> too terrible?
> > > >>>>>>>> I suppose if the vdpa vendor does support 0.95 in the datapath and ring
> > > >>>>>>>> layout level and is limited to x86 only, there should be easy way out.
> > > >>>>>>> Note a subtle difference: what matters is that guest, not host is x86.
> > > >>>>>>> Matters for emulators which might reorder memory accesses.
> > > >>>>>>> I guess this enforcement belongs in QEMU then?
> > > >>>>>> Right, I mean to get started, the initial guest driver support and the
> > > >>>>>> corresponding QEMU support for transitional vdpa backend can be limited
> > > >>>>>> to x86 guest/host only. Since the config space is emulated in QEMU, I
> > > >>>>>> suppose it's not hard to enforce in QEMU.
> > > >>>>> It's more than just config space, most devices have headers before the buffer.
> > > >>>> The ordering in datapath (data VQs) would have to rely on vendor's support.
> > > >>>> Since ORDER_PLATFORM is pretty new (v1.1), I guess vdpa h/w vendor nowadays
> > > >>>> can/should well support the case when ORDER_PLATFORM is not acked by the
> > > >>>> driver (actually this feature is filtered out by the QEMU vhost-vdpa driver
> > > >>>> today), even with v1.0 spec conforming and modern only vDPA device. The
> > > >>>> control VQ is implemented in software in the kernel, which can be easily
> > > >>>> accommodated/fixed when needed.
> > > >>>>
> > > >>>>>> QEMU can drive GET_LEGACY,
> > > >>>>>> GET_ENDIAN et al ioctls in advance to get the capability from the
> > > >>>>>> individual vendor driver. For that, we need another negotiation protocol
> > > >>>>>> similar to vhost_user's protocol_features between the vdpa kernel and
> > > >>>>>> QEMU, way before the guest driver is ever probed and its feature
> > > >>>>>> negotiation kicks in. Not sure we need a GET_MEMORY_ORDER ioctl call
> > > >>>>>> from the device, but we can assume weak ordering for legacy at this
> > > >>>>>> point (x86 only)?
> > > >>>>> I'm lost here, we have get_features() so:
> > > >>>> I assume here you refer to get_device_features() that Eli just changed the
> > > >>>> name.
> > > >>>>> 1) VERSION_1 means the device uses LE if provided, otherwise native
> > > >>>>> 2) ORDER_PLATFORM means device requires platform ordering
> > > >>>>>
> > > >>>>> Any reason for having a new API for this?
> > > >>>> Are you going to enforce all vDPA hardware vendors to support the
> > > >>>> transitional model for legacy guest?
> > > > Do we really have other choices?
> > > >
> > > > I suspect the legacy device is never implemented by any vendor:
> > > >
> > > > 1) no virtio way to detect host endian
> > > This is even true for transitional device that is conforming to the
> > > spec, right?
> >
> > For hardware, yes.
> >
> > > The transport specific way to detect host endian is still
> > > being discussed and the spec revision is not finalized yet so far as I
> > > see. Why this suddenly becomes a requirement/blocker for h/w vendors to
> > > implement the transitional model?
> >
> > It's not a sudden blocker, the problem has existed since day 0 if I
> > was not wrong. That's why the problem looks a little bit complicated
> > and why it would be much simpler if we stick to modern devices.
> >
> > > Even if the spec is out, this is
> > > pretty new and I suspect not all vendor would follow right away. I hope
> > > the software framework can be tolerant with h/w vendors not supporting
> > > host endianness (BE specifically) or not detecting it if they would like
> > > to support a transitional device for legacy.
> >
> > Well, if we know we don't want to support the BE host it would be fine.
>
> I think you guys mean guest not host here. Same for memory ordering etc.
> What matters is whether guest has barriers etc.
>

Yes.

Thanks
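
The rule discussed above ("VERSION_1 means the device uses LE if provided, otherwise native", where native refers to the guest rather than the host) can be illustrated with a small standalone C sketch. The helper names `field_is_little_endian()` and `decode_field16()` are hypothetical and chosen for illustration only; the guest byte order is passed in as a parameter because, as noted in the thread, a legacy device has no virtio-defined way to detect it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number as defined by the virtio spec. */
#define VIRTIO_F_VERSION_1 32

/*
 * Decide the byte order of a multi-byte virtio field.
 * Modern (VERSION_1 negotiated): always little-endian.
 * Legacy (VERSION_1 not negotiated): the guest's native byte order,
 * which the caller must supply (assumption: known out of band).
 */
static bool field_is_little_endian(uint64_t driver_features, bool guest_is_le)
{
    if (driver_features & (1ULL << VIRTIO_F_VERSION_1))
        return true;
    return guest_is_le;
}

/* Decode a 16-bit field from its raw bytes, independent of host order. */
static uint16_t decode_field16(const uint8_t bytes[2], bool little_endian)
{
    if (little_endian)
        return (uint16_t)(bytes[0] | (bytes[1] << 8));
    return (uint16_t)((bytes[0] << 8) | bytes[1]);
}
```

For example, the raw bytes 0xdc 0x05 decode to an MTU of 1500 for a modern driver or a little-endian legacy guest, but to 0xdc05 for a big-endian legacy guest, which is why an x86-only starting point sidesteps the problem.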