Date: Mon, 8 Feb 2021 04:26:20 -0500
From: "Michael S. Tsirkin"
To: Jason Wang
Cc: Eli Cohen, Si-Wei Liu, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org, lulu@redhat.com
Subject: Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map
Message-ID: <20210208042512-mutt-send-email-mst@kernel.org>
References: <20210204073618.36336-1-elic@nvidia.com>
 <81f5ce4f-cdb0-26cd-0dce-7ada824b1b86@oracle.com>
 <20210208063736.GA166546@mtl-vdi-166.wap.labs.mlnx>
 <0d592ed0-3cea-cfb0-9b7b-9d2755da3f12@redhat.com>
In-Reply-To: <0d592ed0-3cea-cfb0-9b7b-9d2755da3f12@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:
> 
> On 2021/2/8 2:37 PM, Eli Cohen wrote:
> > On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:
> > > On 2021/2/6 7:07 AM, Si-Wei Liu wrote:
> > > >
> > > >
> > > > On 2/3/2021 11:36 PM, Eli Cohen wrote:
> > > > > When a change of memory map occurs, the hardware resources are destroyed
> > > > > and then re-created again with the new memory map. In such case, we need
> > > > > to restore the hardware available and used indices. The driver failed to
> > > > > restore the used index which is added here.
> > > > >
> > > > > Also, since the driver also fails to reset the available and used
> > > > > indices upon device reset, fix this here to avoid regression caused by
> > > > > the fact that used index may not be zero upon device reset.
> > > > >
> > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > > > > Signed-off-by: Eli Cohen
> > > > > ---
> > > > > v0 -> v1:
> > > > > Clear indices upon device reset
> > > > >
> > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++++++++++++++++++
> > > > >  1 file changed, 18 insertions(+)
> > > > >
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 88dde3455bfd..b5fe6d2ad22f 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > > >  	u64 device_addr;
> > > > >  	u64 driver_addr;
> > > > >  	u16 avail_index;
> > > > > +	u16 used_index;
> > > > >  	bool ready;
> > > > >  	struct vdpa_callback cb;
> > > > >  	bool restore;
> > > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > > > >  	u32 virtq_id;
> > > > >  	struct mlx5_vdpa_net *ndev;
> > > > >  	u16 avail_idx;
> > > > > +	u16 used_idx;
> > > > >  	int fw_state;
> > > > >
> > > > >  	/* keep last in the struct */
> > > > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
> > > > >
> > > > >  	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
> > > > >  	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
> > > > > +	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
> > > > >  	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
> > > > >  		 get_features_12_3(ndev->mvdev.actual_features));
> > > > >  	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
> > > > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > >  struct mlx5_virtq_attr {
> > > > >  	u8 state;
> > > > >  	u16 available_index;
> > > > > +	u16 used_index;
> > > > >  };
> > > > >
> > > > >  static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
> > > > > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
> > > > >  	memset(attr, 0, sizeof(*attr));
> > > > >  	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
> > > > >  	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
> > > > > +	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
> > > > >  	kfree(out);
> > > > >  	return 0;
> > > > >
> > > > > @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > >  	}
> > > > >  }
> > > > >
> > > > > +static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
> > > > > +{
> > > > > +	int i;
> > > > > +
> > > > > +	for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
> > > > > +		ndev->vqs[i].avail_idx = 0;
> > > > > +		ndev->vqs[i].used_idx = 0;
> > > > > +	}
> > > > > +}
> > > > > +
> > > > >  /* TODO: cross-endian support */
> > > > >  static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
> > > > >  {
> > > > > @@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
> > > > >  		return err;
> > > > >
> > > > >  	ri->avail_index = attr.available_index;
> > > > > +	ri->used_index = attr.used_index;
> > > > >  	ri->ready = mvq->ready;
> > > > >  	ri->num_ent = mvq->num_ent;
> > > > >  	ri->desc_addr = mvq->desc_addr;
> > > > > @@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
> > > > >  			continue;
> > > > >
> > > > >  		mvq->avail_idx = ri->avail_index;
> > > > > +		mvq->used_idx = ri->used_index;
> > > > >  		mvq->ready = ri->ready;
> > > > >  		mvq->num_ent = ri->num_ent;
> > > > >  		mvq->desc_addr = ri->desc_addr;
> > > > > @@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> > > > >  	if (!status) {
> > > > >  		mlx5_vdpa_info(mvdev, "performing device reset\n");
> > > > >  		teardown_driver(ndev);
> > > > > +		clear_virtqueues(ndev);
> > > > The clearing looks fine at the first glance, as it aligns with the other
> > > > state cleanups floating around at the same place. However, the thing is
> > > > get_vq_state() is supposed to be called right after to get sync'ed with
> > > > the latest internal avail_index from device while vq is stopped. The
> > > > index was saved in the driver software at vq suspension, but before the
> > > > virtq object is destroyed. We shouldn't clear the avail_index too early.
> > >
> > > Good point.
> > >
> > > There's a limitation on the virtio spec and vDPA framework that we can not
> > > simply differ device suspending from device reset.
> > >
> > Are you talking about live migration where you reset the device but
> > still want to know how far it progressed in order to continue from the
> > same place in the new VM?
> 
> Yes. So if we want to support live migration we need:
> 
> in src node:
> 1) suspend the device
> 2) get last_avail_idx via get_vq_state()
> 
> in the dst node:
> 3) set last_avail_idx via set_vq_state()
> 4) resume the device
> 
> So you can see, step 2 requires the device/driver not to forget the
> last_avail_idx.
> 
> The annoying thing is that, in the virtio spec there's no definition of
> device suspending. So we reuse set_status(0) right now for vq suspending.
> Then if we forget last_avail_idx in set_status(0), it will break the
> assumption of step 2).
> 
> 
> > > Need to think about that.
> > > I suggest a new state in [1], the issue is that
> > > people doesn't like the asynchronous API that it introduces.
> > >
> > > > Possibly it can be postponed to where VIRTIO_CONFIG_S_DRIVER_OK gets set
> > > > again, i.e. right before the setup_driver() in mlx5_vdpa_set_status()?
> > >
> > > Looks like a good workaround.
> 
> Rethink of this, this won't work for the step 4), if we reuse the S_DRING_OK
> for resuming.
> 
> The most clean way is to invent the feature in virtio spec and implement
> that in the driver.
> 
> Thanks

Given it's between parts of device (between qemu and host kernel)
I don't think we need it in the spec even. Just add a new ioctl.

> 
> > > > Thanks
> > > >
> > > > -Siwei
> > >
> > > [1] https://lists.oasis-open.org/archives/virtio-comment/202012/msg00029.html
> > >
> > > > >  		mlx5_vdpa_destroy_mr(&ndev->mvdev);
> > > > >  		ndev->mvdev.status = 0;
> > > > >  		ndev->mvdev.mlx_features = 0;