From: Jason Wang
To: Si-Wei Liu, Eli Cohen
Cc: mst@redhat.com, virtualization@lists.linux-foundation.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org, lulu@redhat.com,
    Si-Wei Liu
Subject: Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
Date: Wed, 3 Feb 2021 13:16:14 +0800
References: <20210128134130.3051-1-elic@nvidia.com>
 <20210128134130.3051-2-elic@nvidia.com>
 <9d6058d6-5ce1-0442-8fd9-5a6fe6a0bc6b@redhat.com>
 <20210202070631.GA233234@mtl-vdi-166.wap.labs.mlnx>
 <20210202092253.GA236663@mtl-vdi-166.wap.labs.mlnx>

On 2021/2/3 1:54 AM, Si-Wei Liu wrote:
> On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen wrote:
>> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
>>> Thanks Eli and Jason for clarifications. See inline.
>>>
>>> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen wrote:
>>>> On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
>>>>> On 2021/2/2 12:15 PM, Si-Wei Liu wrote:
>>>>>> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang wrote:
>>>>>>> On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
>>>>>>>> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu wrote:
>>>>>>>>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen wrote:
>>>>>>>>>> suspend_vq should only suspend the VQ and not save the current
>>>>>>>>>> available index. That is done when a change of map occurs, when
>>>>>>>>>> the driver calls save_channel_info().
>>>>>>>>> Hmm, suspend_vq() is also called by teardown_vq(), and the latter
>>>>>>>>> doesn't save the available index, since save_channel_info() doesn't
>>>>>>>>> get called on that path at all. How does this handle the case where
>>>>>>>>> get_vq_state() is called from userspace (e.g. QEMU) while the
>>>>>>>>> hardware VQ object has been torn down, but userspace still wants to
>>>>>>>>> access the queue index?
>>>>>>>>>
>>>>>>>>> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
>>>>>>>>>
>>>>>>>>> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
>>>>>>>>> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
>>>>>>>>>
>>>>>>>>> QEMU will complain with the above warnings while the VM is being
>>>>>>>>> rebooted or shut down.
>>>>>>>>>
>>>>>>>>> It looks to me like either the kernel driver should cover this
>>>>>>>>> requirement, or userspace has to bear the burden of saving the
>>>>>>>>> index and not call into the kernel if the VQ has been destroyed.
>>>>>>>> Actually, userspace has no insight into whether the virtqueue will
>>>>>>>> be destroyed when it merely changes the device status via
>>>>>>>> set_status(). Looking at the other in-tree vdpa driver, ifcvf, it
>>>>>>>> doesn't behave like this. Hence this still looks to me like a
>>>>>>>> Mellanox-specific mlx5_vdpa implementation detail that shouldn't be
>>>>>>>> exposed to userspace.
>>>>>>> So I think we can simply drop this patch?
>>>>>> Yep, I think so. To be honest I don't know why it has anything to do
>>>>>> with the memory hotplug issue.
>>>>>
>>>>> Eli may know more. My understanding is that, during memory hotplug,
>>>>> qemu needs to propagate the new memory mappings via set_map(). For
>>>>> Mellanox, that means it needs to rebuild the memory keys, so the
>>>>> virtqueues need to be suspended.
>>>>>
>>>> I think Siwei was asking why the first patch was related to the
>>>> hotplug issue.
>>> I was thinking about how consistency is assured when saving/restoring
>>> this h/w avail_index against the one in the virtq memory, particularly
>>> in the region_add()/region_del() context (e.g. the hotplug case). The
>>> problem is that I don't see an explicit memory barrier when the guest
>>> thread updates the avail_index; how does the device make sure the h/w
>>> avail_index is up to date while the guest may concurrently be updating
>>> the virtq's avail_index in the meanwhile? Maybe I completely miss
>>> something obvious?
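One possible shape for the "kernel driver should cover this requirement"
option raised earlier in the thread is sketched below: cache the available
index and fall back to it in get_vq_state() once the hardware object is
gone. query_virtqueue(), mvq->initialized and mvq->avail_idx appear in the
patch quoted at the end of this mail; everything else here is a
hypothetical illustration, not the actual driver code.

/* Sketch only, not the actual mlx5_vdpa implementation. */
static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx,
                                  struct vdpa_vq_state *state)
{
        struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
        struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
        struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
        struct mlx5_virtq_attr attr;

        /* HW queue torn down (e.g. by teardown_vq()): report the cached
         * index instead of failing, so QEMU's ring restore still works. */
        if (!mvq->initialized) {
                state->avail_index = mvq->avail_idx;
                return 0;
        }

        if (query_virtqueue(ndev, mvq, &attr)) {
                mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
                return -EIO;
        }

        state->avail_index = attr.available_index;
        return 0;
}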
>> If you're asking about synchronization upon hot plug of memory, the
>> hardware always goes to read the available index from memory when a new
>> hardware object is associated with a virtqueue. You could argue then
>> that you don't need to restore the available index, and you may be
>> right, but this is the current interface to the firmware.
>>
>> If you're asking generally how sync is assured when the guest updates
>> the available index, can you please send a pointer to the code where
>> you see the update without a memory barrier?
> This is a snippet of virtqueue_add_split() where avail_index gets
> updated by the guest:
>
>         /* Put entry in available array (but don't update avail->idx until
>          * they do sync). */
>         avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
>         vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
>
>         /* Descriptors and available array need to be set before we expose
>          * the new available array entries. */
>         virtio_wmb(vq->weak_barriers);
>         vq->split.avail_idx_shadow++;
>         vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>                                                 vq->split.avail_idx_shadow);
>         vq->num_added++;
>
> There's a memory barrier to make sure the updates to the descriptors and
> the available ring are seen before the write to avail->idx, but there
> seems to be no guarantee that this update is flushed to memory
> immediately, either before or after mlx5-vdpa is suspended and gets the
> old avail_index value stashed somewhere. In this case, how does the
> hardware ensure consistency between the guest virtio and the host
> mlx5-vdpa? Or does it rely completely on the guest to update the
> avail_index once the next buffer is available, so that the index gets
> back in sync?

I'm not sure I get the question, but note that the driver has to check
whether the device wants a notification before kicking the virtqueue, and
there is a virtio_mb() on that path, e.g. in:

static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
{
        struct vring_virtqueue *vq = to_vvq(_vq);
        u16 new, old;
        bool needs_kick;

        START_USE(vq);
        /* We need to expose available array entries before checking avail
         * event. */
        virtio_mb(vq->weak_barriers);

        old = vq->split.avail_idx_shadow - vq->num_added;
        new = vq->split.avail_idx_shadow;
        vq->num_added = 0;

(See the comment above the virtio_mb().) So the avail idx is guaranteed to
be committed to memory (the cache hierarchy) before the check for
notification. I think we sync through this.

Thanks


>
> Thanks,
> -Siwei
>
>>> Thanks,
>>> -Siwei
>>>
>>>> But you're correct. When memory is added, I get a new memory map. This
>>>> requires me to build a new memory key object which covers the new
>>>> memory map. Since the virtqueue objects are referencing this memory
>>>> key, I need to destroy them and build new ones referencing the new
>>>> memory key.
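Put as code, the map-change sequence Eli describes might look roughly like
the sketch below. suspend_vqs() and save_channel_info() are named earlier
in the thread; teardown_driver(), mlx5_vdpa_destroy_mr(),
mlx5_vdpa_create_mr() and restore_channels_info() are assumed names for
illustration and may not match the driver verbatim.

/* Sketch of the map-change flow described above (not the actual driver
 * code): snapshot VQ state, drop the VQ objects that reference the old
 * memory key, build a new key for the new map, then recreate the VQ
 * objects against it. */
static int change_map(struct mlx5_vdpa_net *ndev, struct vhost_iotlb *iotlb)
{
        int err;

        suspend_vqs(ndev);                   /* quiesce the HW queues */
        err = save_channel_info(ndev);       /* snapshot avail_idx etc. */
        if (err)
                return err;

        teardown_driver(ndev);               /* drop VQ objects using old mkey */
        mlx5_vdpa_destroy_mr(&ndev->mvdev);  /* release the old memory key */

        err = mlx5_vdpa_create_mr(&ndev->mvdev, iotlb); /* key for new map */
        if (err)
                return err;

        /* recreate the VQ objects against the new mkey, restoring indices */
        return restore_channels_info(ndev);
}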
>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> -Siwei
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>>> -Siwei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eli Cohen
>>>>>>>>>> ---
>>>>>>>>>>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
>>>>>>>>>>  1 file changed, 8 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> index 88dde3455bfd..549ded074ff3 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>>>>>>>>
>>>>>>>>>>  static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>>>>>>>>  {
>>>>>>>>>> -	struct mlx5_virtq_attr attr;
>>>>>>>>>> -
>>>>>>>>>>  	if (!mvq->initialized)
>>>>>>>>>>  		return;
>>>>>>>>>>
>>>>>>>>>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>>>>>>>>>>
>>>>>>>>>>  	if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
>>>>>>>>>>  		mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
>>>>>>>>>> -
>>>>>>>>>> -	if (query_virtqueue(ndev, mvq, &attr)) {
>>>>>>>>>> -		mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
>>>>>>>>>> -		return;
>>>>>>>>>> -	}
>>>>>>>>>> -	mvq->avail_idx = attr.available_index;
>>>>>>>>>>  }
>>>>>>>>>>
>>>>>>>>>>  static void suspend_vqs(struct mlx5_vdpa_net *ndev)
>>>>>>>>>> --
>>>>>>>>>> 2.29.2
>>>>>>>>>>
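For reference, this is suspend_vq() as it reads with the diff applied,
reconstructed from the hunks above (the unchanged context between the two
hunks is not shown in the diff and is elided here):

static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
{
        if (!mvq->initialized)
                return;

        /* ... unchanged context elided between the two hunks ... */

        if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
                mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
}

After this change the available index is no longer read back on suspend;
it is only captured on the map-change path via save_channel_info(), which
is exactly the behavior questioned earlier in the thread.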