Received: by 2002:ac0:da4c:0:0:0:0:0 with SMTP id a12csp363102imi; Fri, 22 Jul 2022 00:14:40 -0700 (PDT) X-Google-Smtp-Source: AGRyM1tZDObKGp7eKLnj6E4KigrGM9ME4r82Wv9itk2lCpqMa0tOFm5+RGWUgJTfaSTYSIbV27aI X-Received: by 2002:a17:90b:2496:b0:1ef:a94:7048 with SMTP id nt22-20020a17090b249600b001ef0a947048mr2588055pjb.244.1658474079947; Fri, 22 Jul 2022 00:14:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658474079; cv=none; d=google.com; s=arc-20160816; b=yhM/HnqYke49JZ6wtO5gPynUFFUFfGNXtbkxAsE0WUqanJhWmel4laaif3/P0WkzeQ Z6PL97qbSzXxw4n1po3aZIRWEe716FiEaCi8eOTwuWYt+MfhUzmqnO9Q78YbmkWjEi86 sOOzhBawYAciD10hv6pL9p5q47yMtPz5EfZeGOE/2yBsUl7Y3UI4ZOLK5kQZy04MaufX eL5EVApIaOCP/3dul1sSfkcsE2WdFZ9bYnOYa10cuMysHWSu2JuLInv3AdMjkyZsBcWd 2nLWaFc8yJ59roL7JntC1Z0G9mcooE0gemIpE+e9Vx5qjeoxuGCe5IPSyM19rLMtHmzB Jc7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=GzF+ETyrmoD747FNPp3QPTAixHepqvjTlPacCwfRImE=; b=y2umdj+SaIEa873aAjSShM6v/7B4b70W6acXDBijonF1uXmxe2vN87K/OPj2KmNkUy oXhIltGX56fjHgOmyI3xg6MAQ2RuD8TzlBJTRbvomPE6SWrz9zPrOfoJCL+F0292ISqx GAwPeWMwHL9O7HRMexHBhuW1PvePXBn3anoRKdBH776UYPtjbwRbUmuru/A3fHATQ1Ps N36C1oBkHhdUsfLIptKlWyUT6hu4gUzvKKP4DdKczEhW88BIydZ2qXR1VXdQuLIWkhu2 2LNoTA43yTunt/mtZCUerGAeZb52IjrOR7idycpRJfvLaBToNZR6NkyGYpxZjfTGvHvA JHMA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=HqCVtWKJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j7-20020a056a00174700b0052ae89b8632si5911876pfc.318.2022.07.22.00.14.24; Fri, 22 Jul 2022 00:14:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=HqCVtWKJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234120AbiGVHIB (ORCPT + 99 others); Fri, 22 Jul 2022 03:08:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229505AbiGVHH6 (ORCPT ); Fri, 22 Jul 2022 03:07:58 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A216B8E6F8 for ; Fri, 22 Jul 2022 00:07:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1658473675; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=GzF+ETyrmoD747FNPp3QPTAixHepqvjTlPacCwfRImE=; b=HqCVtWKJvf2ds6m7gZc9pv7qUVyhOiIMmtqvQ2tc93e80h6LFJpu6yAa5xEFjS/8e32Qli ubNuonyAfGuXTH8+WhXVvop2BZQULGyfSm5taLtEMimeITKeuqnx0pdSyrtxEKpNE7+SCM 0obv411dptGOU8SnJsG2+fFs4og/Vnk= Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com [209.85.160.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-601-bSs6LFwUOu6rAlbG-aaq2g-1; Fri, 22 Jul 2022 03:07:53 -0400 X-MC-Unique: bSs6LFwUOu6rAlbG-aaq2g-1 Received: by mail-qt1-f197.google.com with SMTP id cj19-20020a05622a259300b0031f01f3933cso2379527qtb.18 for ; Fri, 22 Jul 2022 00:07:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=GzF+ETyrmoD747FNPp3QPTAixHepqvjTlPacCwfRImE=; b=mBTvKRFLOzFU/Hv65pZZ4p1IrythD7lyzDAVHLXlfFbFOPgVBklGTBkG88jyBAqNee 2lBeq9/qBpT40MO0oLpdifmVtDg6jWgSg+ynZHmIvEqka4RgZDTtJ6zHslqt5lq5TqgC VW++98w8ogpuZbE5x7ZWehwtvJXt2/GUA/72bDvaVc7DBKOJfmi8pgELlci3iZ5U9Poi 2mk7AnXjLoGfx4QpzVIoE6YIiars/U/ci2wEjcFjits+hSsUF9seT3Hvg7scLKsH8eLf 4Fo8Nae1NSMiLqdgYfi6MLjC/cBaixl5Metk3ngU9sR4HPsQc5sV4X/njHdYvvAldfAL U9Qw== X-Gm-Message-State: AJIora9l+09L8/372DJjayOJEqlvVo9SxqZC7zXPXfF2kA3ZuWlqEkcT Ni79AAyNQI9tgQzuQuCcTyCaXqcAOY0A4/e3jjSTskPX4L6DgNiJe/y2JT26KPDc0RKZBT9HFb6 GPa2HwnS6GZROSoMKVcRWsZOxv3AsqX555PB/30NN X-Received: by 2002:a05:620a:1456:b0:6b5:dbd5:3c50 with SMTP id i22-20020a05620a145600b006b5dbd53c50mr1461418qkl.193.1658473673329; Fri, 22 Jul 2022 00:07:53 -0700 (PDT) X-Received: by 2002:a05:620a:1456:b0:6b5:dbd5:3c50 with SMTP id i22-20020a05620a145600b006b5dbd53c50mr1461407qkl.193.1658473673036; Fri, 22 Jul 2022 00:07:53 -0700 (PDT) MIME-Version: 1.0 References: <20220721084341.24183-1-qtxuning1999@sjtu.edu.cn> <20220721084341.24183-2-qtxuning1999@sjtu.edu.cn> In-Reply-To: <20220721084341.24183-2-qtxuning1999@sjtu.edu.cn> From: Eugenio Perez Martin Date: Fri, 22 Jul 2022 09:07:17 +0200 Message-ID: Subject: Re: [RFC 1/5] vhost: reorder used descriptors in a batch To: Guo Zhi Cc: Jason Wang , Stefano Garzarella , Michael Tsirkin , netdev , linux-kernel , kvm list , virtualization Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 21, 2022 at 10:44 AM Guo Zhi wrote: > > Device may not use descriptors in order, for example, NIC and SCSI may > not call __vhost_add_used_n with buffers in order. It's the task of > __vhost_add_used_n to order them. This commit reorder the buffers using > vq->heads, only the batch is begin from the expected start point and is > continuous can the batch be exposed to driver. And only writing out a > single used ring for a batch of descriptors, according to VIRTIO 1.1 > spec. > > Signed-off-by: Guo Zhi > --- > drivers/vhost/vhost.c | 44 +++++++++++++++++++++++++++++++++++++++++-- > drivers/vhost/vhost.h | 3 +++ > 2 files changed, 45 insertions(+), 2 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 40097826c..e2e77e29f 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -317,6 +317,7 @@ static void vhost_vq_reset(struct vhost_dev *dev, > vq->used_flags = 0; > vq->log_used = false; > vq->log_addr = -1ull; > + vq->next_used_head_idx = 0; > vq->private_data = NULL; > vq->acked_features = 0; > vq->acked_backend_features = 0; > @@ -398,6 +399,8 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev) > GFP_KERNEL); > if (!vq->indirect || !vq->log || !vq->heads) > goto err_nomem; > + > + memset(vq->heads, 0, sizeof(*vq->heads) * dev->iov_limit); > } > return 0; > > @@ -2374,12 +2377,49 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq, > unsigned count) > { > vring_used_elem_t __user *used; > + struct vring_desc desc; > u16 old, new; > int start; > + int begin, end, i; > + int copy_n = count; > + > + if (vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) { > + /* calculate descriptor chain length for each used buffer */ > + for (i = 0; i < count; i++) { > + begin = heads[i].id; > + end = begin; > + vq->heads[begin].len = 0; > + do { > + vq->heads[begin].len += 1; > + if (unlikely(vhost_get_desc(vq, &desc, end))) { > + vq_err(vq, "Failed to get descriptor: idx %d addr %p\n", > + end, vq->desc + end); > + return -EFAULT; > + } > + } while ((end = next_desc(vq, &desc)) != -1); > + } > + > + count = 0; > + /* sort and batch continuous used ring entry */ > + while (vq->heads[vq->next_used_head_idx].len != 0) { > + count++; > + i = vq->next_used_head_idx; > + vq->next_used_head_idx = (vq->next_used_head_idx + > + vq->heads[vq->next_used_head_idx].len) > + % vq->num; > + vq->heads[i].len = 0; > + } You're iterating vq->heads with two different indexes here. The first loop is working with indexes [0, count), which is fine if heads is a "cache" and everything can be overwritten (as it used to be before this patch). The other loop trusts in vq->next_used_head_idx, which is saved between calls. So both uses are going to conflict with each other. A proposal for checking this is to push the data in the chains incrementally at the virtio_test driver, and check that they are returned properly. Like, the first buffer in the chain has the value of N, the second one N+1, and so on. Let's split saving chains in its own patch. > + /* only write out a single used ring entry with the id corresponding > + * to the head entry of the descriptor chain describing the last buffer > + * in the batch. > + */ Let's delay the batching for now, we can add it as an optimization on top in the case of devices. My proposal is to define a new struct vring_used_elem_inorder: struct vring_used_elem_inorder { uint16_t written' uint16_t num; } And create a per vq array of them, with vq->num size. Let's call it used_inorder for example. Everytime the device uses a buffer chain of N buffers, written L and first descriptor id D, it stores vq->used_inorder[D] = { .written = L, .num = N }. .num == 0 means the buffer is not available. After storing that information, you have your next_used_head_idx. You can check if vq->used_inorder[next_used_head_idx] is used (.num != 0). In case is not, there is no need to perform any actions for now. In case it is, you iterate vq->used_inorder. First you write as used next_used_head_idx. After that, next_used_head_idx increments by .num, and we need to clean .num. If vq->used_inorder[vq->next_used_head_idx] is used too, repeat. I think we could even squash vq->heads and vq->used_inorder with some tricks, because a chain's length would always be bigger or equal than used descriptor one, but to store in a different array would be more clear. > + heads[0].id = i; > + copy_n = 1; The device must not write anything to the used ring if the next descriptor has not been used. I'm failing to trace how this works when the second half of the batch in vhost/test.c is used here. Thanks! > + } > > start = vq->last_used_idx & (vq->num - 1); > used = vq->used->ring + start; > - if (vhost_put_used(vq, heads, start, count)) { > + if (vhost_put_used(vq, heads, start, copy_n)) { > vq_err(vq, "Failed to write used"); > return -EFAULT; > } > @@ -2410,7 +2450,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, > > start = vq->last_used_idx & (vq->num - 1); > n = vq->num - start; > - if (n < count) { > + if (n < count && !vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) { > r = __vhost_add_used_n(vq, heads, n); > if (r < 0) > return r; > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h > index d9109107a..7b2c0fbb5 100644 > --- a/drivers/vhost/vhost.h > +++ b/drivers/vhost/vhost.h > @@ -107,6 +107,9 @@ struct vhost_virtqueue { > bool log_used; > u64 log_addr; > > + /* Sort heads in order */ > + u16 next_used_head_idx; > + > struct iovec iov[UIO_MAXIOV]; > struct iovec iotlb_iov[64]; > struct iovec *indirect; > -- > 2.17.1 >