Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp5154252imm; Fri, 18 May 2018 18:14:11 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpJN/UDigz6mB0iZ6TOQqDbCDBTQtQtTMuAGZmvYSQoQhBQ3hWR0iXXfQjFTBudpSYLNWUz X-Received: by 2002:a17:902:7444:: with SMTP id e4-v6mr11332064plt.225.1526692451123; Fri, 18 May 2018 18:14:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526692451; cv=none; d=google.com; s=arc-20160816; b=ZUuf8mHN1Q0WGIOpOuwDbu6SV/LkdjgTtuVHpev/1Rin6OA6kkU+7lYdCY8K2aL24q ArDQRFcTY9HLlLsrtApDzKaX5MnkCNoJi03Hztzdcgv8fQPDFJV9TTxO/l0HQbb5jKwN dgzLidG97DUPOhSuehocvKP3gQAO9JtfckeE3Hea/8/3vELRcnquYdFc7idgLNvfp4UQ fVkO1VFa3xjPjW1dAEUbQpAyl7PqDFK7WSFXybAf9Xfe3dXw1IzQjP3BAQ9htADrmHJl kEmO0Ks+UaABer/vAPXNdG7P7rG2+GZGWQrVJNcfRxhfTDRQ2aUFDbowTV7RE4KQnb3+ 7iiA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:arc-authentication-results; bh=CPJwBwAtUkiEhNX/H86I5yyEtbmMn3+xp5N37xaRJLM=; b=GsHEUor7N8V3pJmo9wRfUzl6Uxy6vpBWDkn/MPEEuQGlupYr/JrXWaoWaifLX5Vq0P F8ZSXazc0kU8V1hHtNzYyQGqe07wMSwMUe6ydSOw6yoa3lDA+Pr0iyfxOHudV1u6ELaW rpk9zzxT0q0oJqeH7/TtHkdqXf51HpgIAiWZC2wgqk2TNOzvwClIjb8DQ+eNEcHD5+9T 7Q9RK+L89buLHTjg4reiJu8aUtR2bmY0njR4sZOZNQ4phfcls7rOWULhBWMQGFwIkGOw 7DMZAZB1Bi3al5R6WragYuAriviE5FEj1mIVzODiJI4Rb3yPFRV/WlyGAevGvxlOgNIC zDMw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g8-v6si9358916pli.75.2018.05.18.18.13.54; Fri, 18 May 2018 18:14:11 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752201AbeESBMl (ORCPT + 99 others); Fri, 18 May 2018 21:12:41 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:35942 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751851AbeESBMk (ORCPT ); Fri, 18 May 2018 21:12:40 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 75DF940711DD; Sat, 19 May 2018 01:12:39 +0000 (UTC) Received: from [10.72.12.40] (ovpn-12-40.pek2.redhat.com [10.72.12.40]) by smtp.corp.redhat.com (Postfix) with ESMTPS id CB0E52166BAD; Sat, 19 May 2018 01:12:33 +0000 (UTC) Subject: Re: [RFC v4 3/5] virtio_ring: add packed ring support To: Tiwei Bie Cc: mst@redhat.com, virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, wexu@redhat.com, jfreimann@redhat.com References: <20180516083737.26504-4-tiwei.bie@intel.com> <2000f635-bc34-71ff-ff51-a711c2e9726d@redhat.com> <20180516123909.GB986@debian> <20180516134550.GB4171@debian> <20180516143332.GA1957@debian> <20180518112950.GA28224@debian> <20180518143334.GA4537@debian> From: Jason Wang Message-ID: <1a661df0-8ca9-b31d-9c17-8684d608a33a@redhat.com> Date: Sat, 19 May 2018 09:12:30 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: <20180518143334.GA4537@debian> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Sat, 19 May 2018 01:12:39 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Sat, 19 May 2018 01:12:39 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2018年05月18日 22:33, Tiwei Bie wrote: > On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote: >> On 2018年05月18日 19:29, Tiwei Bie wrote: >>> On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote: >>>> On 2018年05月16日 22:33, Tiwei Bie wrote: >>>>> On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote: >>>>>> On 2018年05月16日 21:45, Tiwei Bie wrote: >>>>>>> On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote: >>>>>>>> On 2018年05月16日 20:39, Tiwei Bie wrote: >>>>>>>>> On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote: >>>>>>>>>> On 2018年05月16日 16:37, Tiwei Bie wrote: >>>>> [...] >>>>>>>>>>> +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head, >>>>>>>>>>> + unsigned int id, void **ctx) >>>>>>>>>>> +{ >>>>>>>>>>> + struct vring_packed_desc *desc; >>>>>>>>>>> + unsigned int i, j; >>>>>>>>>>> + >>>>>>>>>>> + /* Clear data ptr. */ >>>>>>>>>>> + vq->desc_state[id].data = NULL; >>>>>>>>>>> + >>>>>>>>>>> + i = head; >>>>>>>>>>> + >>>>>>>>>>> + for (j = 0; j < vq->desc_state[id].num; j++) { >>>>>>>>>>> + desc = &vq->vring_packed.desc[i]; >>>>>>>>>>> + vring_unmap_one_packed(vq, desc); >>>>>>>>>> As mentioned in previous discussion, this probably won't work for the case >>>>>>>>>> of out of order completion since it depends on the information in the >>>>>>>>>> descriptor ring. We probably need to extend ctx to record such information. >>>>>>>>> Above code doesn't depend on the information in the descriptor >>>>>>>>> ring. The vq->desc_state[] is the extended ctx. >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> Tiwei Bie >>>>>>>> Yes, but desc is a pointer to descriptor ring I think so >>>>>>>> vring_unmap_one_packed() still depends on the content of descriptor ring? >>>>>>>> >>>>>>> I got your point now. I think it makes sense to reserve >>>>>>> the bits of the addr field. Driver shouldn't try to get >>>>>>> addrs from the descriptors when cleanup the descriptors >>>>>>> no matter whether we support out-of-order or not. >>>>>> Maybe I was wrong, but I remember spec mentioned something like this. >>>>> You're right. Spec mentioned this. I was just repeating >>>>> the spec to emphasize that it does make sense. :) >>>>> >>>>>>> But combining it with the out-of-order support, it will >>>>>>> mean that the driver still needs to maintain a desc/ctx >>>>>>> list that is very similar to the desc ring in the split >>>>>>> ring. I'm not quite sure whether it's something we want. >>>>>>> If it is true, I'll do it. So do you think we also want >>>>>>> to maintain such a desc/ctx list for packed ring? >>>>>> To make it work for OOO backends I think we need something like this >>>>>> (hardware NIC drivers are usually have something like this). >>>>> Which hardware NIC drivers have this? >>>> It's quite common I think, e.g driver track e.g dma addr and page frag >>>> somewhere. e.g the ring->rx_info in mlx4 driver. >>> It seems that I had a misunderstanding on your >>> previous comments. I know it's quite common for >>> drivers to track e.g. DMA addrs somewhere (and >>> I think one reason behind this is that they want >>> to reuse the bits of addr field). >> Yes, we may want this for virtio-net as well in the future. >> >>> But tracking >>> addrs somewhere doesn't means supporting OOO. >>> I thought you were saying it's quite common for >>> hardware NIC drivers to support OOO (i.e. NICs >>> will return the descriptors OOO): >>> >>> I'm not familiar with mlx4, maybe I'm wrong. >>> I just had a quick glance. And I found below >>> comments in mlx4_en_process_rx_cq(): >>> >>> ``` >>> /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx >>> * descriptor offset can be deduced from the CQE index instead of >>> * reading 'cqe->index' */ >>> index = cq->mcq.cons_index & ring->size_mask; >>> cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor; >>> ``` >>> >>> It seems that although they have a completion >>> queue, they are still using the ring in order. >> I guess so (at least from the above bits). Git grep -i "out of order" in >> drivers/net gives some hints. Looks like there're few deivces do this. >> >>> I guess maybe storage device may want OOO. >> Right, some iSCSI did. >> >> But tracking them elsewhere is not only for OOO. >> >> Spec said: >> >> for element address >> >> " >> In a used descriptor, Element Address is unused. >> " >> >> for Next flag: >> >> " >> For example, if descriptors are used in the same order in which they are >> made available, this will result in >> the used descriptor overwriting the first available descriptor in the list, >> the used descriptor for the next list >> overwriting the first available descriptor in the next list, etc. >> " >> >> for in order completion: >> >> " >> This will result in the used descriptor overwriting the first available >> descriptor in the batch, the used descriptor >> for the next batch overwriting the first available descriptor in the next >> batch, etc. >> " >> >> So: >> >> - It's an alignment to the spec >> - device may (or should) overwrite the descriptor make also make address >> field useless. > You didn't get my point... I don't hope so. > I agreed driver should track the DMA addrs or some > other necessary things from the very beginning. And > I also repeated the spec to emphasize that it does > make sense. And I'd like to do that. > > What I was saying is that, to support OOO, we may > need to manage these context (which saves DMA addrs > etc) via a list which is similar to the desc list > maintained via `next` in split ring instead of an > array whose elements always can be indexed directly. My point is these context is a must (not only for OOO). > > The desc ring in split ring is an array, but its > free entries are managed as list via next. I was > just wondering, do we want to manage such a list > because of OOO. It's just a very simple question > that I want to hear your opinion... (It doesn't > means anything, e.g. It doesn't mean I don't want > to support OOO. It's just a simple question...) So the question is yes. But I admit I don't have better idea other than what you propose here (something like split ring which is a little bit sad). Maybe Michael had. Thanks > > Best regards, > Tiwei Bie > >> Thanks >> >>> Best regards, >>> Tiwei Bie >>> >>>> Thanks >>>> >>>>>> Not for the patch, but it looks like having a OUT_OF_ORDER feature bit is >>>>>> much more simpler to be started with. >>>>> +1 >>>>> >>>>> Best regards, >>>>> Tiwei Bie