Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp360601pxj; Thu, 3 Jun 2021 08:27:43 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzsWQaPpDXv1OVk7ohTIuR5gisHr9zfcp7G/eYk1AEiPz0zNlr8eUadRiWnTEiriapFQflD X-Received: by 2002:aa7:cad6:: with SMTP id l22mr20280edt.303.1622734062981; Thu, 03 Jun 2021 08:27:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1622734062; cv=none; d=google.com; s=arc-20160816; b=VVkGuNeOsyiGU3BoZ1WUo5nYP2X0GKiexr4rDJF/OKc6PSsUukOxBPI7lefniDQFAb //CT7/TZ3WBNzX+FZBzitYWum2HJ0fNIw/oANQtpzSckQu0EyXVP1R6xxFn+1Ofty9ZA NTIPKpVbvach7sLSKKjRvfqtuApR3+l0rzFszCyhjxMQq9TXsgLBGH3UeQDcjWKaFzzz uDnPSd54o3Yr6vBlqqvJMJNF/XR7wRrmdyy7giQ/uuIYnht1+zXvvX2MqexhYN5vMpeU R5+Mltrn8U0Av/6gLpAneEkzEpL+8cUl7XBefYhGZOGJOj/6vwLp8AsAooSbF5er3n/I iqGQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=2dgSfYYpXwBodnxak+oieIso4PBe+1C5Rbth+oHRhQs=; b=ZUfTmGJ7dAPF2V1Kj+9lavDtRNqMqQdh97YTWZdOJk/Ip454EPMbmFwE3mtQMgrp8n 3PjYrCDjIwzgc6KoJRS6yuwcWKU/if6ppDbZ0cNmUJATm3RAyTkJ1M+Ez1MExgsI/Vsy V+OZJKzT36dOt+wkJ0kIgQm0/yWP7XaxZuPzzR0snNGFGkc5E0K9ExTeHmpv4pIglymL 27LdKZTN/k1n6e6JZSZjiDahZeCmrfb+odiitGqQyw3j4dboGWLnV5m7nSaeZugp3M9Z B/paC8Qizqoisduldpet7hO5HJyuNtVg9AkyQusHT73OrIzTKp6rvt9AyFbT04U8/yjH flYA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Hc1dMyUN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id r4si2661676edw.55.2021.06.03.08.27.19; Thu, 03 Jun 2021 08:27:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Hc1dMyUN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229818AbhFCP0M (ORCPT + 99 others); Thu, 3 Jun 2021 11:26:12 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:38032 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229823AbhFCP0L (ORCPT ); Thu, 3 Jun 2021 11:26:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1622733866; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=2dgSfYYpXwBodnxak+oieIso4PBe+1C5Rbth+oHRhQs=; b=Hc1dMyUNUIhGzZFoPZVLHBnU3Xgmisok4pQRNEu8a4UmRzutgXR+a7YcTKRoLlTjPsiCdI recXx5/J95Xr+0GVlpeOfCBidIyVj72giAqy3S4iHhGGxC96oqA83KOhnMNzphm0+n42dl +xQ/y9J6tcNDIBUHacBNm3bVg9RkOVk= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-282-u4Hqj26qNXKSNwFSaCxFJQ-1; Thu, 03 Jun 2021 11:24:25 -0400 X-MC-Unique: u4Hqj26qNXKSNwFSaCxFJQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id F1149801817; Thu, 3 Jun 2021 15:24:23 +0000 (UTC) Received: from localhost (ovpn-114-228.ams2.redhat.com [10.36.114.228]) by smtp.corp.redhat.com (Postfix) with ESMTP id 86BDD60C17; Thu, 3 Jun 2021 15:24:17 +0000 (UTC) Date: Thu, 3 Jun 2021 16:24:16 +0100 From: Stefan Hajnoczi To: Jason Wang Cc: virtualization@lists.linux-foundation.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Christoph Hellwig , Paolo Bonzini , Jens Axboe , slp@redhat.com, sgarzare@redhat.com, "Michael S. Tsirkin" Subject: Re: [PATCH 3/3] virtio_blk: implement blk_mq_ops->poll() Message-ID: References: <20210520141305.355961-1-stefanha@redhat.com> <20210520141305.355961-4-stefanha@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="vxiBSKnm2AIk6CUa" Content-Disposition: inline In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --vxiBSKnm2AIk6CUa Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, May 27, 2021 at 01:48:36PM +0800, Jason Wang wrote: >=20 > =E5=9C=A8 2021/5/25 =E4=B8=8B=E5=8D=884:59, Stefan Hajnoczi =E5=86=99=E9= =81=93: > > On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote: > > > =E5=9C=A8 2021/5/20 =E4=B8=8B=E5=8D=8810:13, Stefan Hajnoczi =E5=86= =99=E9=81=93: > > > > Request completion latency can be reduced by using polling instead = of > > > > irqs. Even Posted Interrupts or similar hardware support doesn't be= at > > > > polling. The reason is that disabling virtqueue notifications saves > > > > critical-path CPU cycles on the host by skipping irq injection and = in > > > > the guest by skipping the irq handler. So let's add blk_mq_ops->pol= l() > > > > support to virtio_blk. > > > >=20 > > > > The approach taken by this patch differs from the NVMe driver's > > > > approach. NVMe dedicates hardware queues to polling and submits > > > > REQ_HIPRI requests only on those queues. This patch does not require > > > > exclusive polling queues for virtio_blk. Instead, it switches betwe= en > > > > irqs and polling when one or more REQ_HIPRI requests are in flight = on a > > > > virtqueue. > > > >=20 > > > > This is possible because toggling virtqueue notifications is cheap = even > > > > while the virtqueue is running. NVMe cqs can't do this because irqs= are > > > > only enabled/disabled at queue creation time. > > > >=20 > > > > This toggling approach requires no configuration. There is no need = to > > > > dedicate queues ahead of time or to teach users and orchestration t= ools > > > > how to set up polling queues. > > > >=20 > > > > Possible drawbacks of this approach: > > > >=20 > > > > - Hardware virtio_blk implementations may find virtqueue_disable_cb= () > > > > expensive since it requires DMA. > > >=20 > > > Note that it's probably not related to the behavior of the driver but= the > > > design of the event suppression mechanism. > > >=20 > > > Device can choose to ignore the suppression flag and keep sending > > > interrupts. > > Yes, it's the design of the event suppression mechanism. > >=20 > > If we use dedicated polling virtqueues then the hardware doesn't need to > > check whether interrupts are enabled for each notification. However, > > there's no mechanism to tell the device that virtqueue interrupts are > > permanently disabled. This means that as of today, even dedicated > > virtqueues cannot suppress interrupts without the device checking for > > each notification. >=20 >=20 > This can be detected via a transport specific way. >=20 > E.g in the case of MSI, VIRTIO_MSI_NO_VECTOR could be a hint. Nice idea :). Then there would be no need for changes to the hardware interface. IRQ-less virtqueues is could still be mentioned explicitly in the VIRTIO spec so that driver/device authors are aware of the VIRTIO_MSI_NO_VECTOR trick. > > > > +static int virtblk_poll(struct blk_mq_hw_ctx *hctx) > > > > +{ > > > > + struct virtio_blk *vblk =3D hctx->queue->queuedata; > > > > + struct virtqueue *vq =3D vblk->vqs[hctx->queue_num].vq; > > > > + > > > > + if (!virtqueue_more_used(vq)) > > >=20 > > > I'm not familiar with block polling but what happens if a buffer is m= ade > > > available after virtqueue_more_used() returns false here? > > Can you explain the scenario, I'm not sure I understand? "buffer is made > > available" -> are you thinking about additional requests being submitted > > by the driver or an in-flight request being marked used by the device? >=20 >=20 > Something like that: >=20 > 1) requests are submitted > 2) poll but virtqueue_more_used() return false > 3) device make buffer used >=20 > In this case, will poll() be triggered again by somebody else? (I think > interrupt is disabled here). Yes. An example blk_poll() user is fs/block_dev.c:__blkdev_direct_IO_simple(): qc =3D submit_bio(&bio); for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); if (!READ_ONCE(bio.bi_private)) break; if (!(iocb->ki_flags & IOCB_HIPRI) || !blk_poll(bdev_get_queue(bdev), qc, true)) blk_io_schedule(); } That's the infinite loop. The block layer implements the generic portion of blk_poll(). blk_poll() calls mq_ops->poll() (virtblk_poll()). So in general the polling loop will keep iterating, but there are exceptions: 1. need_resched() causes blk_poll() to return 0 and blk_io_schedule() will be called. 2. blk-mq has a fancier io_poll algorithm that estimates I/O time and sleeps until the expected completion time to save CPU cycles. I haven't looked into detail at this one. Both these cases affect existing mq_ops->poll() implementations (e.g. NVMe). What's new in this patch series is that virtio-blk could have non-polling requests on the virtqueue which now has irqs disabled. So we could wait for them. I think there's an easy solution for this: don't disable virtqueue irqs when there are non-REQ_HIPRI requests in flight. The disadvantage is that we'll keep irqs disable in more situations so the performance improvement may not apply in some configurations. Stefan --vxiBSKnm2AIk6CUa Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmC49CAACgkQnKSrs4Gr c8g9SggApFtVeNv8a9846y9j4yHxsw8MKkrLJR8eFajwxJYH9o4S5y1rW1nKURxn 0QI9IlDe3z/jkcaeidvP1JaiEPpDweEOuJ5R/TTRfixSMCjR5kY9+A2AOh8gHOx6 sIejzjF0BeUcD8hZzTT/tWDXRojxCcZM+NMrf4nefzOjNkOh3tCXPTm0Ov6wX34I TlAI/I1DFtGx92j3tqcgJmQpie48arGTLPX6qSrWABM7IG5bcJK8tnbwkp2AKAZX xJcNTADB0vygduI+33Ttv35gDy11zOoe9RaKdBmpJzhNTm77glSlXWMupe2b5uo+ /iPZIP3AZMMrehTbSpFricIMzMBSPQ== =reHh -----END PGP SIGNATURE----- --vxiBSKnm2AIk6CUa--