Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp4575085pxj; Tue, 25 May 2021 11:03:18 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzbTJZZ1I+ux8Hzoyvwhcer+92kNSZITSR62/1CnKDvcnplfRkabSokONQVKPR8s86ONN1T X-Received: by 2002:a92:c268:: with SMTP id h8mr16937033ild.299.1621965798252; Tue, 25 May 2021 11:03:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1621965798; cv=none; d=google.com; s=arc-20160816; b=WlnyoagDVVguHQyHvc7+TpOCmstAwwc57y0J8BLsRM/sddWs0LZFZuHZbJ3OoG5TIj N8oyAHOcz2wEEQxkZNKEEJuM2jSdoR1+rDOzq9hsKR7C+MhnzRJvEJ3H4S1q2sazHuMn LBOjEFFCsvauLQRKOZP/TKOEUu3ws2sZ1l41C1qXkorKYPkLHqdJ38N0d3Y7f/i+1cjs j/IdBID8/XRgeeWpDroLRg1SYsvwK8z/3dJP3a7+r0tBHz4TYuApxJZFL8+fbIuyzNrZ bAC3so47HYkppBAJfwbOVQJ6SFawrRf3xLBIFxNP8dwT6p3LOQ4HxLAXPXaLceqoKZd9 g+0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=fL6Y/duGHBMJKkmMDJ4jflONRQgZgnAzPjAPflMfB2s=; b=pqE+bjBkCpOmo+n+FyTNZPzYyv70Bke0xwn/OevXZnZO1qQjHnccA5DN5hVRZyj2jN OnfDDJ/NzizGOIuBJoqOKtFXoXUksEo6rmbApEU5Zc/WlsdUNkqL7pxTZW1LOAsJSPk3 bpzyaK2kgfeccgoitGucfsjnKSEldjU93ZYQlmNZP9AIxUM6UYju/g1rBcTPq+AfoJrh gmCwzu5R5NkCkbvd5f18kbegayCCpT3hzaa8IHiemcTqigIhW1tpqAt2FrdfrwLt3aA/ 44YOmv4yOm5oiqwp2u/fWSou/eRKNsr40nFulJnaYXZ4InVh6gtA8v0QgQ2xorRYEWIo e64g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NirqhuDV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id c14si12448681ilm.1.2021.05.25.11.03.03; Tue, 25 May 2021 11:03:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NirqhuDV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232743AbhEYQtQ (ORCPT + 99 others); Tue, 25 May 2021 12:49:16 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:29434 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232555AbhEYQtP (ORCPT ); Tue, 25 May 2021 12:49:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1621961264; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=fL6Y/duGHBMJKkmMDJ4jflONRQgZgnAzPjAPflMfB2s=; b=NirqhuDVVDLTpqdbUNHL4TtQ8UcYuD8wcTaOtX0CZxeLb4ir/ucGSE278+bmp/Cbj37WBA bkUiIRMMaM1fl2MAbroxbE8lrDWYo9//TnwbUVpSGlja58E4IZHrd+MAFDhZy+fgWbyOmm 1x6uygak2EVcRUXmM/9sEraDQqUrYmQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-248-M1ILaIflOF2vHlFCyOjrRg-1; Tue, 25 May 2021 12:47:42 -0400 X-MC-Unique: M1ILaIflOF2vHlFCyOjrRg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2B8AA180FD66; Tue, 25 May 2021 16:47:41 +0000 (UTC) Received: from localhost (ovpn-115-80.ams2.redhat.com [10.36.115.80]) by smtp.corp.redhat.com (Postfix) with ESMTP id 449B9421F; Tue, 25 May 2021 16:47:37 +0000 (UTC) Date: Tue, 25 May 2021 17:47:36 +0100 From: Stefan Hajnoczi To: Dongli Zhang Cc: Hannes Reinecke , virtualization@lists.linux-foundation.org, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, mst@redhat.com, jasowang@redhat.com, pbonzini@redhat.com, jejb@linux.ibm.com, martin.petersen@oracle.com, joe.jin@oracle.com, junxiao.bi@oracle.com, srinivas.eeda@oracle.com Subject: Re: [RFC] virtio_scsi: to poll and kick the virtqueue in timeout handler Message-ID: References: <20210523063843.1177-1-dongli.zhang@oracle.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="VNMvGmWXBqxStO+j" Content-Disposition: inline In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --VNMvGmWXBqxStO+j Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, May 24, 2021 at 11:33:33PM -0700, Dongli Zhang wrote: > On 5/24/21 6:24 AM, Stefan Hajnoczi wrote: > > On Sun, May 23, 2021 at 09:39:51AM +0200, Hannes Reinecke wrote: > >> On 5/23/21 8:38 AM, Dongli Zhang wrote: > >>> This RFC is to trigger the discussion about to poll and kick the > >>> virtqueue on purpose in virtio-scsi timeout handler. > >>> > >>> The virtio-scsi relies on the virtio vring shared between VM and host. > >>> The VM side produces requests to vring and kicks the virtqueue, while= the > >>> host side produces responses to vring and interrupts the VM side. > >>> > >>> By default the virtio-scsi handler depends on the host timeout handler > >>> by BLK_EH_RESET_TIMER to give host a chance to perform EH. > >>> > >>> However, this is not helpful for the case that the responses are avai= lable > >>> on vring but the notification from host to VM is lost. > >>> > >> How can this happen? > >> If responses are lost the communication between VM and host is broken,= and > >> we should rather reset the virtio rings themselves. > >=20 > > I agree. In principle it's fine to poll the virtqueue at any time, but I > > don't understand the failure scenario here. It's not clear to me why the > > device-to-driver vq notification could be lost. > >=20 >=20 > One example is the CPU hotplug issue before the commit bf0beec0607d ("blk= -mq: > drain I/O when all CPUs in a hctx are offline") was available. The issue = is > equivalent to loss of interrupt. Without the CPU hotplug fix, while NVMe = driver > relies on the timeout handler to complete inflight IO requests, the PV > virtio-scsi may hang permanently. >=20 > In addition, as the virtio/vhost/QEMU are complex software, we are not ab= le to > guarantee there is no further lost of interrupt/kick issue in the future.= It is > really painful if we encounter such issue in production environment. Any number of hardware or software bugs might exist that we don't know about, yet we don't pre-emptively add workarounds for them because where do you draw the line? I checked other SCSI/block drivers and found it's rare to poll in the timeout function so there does not seem to be a consensus that it's useful to do this. That said, it's technically fine to do it, the virtqueue APIs are there and can be used like this. So if you and others think this is necessary, then it's a pretty small change and I'm not against merging a patch like this. Stefan --VNMvGmWXBqxStO+j Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmCtKigACgkQnKSrs4Gr c8jE6gf/SzpIij5k5NFqXRfDYjrAoN/hHDK0f8rALj06t2cmrKq1LugoGygtb4nd MmJypVvxZZW037iS8fChHC5kcqvBA9Y/N+0tABVJNT8GHLnvDZI8diTONgXVUCzj X/u0+3EjQNz2TX6W9pZbEJmeVv0z7JiWCQKcLf9DYq77Xei8U4U5Xv4k0Nks1b1A PFH+j4R42eYdziIwwxCPgAQtlCWlTgWbGO9B14kqeybM7I9pq2Ar+WQXIItptuuA 9R3RBQ6n1cOqtiOHwOXjz2Y4zSa3o4jrLeV8/u3MEmZAwpc2+1A7QL+fW2uxpWJf 7geSWXbXMdwb7odJ+yKHBNAMOjZqmA== =d024 -----END PGP SIGNATURE----- --VNMvGmWXBqxStO+j--