Date: Mon, 4 Oct 2021 09:11:04 -0400
From: "Michael S. Tsirkin"
To: Cornelia Huck
Cc: Halil Pasic, Jason Wang, Xie Yongji,
    virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
    markver@us.ibm.com, Christian Borntraeger, linux-s390@vger.kernel.org,
    qemu-devel@nongnu.org
Subject: Re: [RFC PATCH 1/1] virtio: write back features before verify
Message-ID: <20211004090018-mutt-send-email-mst@kernel.org>
References: <20210930012049.3780865-1-pasic@linux.ibm.com>
    <20210930070444-mutt-send-email-mst@kernel.org>
    <20211001092125.64fef348.pasic@linux.ibm.com>
    <20211002055605-mutt-send-email-mst@kernel.org>
    <87bl452d90.fsf@redhat.com>
In-Reply-To: <87bl452d90.fsf@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 04, 2021 at 02:19:55PM +0200, Cornelia Huck wrote:
>
> [cc:qemu-devel]
>
> On Sat, Oct 02 2021, "Michael S. Tsirkin" wrote:
>
> > On Fri, Oct 01, 2021 at 09:21:25AM +0200, Halil Pasic wrote:
> >> On Thu, 30 Sep 2021 07:12:21 -0400
> >> "Michael S. Tsirkin" wrote:
> >>
> >> > On Thu, Sep 30, 2021 at 03:20:49AM +0200, Halil Pasic wrote:
> >> > > This patch fixes a regression introduced by commit 82e89ea077b9
> >> > > ("virtio-blk: Add validation for block size in config space") and
> >> > > enables similar checks in verify() on big endian platforms.
> >> > >
> >> > > The problem with checking multi-byte config fields in the verify
> >> > > callback, on big endian platforms, and with a possibly transitional
> >> > > device is the following. The verify() callback is called between
> >> > > config->get_features() and virtio_finalize_features(). That we have a
> >> > > device that offered F_VERSION_1 then we have the following options
> >> > > either the device is transitional, and then it has to present the legacy
> >> > > interface, i.e.
> >> > > a big endian config space until F_VERSION_1 is
> >> > > negotiated, or we have a non-transitional device, which makes
> >> > > F_VERSION_1 mandatory, and only implements the non-legacy interface and
> >> > > thus presents a little endian config space. Because at this point we
> >> > > can't know if the device is transitional or non-transitional, we can't
> >> > > know do we need to byte swap or not.
> >> >
> >> > Hmm which transport does this refer to?
> >>
> >> It is the same with virtio-ccw and virtio-pci. I see the same problem
> >> with both on s390x. I didn't try with virtio-blk-pci-non-transitional
> >> yet (have to figure out how to do that with libvirt) for pci I used
> >> virtio-blk-pci.
> >>
> >> > Distinguishing between legacy and modern drivers is transport
> >> > specific. PCI presents
> >> > legacy and modern at separate addresses so distinguishing
> >> > between these two should be no trouble.
> >>
> >> You mean the device id? Yes that is bolted down in the spec, but
> >> currently we don't exploit that information. Furthermore there
> >> is a fat chance that with QEMU even the allegedly non-transitional
> >> devices only present a little endian config space after VERSION_1
> >> was negotiated. Namely get_config for virtio-blk is implemented in
> >> virtio_blk_update_config() which does virtio_stl_p(vdev,
> >> &blkcfg.blk_size, blk_size) and in there we don't care
> >> about transitional or not:
> >>
> >> static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
> >> {
> >> #if defined(LEGACY_VIRTIO_IS_BIENDIAN)
> >>     return virtio_is_big_endian(vdev);
> >> #elif defined(TARGET_WORDS_BIGENDIAN)
> >>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> >>         /* Devices conforming to VIRTIO 1.0 or later are always LE. */
> >>         return false;
> >>     }
> >>     return true;
> >> #else
> >>     return false;
> >> #endif
> >> }
> >>
> >
> > ok so that's a QEMU bug. Any virtio 1.0 and up
> > compatible device must use LE.
> > It can also present a legacy config space where the
> > endian depends on the guest.
>
> So, how is the virtio core supposed to determine this? A
> transport-specific callback?

I'd say a field in VirtIODevice is easiest.

> >
> >> > Channel i/o has versioning so same thing?
> >> >
> >>
> >> Don't think so. Both a transitional and a non-transitional device
> >> would have to accept revisions higher than 0 if the driver tried to
> >> negotiate those (and we do in our case).
> >
> > Yes, the modern driver does. And that one is known to be LE.
> > legacy driver doesn't.
> >
> >> > > The virtio spec explicitly states that the driver MAY read config
> >> > > between reading and writing the features so saying that first accessing
> >> > > the config before feature negotiation is done is not an option. The
> >> > > specification ain't clear about setting the features multiple times
> >> > > before FEATURES_OK, so I guess that should be fine.
> >> > >
> >> > > I don't consider this patch super clean, but frankly I don't think we
> >> > > have a ton of options. Another option that may or may not be cleaner,
> >> > > but is also IMHO much uglier is to figure out whether the device is
> >> > > transitional by rejecting _F_VERSION_1, then resetting it and proceeding
> >> > > according to what we have figured out, hoping that the characteristics
> >> > > of the device didn't change.
> >> >
> >> > I am confused here. So is the problem at the device or at the driver level?
> >>
> >> We have a driver regression. Since the 82e89ea077b9 ("virtio-blk: Add
> >> validation for block size in config space") virtio-blk is broken on
> >> s390.
> >
> > Because of a qemu bug. I agree. It's worth working around in the driver
> > since the qemu bug has been around for a very long time.
>
> Yes, since we introduced virtio 1 support, I guess...
>
> >
> >
> >> The deeper problem is in the spec.
> >> We stated that the driver may read
> >> config space before the feature negotiation is finalized, but we didn't
> >> think enough about what happens when native endianness is not little
> >> endian in the different cases.
> >
> > Because the spec is very clear that endian-ness is LE.
> > I don't see a spec issue yet here, just an implementation issue.
>
> Maybe not really a bug in the spec, but probably an issue, as this seems
> to have been unclear to most people so far.
>
> >
> >> I believe, for non-transitional devices we have a problem in the host as
> >> well (i.e. in QEMU).
> >
> > Because QEMU ignores the spec and instead relies on the feature
> > negotiation.
> >
> >>
> >> > I suspect it's actually the host that has the issue, not
> >> > the guest?
> >>
> >> I tend to say we have a problem both in the host and in the guest. I'm
> >> more concerned about the problem in the guest, because that is a really
> >> nasty regression.
> >
> > The problem is in the guest. The bug is in the host ;)
> >
> >> For the host. I think for legacy we don't have a
> >> problem, because both sides would operate on the assumption no
> >> _F_VERSION_1, IMHO the implementation for the transitional devices is
> >> correct.
> >
> > Well no, the point of transitional is really to be 1.0 compliant
> > *and* also expose a legacy interface.
>
> Worth noting that PCI and CCW are a tad different here: PCI exposes an
> additional interface, while CCW uses a revision negotiation mechanism
> (for CCW, legacy and standard-compliant are much closer on the transport
> side as for PCI.) MMIO does not do transitional, if I'm not wrong.

Right. It probably still uses VIRTIO_F_VERSION_1 and we need to fix that.

> >
> >> For non-transitional flavor, it depends on the device. For
> >> example virtio-net and virtio-blk is broken, because we use primitives
> >> like virtio_stl_p() and those don't do the right thing before feature
> >> negotiation is completed.
> >> On the other hand virtio-crypto.c as a truly
> >> non-transitional device uses stl_le_p() and IMHO does the right thing.
> >>
> >> Thanks for your comments! I hope I managed to answer your questions. I
> >> need some guidance on how do we want to move forward on this.
> >>
> >> Regards,
> >> Halil
> >
> > OK so. I don't have a problem with the patch itself,
> > assuming it's enough to work around all buggy hosts.
> > I am especially worried about things like vhost/vhost-user,
> > I suspect they might have a bug like this too, and
> > I am not sure whether your work around is enough for these.
> > Can you check please?
> >
> > If not we'll have to move all validate code to after FEATURES_OK
> > is set.
>
> What is supposed to happen for validate after FEATURES_OK? The driver
> cannot change any features at that point in time, it can only fail to
> use the device.

Fail to use the device. Need to tread carefully here of course, we don't
want to break working setups.

> >
> > We do however want to document that this API can be called
> > multiple times since that was not the case
> > previously.
> >
> > Also, I would limit this to when
> > - the validate callback exists
> > - the guest endian-ness is not LE
> >
> > We also want to document the QEMU bug in a comment here,
> > e.g.
> >
> > /*
> >  * QEMU before version 6.2 incorrectly uses driver features with guest
> >  * endian-ness to set endian-ness for config space instead of just using
> >  * LE for the modern interface as per spec.
> >  * This breaks reading config in the validate callback.
> >  * To work around that, when device is 1.0 (so supposed to be LE)
> >  * but guest is not LE, then send the features to device one extra
> >  * time before validation.
> >  */
>
> Do we need to consider migration, or do we not need to be bug-compatible
> in this case?

I suspect we don't need to be bug compatible, any driver accessing config
before FEATURES_OK is already broken ...
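For illustration only, the limits suggested in the quoted text above (device
offers VERSION_1, a validate callback exists, guest is not LE) could be
sketched in plain user-space C roughly as follows. The helper names
guest_is_little_endian() and needs_early_feature_writeback() are made up for
this sketch and are not kernel APIs; in the kernel the equivalent check would
sit in virtio_dev_probe() next to the finalize_features() call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number of VIRTIO_F_VERSION_1 in the virtio spec. */
#define VIRTIO_F_VERSION_1 32

/* Hypothetical helper: detect the byte order of the running machine. */
static inline bool guest_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/*
 * Hypothetical helper: decide whether the features should be sent to the
 * device one extra time before the validate callback reads config space.
 */
static inline bool needs_early_feature_writeback(uint64_t device_features,
						 bool has_validate,
						 bool guest_is_le)
{
	if (!has_validate)
		return false;	/* no validate callback, so no early config read */
	if (guest_is_le)
		return false;	/* guest-endian and LE config look the same */
	return (device_features & (1ULL << VIRTIO_F_VERSION_1)) != 0;
}
```

A caller would pass guest_is_little_endian() as the last argument; it is kept
as a parameter here so that both byte orders can be exercised on one machine.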
> >
> > Finally I'd like to see the QEMU bug fix before I merge this one,
> > since it will be harder to test with a fix.
> >
> >
> >
> >
> >> >
> >> >
> >> > > Signed-off-by: Halil Pasic
> >> > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
> >> > > Reported-by: markver@us.ibm.com
> >> > > ---
> >> > >  drivers/virtio/virtio.c | 4 ++++
> >> > >  1 file changed, 4 insertions(+)
> >> > >
> >> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> >> > > index 0a5b54034d4b..9dc3cfa17b1c 100644
> >> > > --- a/drivers/virtio/virtio.c
> >> > > +++ b/drivers/virtio/virtio.c
> >> > > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
> >> > >  		if (device_features & (1ULL << i))
> >> > >  			__virtio_set_bit(dev, i);
> >> > >
> >> > > +	/* Write back features before validate to know endianness */
> >> > > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
> >> > > +		dev->config->finalize_features(dev);
> >> > > +
> >> > >  	if (drv->validate) {
> >> > >  		err = drv->validate(dev);
> >> > >  		if (err)
> >> > >
> >> > > base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
> >> > > --
> >> > > 2.25.1
> >> >
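
To make the failure mode discussed in this thread concrete, here is a small
user-space sketch of what goes wrong when the device fills a 32-bit config
field in big endian (guest-native on s390) while the driver decodes it as
little endian, as a modern driver is expected to. The helper names and the
block size of 512 are assumptions chosen for the example, not code from the
kernel or QEMU.

```c
#include <stdint.h>

/* Store a 32-bit value in big endian byte order, as a legacy
 * (guest-endian) device would on a big endian guest. */
static inline void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Decode four config-space bytes as a big endian 32-bit value. */
static inline uint32_t load_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Decode the same four bytes as little endian, which is what a driver
 * does once it assumes a VIRTIO 1.0 (modern) config space. */
static inline uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

With a buffer filled by store_be32(buf, 512), load_be32(buf) gives back 512,
while load_le32(buf) yields 131072 (0x00020000), a value that a block size
sanity check in validate would most likely reject.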