Date: Mon, 15 Aug 2022 17:32:06 -0400
From: "Michael S. Tsirkin"
To: Andres Freund
Cc: Xuan Zhuo, Jason Wang, "David S. Miller", Eric Dumazet,
	Jakub Kicinski, Paolo Abeni,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	Linus Torvalds, Jens Axboe, James Bottomley, "Martin K. Petersen",
	Guenter Roeck, linux-kernel@vger.kernel.org, Greg KH, c@redhat.com
Subject: Re: upstream kernel crashes
Message-ID: <20220815170444-mutt-send-email-mst@kernel.org>
In-Reply-To: <20220815205330.m54g7vcs77r6owd6@awork3.anarazel.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 15, 2022 at 01:53:30PM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-08-15 16:21:51 -0400, Michael S. Tsirkin wrote:
> > On Mon, Aug 15, 2022 at 10:46:17AM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2022-08-15 12:50:52 -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Aug 15, 2022 at 09:45:03AM -0700, Andres Freund wrote:
> > > > > Hi,
> > > > >
> > > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > > OK so this gives us a quick revert as a solution for now.
> > > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > > If it crashes we either have a long standing problem in virtio
> > > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > > rings than what device requests.
> > > > > > Thanks!
> > > > >
> > > > > I applied the below and the problem persists.
> > > > >
> > > > > [...]
> > > >
> > > > Okay!
> > >
> > > Just checking - I applied and tested this atop 6.0-rc1, correct? Or did you
> > > want me to test it with the 762faee5a267 reverted? I guess what you're trying
> > > to test if a smaller queue than what's requested you'd want to do so without
> > > the problematic patch applied...
> > >
> > >
> > > Either way, I did this, and there are no issues that I could observe. No
> > > oopses, no broken networking. But:
> > >
> > > To make sure it does something I added a debugging printk - which doesn't show
> > > up. I assume this is at a point at least earlyprintk should work (which I see
> > > getting enabled via serial)?
> >
> >
> > Sorry if I was unclear. I wanted to know whether the change somehow
> > exposes a driver bug or a GCP bug. So what I wanted to do is to test
> > this patch on top of *5.19*, not on top of the revert.
>
> Right, the 5.19 part was clear, just the earlier test:
>
> > > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > > OK so this gives us a quick revert as a solution for now.
> > > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > > If it crashes we either have a long standing problem in virtio
> > > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > > Thanks!
>
> I wasn't sure about.
>
> After I didn't see any effect on 5.19 + your patch, I grew a bit suspicious
> and added the printks.
>
>
> > Yes I think printk should work here.
>
> The reason the debug patch didn't change anything, and that my debug printk
> didn't show, is that gcp uses the legacy paths...

Wait a second. Eureka I think!

So I think GCP is not broken.
I think what's broken is this patch:

commit cdb44806fca2d0ad29ca644cbf1505433902ee0c
Author: Xuan Zhuo
Date:   Mon Aug 1 14:38:54 2022 +0800

    virtio_pci: support the arg sizes of find_vqs()

Specifically:

diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 2257f1b3d8ae..d75e5c4e637f 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -112,6 +112,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 				  unsigned int index,
 				  void (*callback)(struct virtqueue *vq),
 				  const char *name,
+				  u32 size,
 				  bool ctx,
 				  u16 msix_vec)
 {
@@ -125,10 +126,13 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 	if (!num || vp_legacy_get_queue_enable(&vp_dev->ldev, index))
 		return ERR_PTR(-ENOENT);
 
+	if (!size || size > num)
+		size = num;
+
 	info->msix_vector = msix_vec;
 
 	/* create the vring */
-	vq = vring_create_virtqueue(index, num,
+	vq = vring_create_virtqueue(index, size,
 				    VIRTIO_PCI_VRING_ALIGN, &vp_dev->vdev,
 				    true, false, ctx,
 				    vp_notify, callback, name);

So if you pass the size parameter for a legacy device, it will
try to make the ring smaller, and that is not legal with
legacy at all. But the driver treats legacy and modern
the same: it allocates a smaller queue anyway.

Lo and behold, I pass disable-modern=on to qemu and it
happily corrupts memory exactly the same as GCP does.

So the new find_vqs API is actually completely broken: it cannot work for
legacy at all, and for added fun there's no way to find out
that it's legacy. Maybe we should interpret the patch

So I think I will also revert

04ca0b0b16f11faf74fa92468dab51b8372586cd..fe3dc04e31aa51f91dc7f741a5f76cc4817eb5b4

> If there were a bug in the legacy path, it'd explain why the problem only
> shows on gcp, and not in other situations.
>
> I'll queue testing the legacy path with the equivalent change.
>
> - Andres
>
>
> Greetings,
>
> Andres Freund
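
For illustration, here is a minimal sketch of why a smaller ring is a problem with legacy devices. It assumes the split-ring layout from the virtio specification: with the legacy interface the driver hands the device only the ring's page address, and the device works out where the avail and used rings live from its own queue size. The helper names are hypothetical, the 4096-byte alignment is assumed to match VIRTIO_PCI_VRING_ALIGN, and the size formula is meant to mirror vring_size() in the kernel's virtio_ring.h. This is not the actual driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Element sizes from the split-ring layout in the virtio specification. */
#define DESC_SIZE      16u /* struct vring_desc: addr(8) len(4) flags(2) next(2) */
#define USED_ELEM_SIZE  8u /* struct vring_used_elem: id(4) len(4) */

/*
 * Offset of the used ring from the start of a ring with num entries:
 * the descriptor table, then the avail ring (flags, idx, ring[num],
 * used_event, 2 bytes each), rounded up to align.
 */
static size_t used_ring_offset(unsigned int num, size_t align)
{
	assert(align && (align & (align - 1)) == 0); /* must be a power of two */
	size_t end_of_avail = (size_t)DESC_SIZE * num + 2u * (3 + num);
	return (end_of_avail + align - 1) & ~(align - 1);
}

/* Total bytes of a ring with num entries (hypothetical twin of vring_size()). */
static size_t ring_bytes(unsigned int num, size_t align)
{
	/* used ring: flags, idx, avail_event (2 bytes each) plus ring[num] */
	return used_ring_offset(num, align) + 2u * 3 + (size_t)USED_ELEM_SIZE * num;
}

/*
 * Nonzero if a device that assumes device_num entries would touch memory
 * beyond a ring the driver allocated for driver_num entries.
 */
static int device_overruns(unsigned int device_num, unsigned int driver_num,
			   size_t align)
{
	return ring_bytes(device_num, align) > ring_bytes(driver_num, align);
}
```

With these assumptions, and hypothetical queue sizes of 256 (device) and 128 (driver), used_ring_offset(256, 4096) is 8192 while ring_bytes(128, 4096) is only 5126. A device that believes the queue has 256 entries would therefore write used-ring updates past the end of a ring the driver sized for 128, which is consistent with the memory corruption described above.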