Date: Fri, 31 Mar 2017 06:22:31 +0300
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Mike Galbraith
Cc: Christoph Hellwig, Thorsten Leemhuis, virtio-dev@lists.oasis-open.org,
	Linux Kernel Mailing List, rjones@redhat.com
Subject: Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")
Message-ID: <20170331032231.GA2471@redhat.com>
In-Reply-To: <20170331041959-mutt-send-email-mst@kernel.org>

On Fri, Mar 31, 2017 at 04:23:35AM +0300, Michael S. Tsirkin wrote:
> On Thu, Mar 30, 2017 at 09:20:35AM +0200, Mike Galbraith wrote:
> > On Thu, 2017-03-30 at 05:10 +0200, Mike Galbraith wrote:
> > 
> > > WRT spin, you should need do nothing more than boot with threadirqs,
> > > that's 100% repeatable here in absolutely virgin source.
> > 
> > No idea why virtqueue_get_buf() in __send_control_msg() fails forever
> > with threadirqs, but marking that vq as being busted (it clearly is)
> > results in one gripe, and a vbox that seemingly cares not one whit that
> > something went missing.  CONFIG_DEBUG_SHIRQ OTOH notices, mutters
> > something that sounds like "idiot" when I hibernate the thing ;-)
> > 
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index e9b7e0b3cabe..831406dae1cb 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -567,6 +567,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
> >  	struct scatterlist sg[1];
> >  	struct virtqueue *vq;
> >  	unsigned int len;
> > +	unsigned long deadline = jiffies+1;
> >  
> >  	if (!use_multiport(portdev))
> >  		return 0;
> > @@ -583,9 +584,13 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
> >  
> >  	if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
> >  		virtqueue_kick(vq);
> > -		while (!virtqueue_get_buf(vq, &len)
> > -		       && !virtqueue_is_broken(vq))
> > +		while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq)) {
> >  			cpu_relax();
> > +			if (time_after(jiffies, deadline)) {
> > +				trace_printk("Aw crap, I'm stuck.. breaking device\n");
> > +				virtio_break_device(portdev->vdev);
> > +			}
> > +		}
> >  	}
> >  
> >  	spin_unlock(&portdev->c_ovq_lock);
> 
> OK so with your help I was able to reproduce.  Surprisingly easy:
> 
> 1. add threadirqs
> 2. add to qemu -device virtio-serial-pci -no-shutdown
> 3. within guest, do echo disk > /sys/power/state
> 
> This produces a warning.  Looking deeper into it, I find:
> the device has 64 vqs.  This line
> 
> 	err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
> 			  vring_interrupt, IRQF_SHARED,
> 			  vp_dev->msix_names[j], vqs[i]);
> 
> fails after assigning interrupts to 33 vqs.
> Is there a limit to how many threaded irqs can share a line?

In fact it fails on the 33rd one, and I see this:

	/*
	 * Unlikely to have 32 resp 64 irqs sharing one line,
	 * but who knows.
	 */
	if (thread_mask == ~0UL) {
		printk(KERN_ERR "%s +%d\n", __FILE__, __LINE__);
		ret = -EBUSY;
		goto out_mask;
	}

I'm not sure why it fails after 32 on 64 bit, but as virtio devices
aren't limited to 32 vqs it looks like we should go back to requesting
the irq only once for all vqs.
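
FWIW, one guess at where the 32 limit could come from (an untested
userspace sketch, not verified against kernel/irq/manage.c): if the
per-action bit is computed with a plain int literal, e.g.
new->thread_mask = 1 << ffz(thread_mask), then the 32nd sharer gets
1 << 31 == INT_MIN, which sign-extends to 0xffffffff80000000 when
widened to the unsigned long mask.  That fills the upper 32 bits in
one go, so the 33rd sharer sees thread_mask == ~0UL and hits -EBUSY.
ffz_sketch() below is a hypothetical stand-in for the kernel's ffz():

#include <stdio.h>

/* crude stand-in for the kernel's ffz(): index of the first zero bit */
static int ffz_sketch(unsigned long x)
{
	int i;

	for (i = 0; i < 64; i++)
		if (!(x & (1UL << i)))
			return i;
	return -1;
}

int main(void)
{
	unsigned long thread_mask = 0;
	int sharers;

	for (sharers = 1; sharers <= 64; sharers++) {
		/* mirrors the -EBUSY check quoted above */
		if (thread_mask == ~0UL) {
			printf("-EBUSY on sharer %d\n", sharers);
			return 0;
		}
		/*
		 * The suspect expression: "1" is a 32-bit int, so once
		 * ffz() returns 31 this is 1 << 31 == INT_MIN, which
		 * sign-extends when converted to unsigned long
		 * (typical gcc/x86-64 behavior; strictly speaking a
		 * shift into the sign bit overflows a signed int).
		 */
		thread_mask |= 1 << ffz_sketch(thread_mask);
	}
	return 0;
}

On x86-64 this prints "-EBUSY on sharer 33", matching the failure on
the 33rd vq.  If that's the cause, 1UL << ffz(thread_mask) would lift
the limit to 64 - but that's a guess, and 64 would still cap the
number of vqs, so the question below stands either way.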
Christoph, should I just revert for now, or do you want to look into a
smaller patch for this?

Another question is intx support - that should work, but it seems to be
broken at the moment.

> 
> If so we need to rethink the whole approach.
> 
> Still looking into it.
> 
> Christoph, any idea?
> 
> -- 
> MST