From: "Rafael J. Wysocki"
To: Boris Ostrovsky, Sander Eikelenboom
Cc: david.vrabel@citrix.com, linux@eikelenboom.it, xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
Subject: Re: [Xen-devel] Regression due to "device property: Make it possible to use secondary firmware nodes" Re: Xen-unstable + linux 4.1-mergewindow: problems with PV guest pci passthrough: pcifront pci-0: pciback not responding!!!
Date: Tue, 26 May 2015 01:22:12 +0200
Message-ID: <1768370.Rs1vKzEP4D@vostro.rjw.lan>
In-Reply-To: <555FDDA1.8030806@oracle.com>

On Friday, May 22, 2015 09:53:37 PM Boris Ostrovsky wrote:
> On 05/22/2015 04:11 AM, Sander Eikelenboom wrote:
> > Hello Sander,
> >
> > Friday, May 15, 2015, 12:47:27 AM, you wrote:
> >
> >> Sorry for the resend, i messed up the to's en from's.
> >
> >> Hi Konrad / David,
> >
> >> One big snip on this thread, got some more debug info, hopefully this will
> >> lead to something:
> >
> >> On a working kernel (with the two seemingly non related patches reverted) i get:
> >
> >> [ 0.717796] pcifront pci-0: Allocated pdev @ 0xffff880019e11780 pdev->sh_info @ 0xffff880018f58000
> >> [ 0.717848] pcifront pci-0: ?!?!? before alloc gntref: 0
> >> [ 0.717871] pcifront pci-0: ?!?!? after alloc gntref: 8
> >> [ 0.717892] pcifront pci-0: ?!?!? before alloc evtchn: -1
> >> [ 0.717915] pcifront pci-0: ?!?!? after alloc evtchn: 17
> >> [ 0.717984] pcifront pci-0: ?!?!? bound evtchn:17 to irqhandler:-1 err:31
> >> [ 0.721640] pcifront pci-0: publishing successful!
> >> [ 0.723684] usbcore: registered new interface driver udlfb
> >> [ 0.724664] xen:xen_evtchn: Event-channel device installed
> >> [ 0.726597] pcifront pci-0: Installing PCI frontend
> >> [ 0.726853] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> >> [ 0.727059] pcifront pci-0: Creating PCI Frontend Bus 0000:00
> >> [ 0.727363] pcifront pci-0: PCI host bridge to bus 0000:00
> >> [ 0.727391] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
> >> [ 0.727417] pci_bus 0000:00: root bus resource [mem 0x00000000-0xffffffffffff]
> >> [ 0.727452] pci_bus 0000:00: root bus resource [bus 00-ff]
> >> [ 0.727475] pci_bus 0000:00: scanning bus
> >> [ 0.727503] pcifront pci-0: read dev=0000:00:00.0 - offset 0 size 4
> >> [ 0.728253] Linux agpgart interface v0.103
> >> [ 0.728387] Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
> >> [ 0.728474] [drm] Initialized drm 1.1.0 20060810
> >> [ 0.728551] [drm] radeon kernel modesetting enabled.
> >> [ 0.730319] pcifront pci-0: ?!?!? pciback responded !!! irq:31 irq_flags:ffff880019e100a8 ns: 1431641785551700000 ns_timeout: 1431641787541235000 evtchn:17 gnt_ref:8
> >> [ 0.730319] pcifront pci-0: ?!?!? op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 0.730319] pcifront pci-0: ?!?!? active_op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 0.730319] pcifront pci-0: read got back value 11113f6
> >> [ 0.738845] pcifront pci-0: read dev=0000:00:00.0 - offset e size 1
> >> [ 0.744976] brd: module loaded
> >> [ 0.745204] pcifront pci-0: ?!?!? pciback responded !!! irq:31 irq_flags:ffff880019e100a8 ns: 1431641785562852000 ns_timeout: 1431641787552580000 evtchn:17 gnt_ref:8
> >> [ 0.745204] pcifront pci-0: ?!?!? op cmd:0 err:0 info:0 offset:14 size:1
> >> [ 0.745204] pcifront pci-0: ?!?!? active_op cmd:0 err:0 info:0 offset:14 size:1
> >> [ 0.745204] pcifront pci-0: read got back value 0
> >> [ 0.749204] pcifront pci-0: read dev=0000:00:00.0 - offset 6 size 2
> >> [ 0.750155] loop: module loaded
> >> [ 0.752527] pcifront pci-0: ?!?!? pciback responded !!! irq:31 irq_flags:ffff880019e100a8 ns: 1431641785570841000 ns_timeout: 1431641787562917000 evtchn:17 gnt_ref:8
> >> [ 0.752527] pcifront pci-0: ?!?!? op cmd:0 err:0 info:0 offset:6 size:2
> >> [ 0.752527] pcifront pci-0: ?!?!? active_op cmd:0 err:0 info:0 offset:6 size:2
> >> [ 0.752527] pcifront pci-0: read got back value 210
> >> [ 0.757187] pcifront pci-0: read dev=0000:00:00.0 - offset 34 size 1
> >
> >
> >> Were as in the non-working situation i get:
> >
> >> [ 0.751244] pcifront pci-0: Allocated pdev @ 0xffff880019ec2e00 pdev->sh_info @ 0xffff88001aa51000
> >> [ 0.751295] pcifront pci-0: ?!?!? before alloc gntref: 0
> >> [ 0.751315] pcifront pci-0: ?!?!? after alloc gntref: 8
> >> [ 0.751334] pcifront pci-0: ?!?!? before alloc evtchn: -1
> >> [ 0.751355] pcifront pci-0: ?!?!? after alloc evtchn: 17
> >> [ 0.751422] pcifront pci-0: ?!?!? bound evtchn:17 to irqhandler:-1 err:31
> >> [ 0.755215] pcifront pci-0: publishing successful!
> >> [ 0.757341] usbcore: registered new interface driver udlfb
> >> [ 0.758365] xen:xen_evtchn: Event-channel device installed
> >> [ 0.760419] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> >> [ 0.760819] pcifront pci-0: Installing PCI frontend
> >> [ 0.761518] pcifront pci-0: Creating PCI Frontend Bus 0000:00
> >> [ 0.761684] pcifront pci-0: PCI host bridge to bus 0000:00
> >> [ 0.761710] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
> >> [ 0.761733] pci_bus 0000:00: root bus resource [mem 0x00000000-0xffffffffffff]
> >> [ 0.761763] pci_bus 0000:00: root bus resource [bus 00-ff]
> >> [ 0.761783] pci_bus 0000:00: scanning bus
> >> [ 0.761805] pcifront pci-0: read dev=0000:00:00.0 - offset 0 size 4
> >> [ 0.767207] Linux agpgart interface v0.103
> >> [ 0.767362] Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
> >> [ 0.767439] [drm] Initialized drm 1.1.0 20060810
> >> [ 0.767515] [drm] radeon kernel modesetting enabled.
> >> [ 0.766948] pcifront pci-0: pciback not responding!!! irq:31 irq_flags:ffff880019ec0028 ns: 1431641983026498000 ns_timeout: 1431641983026497000 evtchn:0 gnt_ref:0
> >> [ 0.766948] pcifront pci-0: ?!?!? op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 0.766948] pcifront pci-0: ?!?!? active_op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 0.766948] pcifront pci-0: other err read got back err: ffffffff value: 0
> >> [ 2.762062] pcifront pci-0: read dev=0000:00:01.0 - offset 0 size 4
> >> [ 2.765203] pcifront pci-0: pciback not responding!!! irq:31 irq_flags:ffff880019ec0028 ns: 1431641985026742000 ns_timeout: 1431641985026741000 evtchn:0 gnt_ref:0
> >> [ 2.765203] pcifront pci-0: ?!?!? op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 2.765203] pcifront pci-0: ?!?!? active_op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 2.765203] pcifront pci-0: other err read got back err: ffffffff value: 0
> >> [ 4.762172] pcifront pci-0: read dev=0000:00:02.0 - offset 0 size 4
> >> [ 4.764231] brd: module loaded
> >> [ 4.765508] loop: module loaded
> >> [ 4.766748] pcifront pci-0: pciback not responding!!! irq:31 irq_flags:ffff880019ec0028 ns: 1431641987026850000 ns_timeout: 1431641987026849000 evtchn:0 gnt_ref:0
> >> [ 4.766748] pcifront pci-0: ?!?!? op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 4.766748] pcifront pci-0: ?!?!? active_op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 4.766748] pcifront pci-0: other err read got back err: ffffffff value: 0
> >> [ 6.762248] pcifront pci-0: read dev=0000:00:03.0 - offset 0 size 4
> >> [ 6.765545] pcifront pci-0: pciback not responding!!! irq:31 irq_flags:ffff880019ec0028 ns: 1431641989026930000 ns_timeout: 1431641989026929000 evtchn:0 gnt_ref:0
> >> [ 6.765545] pcifront pci-0: ?!?!? op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 6.765545] pcifront pci-0: ?!?!? active_op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 6.765545] pcifront pci-0: other err read got back err: ffffffff value: 0
> >> [ 8.762329] pcifront pci-0: read dev=0000:00:04.0 - offset 0 size 4
> >> [ 8.765626] pcifront pci-0: pciback not responding!!! irq:31 irq_flags:ffff880019ec0028 ns: 1431641991027006000 ns_timeout: 1431641991027005000 evtchn:0 gnt_ref:0
> >> [ 8.765626] pcifront pci-0: ?!?!? op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 8.765626] pcifront pci-0: ?!?!? active_op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 8.765626] pcifront pci-0: other err read got back err: ffffffff value: 0
> >> [ 10.762410] pcifront pci-0: read dev=0000:00:05.0 - offset 0 size 4
> >> [ 10.765701] pcifront pci-0: pciback not responding!!! irq:31 irq_flags:ffff880019ec0028 ns: 1431641993027087000 ns_timeout: 1431641993027086000 evtchn:0 gnt_ref:0
> >> [ 10.765701] pcifront pci-0: ?!?!? op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 10.765701] pcifront pci-0: ?!?!? active_op cmd:0 err:0 info:0 offset:0 size:4
> >> [ 10.765701] pcifront pci-0: other err read got back err: ffffffff value: 0
> >> [ 12.762472] pcifront pci-0: read dev=0000:00:06.0 - offset 0 size 4
> >
> >
> >> So somehow in the non-working situation, pdev->evtchn and pdev->gnt_ref are 0 in
> >> xen-pcifront.c:do_pci_op(), so no wonder it's not getting a response back ...
> >
> >> Question is .. why ?
> >
> >> --
> >> Sander
> >
> >
> > Ping ?
> >
> > David / Boris,
> >
> > Any idea, since Konrad seems to be off for 2 weeks and we are at rc4 now.
> >
> > (+Rafael again)
>
> So the immediate cause of those errors is that pdev->evtchn is 0.
> Backend is not notified and things not go well then.
>
> And it is indeed caused by 97badf873ab60e841243b66133ff9eff2a46ef29:
>
> We allocate pcifront_sd in pcifront_scan_root() and then pass it to
> pci_scan_bus_parented() as sysdata. Eventually this sysdata is used in
> pcibios_root_bridge_prepare() as pci_sysdata. It is dereferenced as
> pci_sysdata->companion (which I believe is aliased to pcifront_sd->pdev)
> and then set_primary_fwnode() writes it, thus corrupting
> pcifront_sd->pdev (and I think this is what sets evtchn to zero).

Thanks for the analysis!

OK, so the pcibios_root_bridge_prepare() in arch/x86/pci/acpi.c assumes that
bridge->bus->sysdata points to a struct pci_sysdata which has a 'companion' of
type struct acpi_device. This is supposed to come from pci_acpi_scan_root() and
not something else.

> I don't have a fix for that. I will see what we can do on Tuesday since
> I am out on Monday.
>
> Question to Rafael about commit 97badf873ab60e84124: is it really safe
> to assume that bridge->bus->sysdata is a pointer to pci_sysdata in
> pcibios_root_bridge_prepare()? It is declared as 'void *'.

That's because other architectures pass different things through it IIRC.
It should be a struct pci_sysdata pointer on x86 and ia64 at least.
It is a bug otherwise and things only worked by accident before.

In particular, the ACPI companion of bridge->dev was set to something random
located at the end of the struct pcifront_sd or behind it (on 64 bit). If
referenced, that would crash the kernel.

Padding struct pcifront_sd to match the layout of struct pci_sysdata should
make it work. Of course, a real fix would be to use a different
pcibios_root_bridge_prepare() for Xen.

Sander, can you please check if the patch below (untested) makes any difference?

---
 drivers/pci/xen-pcifront.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/pci/xen-pcifront.c
===================================================================
--- linux-pm.orig/drivers/pci/xen-pcifront.c
+++ linux-pm/drivers/pci/xen-pcifront.c
@@ -53,6 +53,8 @@ struct pcifront_device {
 
 struct pcifront_sd {
 	int domain;
+	int node;
+	void *padding[2];
 	struct pcifront_device *pdev;
 };
 
@@ -465,7 +467,7 @@ static int pcifront_scan_root(struct pci
 		 domain, bus);
 
 	bus_entry = kmalloc(sizeof(*bus_entry), GFP_KERNEL);
-	sd = kmalloc(sizeof(*sd), GFP_KERNEL);
+	sd = kzalloc(sizeof(*sd), GFP_KERNEL);
 	if (!bus_entry || !sd) {
 		err = -ENOMEM;
 		goto err_out;
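
For anyone who wants to check the aliasing argument outside the kernel, here is a minimal user-space sketch. It is not kernel code. The three structs are hypothetical stand-ins, and the field order of struct sysdata_sketch (domain, node, companion, then one more pointer) is an assumption inferred from the padding in the patch, not copied from the x86 headers. The helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-ins; only pointers to these types are needed here. */
struct acpi_device;
struct pcifront_device;

/* ASSUMPTION: simplified shape of what the x86 code expects in sysdata. */
struct sysdata_sketch {
	int domain;
	int node;
	struct acpi_device *companion;
	void *iommu;		/* assumed second pointer-sized member */
};

/* struct pcifront_sd as it looks before the patch. */
struct pcifront_sd_old {
	int domain;
	struct pcifront_device *pdev;
};

/* struct pcifront_sd with the padding added by the patch. */
struct pcifront_sd_new {
	int domain;
	int node;
	void *padding[2];
	struct pcifront_device *pdev;
};

/*
 * Return true if a store to ((struct sysdata_sketch *)sd)->companion would
 * touch the bytes of a pdev pointer located at offset pdev_off in sd.
 */
bool companion_overlaps(size_t pdev_off)
{
	size_t c_begin = offsetof(struct sysdata_sketch, companion);
	size_t c_end = c_begin + sizeof(struct acpi_device *);
	size_t p_end = pdev_off + sizeof(struct pcifront_device *);

	return c_begin < p_end && pdev_off < c_end;
}

/* Unpatched layout: does writing 'companion' clobber 'pdev'? */
bool old_layout_clobbers_pdev(void)
{
	return companion_overlaps(offsetof(struct pcifront_sd_old, pdev));
}

/* Padded layout from the patch: same question. */
bool new_layout_clobbers_pdev(void)
{
	return companion_overlaps(offsetof(struct pcifront_sd_new, pdev));
}
```

Called from a small main() on an LP64 build (4-byte int, 8-byte pointers), old_layout_clobbers_pdev() returns true and new_layout_clobbers_pdev() returns false, which is consistent with the "on 64 bit" remark above. With 4-byte pointers the old layout puts pdev at offset 4, so the two fields do not overlap there.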