From: Andreas Noever
Date: Sun, 27 Oct 2013 02:39:30 +0200
Subject: Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan
To: Bjorn Helgaas
Cc: Yinghai Lu, Matthew Garrett, linux-kernel@vger.kernel.org, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg, Kirill A. Shutemov

> Sorry, I didn't understand this. Is this supposed to be an
> explanation of how 928bea fixes the oops that Andreas saw? If so, can
> you be a little more explicit about when the pci_dev got freed and
> when pci_pme_list_scan() walked the list and accessed the freed area?

I did some more debugging, and it seems that 928bea is innocent after all.

I added some debugging statements to pci_pme_active. The additional delay
seems to make the oops easier to trigger, and I can now reproduce it up to
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5137a2ee2007d9cbbbeebd14abe08357a079b607
which makes much more sense.

Here is what's going on (in 3.11). First of all, pci_pme_active is only ever
called with false as the second parameter during boot. Now when I unplug the
adapter, the first call is:

  dump_stack+0x54/0x8d
  pci_pme_active+0x30/0x210
  ? pci_read+0x2c/0x30 (this should be pci_stop_dev imho)
  pci_stop_bus_device+0x4e/0xa0
  pci_stop_bus_device+0x3b/0xa0
  pci_stop_bus_device+0x3b/0xa0
  pci_stop_and_remove_bus_device+0x12/0x20
  pciehp_unconfigure_device+0xa8/0x1b0
  pciehp_disable_slot+0x68/0x200
  pciehp_power_thread+0x83/0xf0
  process_one_work+0x178/0x470
  worker_thread+0x121/0x3a0
  ? manage_workers.isra.21+0x2b0/0x2b0
  kthread+0xc0/0xd0
  ? SyS_unshare+0x220/0x280
  ? kthread_create_on_node+0x120/0x120
  ret_from_fork+0x7c/0xb0
  ? kthread_create_on_node+0x120/0x120
  tg3 0000:0a:00.0: PME# disabled

This is still fine. But then it gets interesting. The next call is:

  dump_stack+0x54/0x8d
  pci_pme_active+0x30/0x210
  __pci_enable_wake+0x65/0x160
  pci_wake_from_d3+0x25/0x40
  tg3_power_down+0x29/0x40 [tg3]
  tg3_close+0x10c/0x1d0 [tg3]
  __dev_close_many+0x85/0xd0
  dev_close_many+0x8b/0x100
  rollback_registered_many+0xd8/0x250
  rollback_registered+0x2d/0x40
  unregister_netdevice_queue+0x58/0xb0
  unregister_netdev+0x1c/0x30
  tg3_remove_one+0x6b/0x120 [tg3]
  pci_device_remove+0x3b/0xb0
  __device_release_driver+0x7f/0xf0
  device_release_driver+0x23/0x30
  bus_remove_device+0xf4/0x170
  device_del+0x135/0x1d0
  pci_stop_bus_device+0x94/0xa0
  pci_stop_bus_device+0x3b/0xa0
  pci_stop_bus_device+0x3b/0xa0
  pci_stop_and_remove_bus_device+0x12/0x20
  pciehp_unconfigure_device+0xa8/0x1b0
  pciehp_disable_slot+0x68/0x200
  pciehp_power_thread+0x83/0xf0
  process_one_work+0x178/0x470
  worker_thread+0x121/0x3a0
  ? manage_workers.isra.21+0x2b0/0x2b0
  kthread+0xc0/0xd0
  ? SyS_unshare+0x220/0x280
  ? kthread_create_on_node+0x120/0x120
  ret_from_fork+0x7c/0xb0
  ? kthread_create_on_node+0x120/0x120
  tg3 0000:0a:00.0: PME# enabled

On removal, tg3 calls pci_wake_from_d3 to enable or disable wake-on-LAN. This
in turn calls pci_pme_active(dev, true) for a device that is about to be
deleted, after the earlier pci_pme_active(dev, false) call from the
pci_stop_bus_device path has already disabled PME for it.
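To make the ordering concrete, here is a minimal userspace model in C of the sequence the two traces show. It is a sketch, not kernel code: model_dev, model_pme_active, model_driver_remove, model_unplug and model_pme_list_scan are hypothetical stand-ins for pci_dev, pci_pme_active, the tg3 remove path, the pciehp removal path and pci_pme_list_scan. A freed flag stands in for the pci_dev actually being released, so the model never touches freed memory, and it ignores the conditions under which the real pci_pme_active puts a device on the poll list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev (userspace model only). */
struct model_dev {
    bool freed;             /* models the pci_dev having been released */
    bool on_pme_list;       /* is this device on the model poll list? */
    struct model_dev *next; /* link in the model poll list */
};

/* Hypothetical stand-in for pci_pme_list. */
static struct model_dev *pme_list;

/* Models pci_pme_active(dev, enable): add to or remove from the poll list. */
static void model_pme_active(struct model_dev *dev, bool enable)
{
    if (enable && !dev->on_pme_list) {
        dev->next = pme_list;
        pme_list = dev;
        dev->on_pme_list = true;
    } else if (!enable && dev->on_pme_list) {
        struct model_dev **pp = &pme_list;
        while (*pp != dev)      /* dev is known to be on the list here */
            pp = &(*pp)->next;
        *pp = dev->next;
        dev->next = NULL;
        dev->on_pme_list = false;
    }
}

/* Models tg3's remove path reaching pci_wake_from_d3 with the WoL setting. */
static void model_driver_remove(struct model_dev *dev, bool wol_enabled)
{
    model_pme_active(dev, wol_enabled);
}

/* Models the unplug ordering seen in the two traces above. */
static void model_unplug(struct model_dev *dev, bool wol_enabled)
{
    model_pme_active(dev, false);          /* first trace: pci_stop_bus_device */
    model_driver_remove(dev, wol_enabled); /* second trace: via device_del */
    dev->freed = true;                     /* the pci_dev is released afterwards */
}

/* Models pci_pme_list_scan(): counts entries it would touch after free. */
static int model_pme_list_scan(void)
{
    int stale = 0;
    for (struct model_dev *d = pme_list; d != NULL; d = d->next)
        if (d->freed)
            stale++;
    return stale;
}
```

With wake-on-LAN enabled, model_unplug leaves one released device on the list, so model_pme_list_scan reports one stale entry, matching the "PME# enabled" case; with it disabled, the list is empty after removal.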
The linked commit removes the pci_wake_from_d3 call, which "fixes" the problem.

Andreas