Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030672AbWJKHKP (ORCPT ); Wed, 11 Oct 2006 03:10:15 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1030674AbWJKHKP (ORCPT ); Wed, 11 Oct 2006 03:10:15 -0400 Received: from holoclan.de ([62.75.158.126]:23237 "EHLO mail.holoclan.de") by vger.kernel.org with ESMTP id S1030672AbWJKHKL (ORCPT ); Wed, 11 Oct 2006 03:10:11 -0400 Date: Wed, 11 Oct 2006 09:06:50 +0200 From: Martin Lorenz To: linux-thinkpad@linux-thinkpad.org Cc: Jeremy Fitzhardinge , linux-kernel@vger.kernel.org, "Brown, Len" , acpi-devel@kernel.org, Pavel Machek , Andrew Morton Subject: Re: [ltp] Re: X60s w/t kern 2.6.19-rc1-git: two BUG warnings Message-ID: <20061011070650.GA7003@gimli> Mail-Followup-To: linux-thinkpad@linux-thinkpad.org, Jeremy Fitzhardinge , linux-kernel@vger.kernel.org, "Brown, Len" , acpi-devel@kernel.org, Pavel Machek , Andrew Morton References: <20061010062826.GC9895@gimli> <452BECAE.2070001@goop.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.13 (2006-08-11) X-Spam-Score: -1.4 (-) X-Spam-Report: Spam detection software, running on the system "www.holoclan.de", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: On Tue, Oct 10, 2006 at 05:18:27PM -0600, Eric W. Biederman wrote: > Jeremy Fitzhardinge writes: > > > Martin Lorenz wrote: > >> Dear kernel gurus, > >> > > > > (CC:s added to various interested parties.) > > > >> whatever I do and whic problem I seem to get fixed new ones arise: > >> > >> now I loose ACPI events after suspend/resume. not every time, but roughly 3 > >> out of 4 times. > >> > > > > Yes, I'm seeing this as well. It used to be a 100% failure, so there has been > > some improvement. On the other hand, before that it used to work perfectly... > > Unfortunately I don't have a good feeling for when these changes happened. > > So the first symptom is not the failure in e1000 open but this. > [ 6727.082000] Trying to free already-free IRQ 20 > Which I suspect comes from e1000_close as it happens after > the e1000 watchdog timer goes off because it hasn't gotten > interrupts for a while. > > We have other bits of suspicious code as well. > The e1000 rolls it's own version of pci_save_state but doesn't > call any of the msi save/restore state routines. > > The bug is from the attempt to allocate an already allocated irq. > So it appears somehow in the save/restore mess the msi code > thought the irq code was allocates but the irq code did not? > [...] Content analysis details: (-1.4 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.4 ALL_TRUSTED Passed through trusted hosts only via SMTP Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6843 Lines: 164 On Tue, Oct 10, 2006 at 05:18:27PM -0600, Eric W. Biederman wrote: > Jeremy Fitzhardinge writes: > > > Martin Lorenz wrote: > >> Dear kernel gurus, > >> > > > > (CC:s added to various interested parties.) > > > >> whatever I do and whic problem I seem to get fixed new ones arise: > >> > >> now I loose ACPI events after suspend/resume. not every time, but roughly 3 > >> out of 4 times. > >> > > > > Yes, I'm seeing this as well. It used to be a 100% failure, so there has been > > some improvement. On the other hand, before that it used to work perfectly... > > Unfortunately I don't have a good feeling for when these changes happened. > > So the first symptom is not the failure in e1000 open but this. > [ 6727.082000] Trying to free already-free IRQ 20 > Which I suspect comes from e1000_close as it happens after > the e1000 watchdog timer goes off because it hasn't gotten > interrupts for a while. > > We have other bits of suspicious code as well. > The e1000 rolls it's own version of pci_save_state but doesn't > call any of the msi save/restore state routines. > > The bug is from the attempt to allocate an already allocated irq. > So it appears somehow in the save/restore mess the msi code > thought the irq code was allocates but the irq code did not? > this morning I tried and booted the machine with pci=nomsi the BUG does not come up as expected but the symptom of loosing ACPI after suspend/resume remains... I put up some information on my current configuration on http://www.lorenz.eu.org/~mlo/kernel/ tell me, what I should add to help BTW: can someone tell me, why I can't pull from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git anymore? I keep getting 'fatal: unable to connect a socket (Connection refused)' since yesterday afternoon (first noticed ~2pm UTC) > I have scratched the msi code enough lately that something may have > shaken loose. In particular in commit: > d7bd007e480672c99d8656c7b7b12ef0549432c37 I explicitly disallowed some > weird double allocation/deallocation code that could have masked this > problem. > > The e1000 watchdog doesn't seem to be going off in the functioning > case so that may not be it. > > My problem is I have been given two incomplete dmesg traces one > that works and one that doesn't and without any background in this > area beyond noting the e1000 watchdog timer is firing I can't say what > happened. > > In the working cases do you have trouble if you up and down the > network interface? > > We seem to be at that nasty interface layer where every piece is > making assumptions about how someone else's piece works, and someone > has their assumptions wrong. > > If someone who understands the how suspend/resume is supposed to work > for these kinds of things can speak up that would help. > > > > >> the only errornous things I see in the logs are those: > >> > >> [ 6727.089000] BUG: warning at drivers/pci/msi.c:680/pci_enable_msi() > >> [ 6727.089000] [] dump_trace+0x69/0x1af > >> [ 6727.089000] [] show_trace_log_lvl+0x18/0x2c > >> [ 6727.089000] [] show_trace+0xf/0x11 > >> [ 6727.089000] [] dump_stack+0x15/0x17 > >> [ 6727.089000] [] pci_enable_msi+0x78/0x22e > >> [ 6727.090000] [] e1000_open+0x64/0x176 > >> [ 6727.091000] [] dev_open+0x2b/0x62 > >> [ 6727.092000] [] dev_change_flags+0x47/0xe4 > >> [ 6727.093000] [] devinet_ioctl+0x252/0x556 > >> [ 6727.094000] [] sock_ioctl+0x19e/0x1c2 > >> [ 6727.095000] [] do_ioctl+0x1f/0x62 > >> [ 6727.095000] [] vfs_ioctl+0x245/0x257 > >> [ 6727.096000] [] sys_ioctl+0x4c/0x67 > >> [ 6727.096000] [] syscall_call+0x7/0xb > >> [ 6727.096000] DWARF2 unwinder stuck at syscall_call+0x7/0xb > >> [ 6727.096000] > >> [ 6727.096000] Leftover inexact backtrace: > >> [ 6727.096000] > >> [ 6727.096000] [] unix_stream_connect+0x15b/0x367 > >> [ 6727.096000] ======================= > >> > >> which occurs in variations but very often > >> > > > > This just seems to be something thinking the IRQ isn't MSI-capable. It doesn't > > seem to have any negative effect - the e1000 works fine anyway. > > It is a BUG. The first symptoms of which where the failure to free > the irq in by the e1000 watchdog. The normal irq code said you didn't > have an irq handler registered for that irq. > > Does anyone know how the irq handler could go aways in the weird > suspend/resume world? > > >> [ 6708.389000] e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full > > Duplex > >> [ 6708.389000] e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO > >> [ 6727.082000] Trying to free already-free IRQ 20 > >> [ 6727.089000] BUG: warning at drivers/pci/msi.c:680/pci_enable_msi() > >> [ 6727.089000] [] dump_trace+0x69/0x1af > >> [ 6727.089000] [] show_trace_log_lvl+0x18/0x2c > >> [ 6727.089000] [] show_trace+0xf/0x11 > >> [ 6727.089000] [] dump_stack+0x15/0x17 > >> [ 6727.089000] [] pci_enable_msi+0x78/0x22e > >> [ 6727.090000] [] e1000_open+0x64/0x176 > >> [ 6727.091000] [] dev_open+0x2b/0x62 > >> [ 6727.092000] [] dev_change_flags+0x47/0xe4 > >> [ 6727.093000] [] devinet_ioctl+0x252/0x556 > >> [ 6727.094000] [] sock_ioctl+0x19e/0x1c2 > >> [ 6727.095000] [] do_ioctl+0x1f/0x62 > >> [ 6727.095000] [] vfs_ioctl+0x245/0x257 > >> [ 6727.096000] [] sys_ioctl+0x4c/0x67 > >> [ 6727.096000] [] syscall_call+0x7/0xb > >> [ 6727.096000] DWARF2 unwinder stuck at syscall_call+0x7/0xb > >> [ 6727.096000] [ 6727.096000] Leftover inexact backtrace: > >> [ 6727.096000] [ 6727.096000] [] unix_stream_connect+0x15b/0x367 > >> [ 6727.096000] ======================= > >> [ 6728.719000] e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full > > Duplex > >> [ 6728.719000] e1000: eth0: e1000_watchdog: 10/100 speed: disabling > >TSO > > Eric > -- > The linux-thinkpad mailing list home page is at: > http://mailman.linux-thinkpad.org/mailman/listinfo/linux-thinkpad gruss mlo -- Dipl.-Ing. Martin Lorenz They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. Benjamin Franklin please encrypt your mail to me GnuPG key-ID: F1AAD37D get it here: http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xF1AAD37D ICQ UIN: 33588107 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/