From: "Rafael J. Wysocki"
To: Andrew Morton
Cc: LKML, Stephen Hemminger, netdev@vger.kernel.org
Subject: Re: 2.6.18-rc3-mm2 (+ hotfixes): GPF related to skge on suspend
Date: Sat, 12 Aug 2006 16:31:18 +0200
References: <200608121207.42268.rjw@sisk.pl> <20060812052853.f9e5d648.akpm@osdl.org>
In-Reply-To: <20060812052853.f9e5d648.akpm@osdl.org>
Message-Id: <200608121631.18603.rjw@sisk.pl>

On Saturday 12 August 2006 14:28, Andrew Morton wrote:
> On Sat, 12 Aug 2006 12:07:42 +0200
> "Rafael J. Wysocki" wrote:
>
> > Hi,
> >
> > On 2.6.18-rc3-mm2 with hotfixes I get things like the appended one on attempts
> > to suspend to disk. It occurs while devices are being suspended and is fairly
> > reproducible.
> >
> > Greetings,
> > Rafael
> >
> >
> > Suspending device 0000:01:00.0
> > Suspending device 0000:02:02.0
> > Suspending device 0000:02:01.4
> > Suspending device 0000:02:01.3
> > Suspending device 0000:02:01.2
> > Suspending device 0000:02:01.1
> > Suspending device 0000:02:01.0
> > Suspending device 0000:02:00.0
> > skge Ram read data parity error
> > skge Ram write data parity error
> > skge eth0: receive queue parity error
> > skge : receive queue parity error

This stuff comes from the interrupt handler which apparently races with something.

> > skge 0000:02:00.0: PCI error cmd=0x110 status=0x2b0
> > general protection fault: 0000 [1] PREEMPT
> > last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:02:02.0/subsystem_device
> > CPU 0
> > Modules linked in: ide_cd cdrom usbserial asus_acpi thermal ipv6 processor fan button battery ac af_packet snd_pcm_oss snd_mixer_oss snd_seq
> > snd_seq_device bcm43xx ieee80211softmac ieee80211 ieee80211_crypt pcmcia firmware_class ohci1394 ieee1394 skge yenta_socket rsrc_nonstatic pc
> > mcia_core usbhid ff_memless snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc ehci_hcd ohci_hcd i2c_nfo
> > rce2 i2c_core parport_pc lp parport dm_mod
> > Pid: 4, comm: events/0 Not tainted 2.6.18-rc3-mm2 #17
> > RIP: 0010:[] [] :skge:skge_poll+0x547/0x570
> > RSP: 0018:ffffffff80621e70 EFLAGS: 00010202
> > RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: 0000000000000040
>
> RAX doesn't look good.

Yup.

> > RDX: ffff81005addf128 RSI: ffffffff80621eec RDI: ffff81005addeb60
> > RBP: ffffffff80621ed0 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000040 R11: 0000000000000000 R12: ffff81005addf0a0
> > R13: 0000000000000000 R14: ffff810057fe9180 R15: 00000000ffffffff
> > FS: 00002b4b98df4b00(0000) GS:ffffffff808c2000(0000) knlGS:00000000558b4d00
> > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > CR2: 00002adeb0d7d0b0 CR3: 0000000025147000 CR4: 00000000000006e0
> > Process events/0 (pid: 4, threadinfo ffff810037f44000, task ffff810037fef100)
> > Stack: ffffffff80621eb0 ffffffff80621eec ffff81005addeb60 ffff81005ad61488
> > ffff81005addf128 000000400000000a 00000001008e6a25 0000000000000000
> > ffff81005addeb60 0000000000000000 00000001008e6a25 00000000ffffffff
> > Call Trace:
> > [] net_rx_action+0xba/0x1f0
> > [] __do_softirq+0x70/0xf0
> > [] call_softirq+0x1c/0x30
> > DWARF2 unwinder stuck at call_softirq+0x1c/0x30
> > Leftover inexact backtrace:
> > [] do_softirq+0x3d/0xb0
> > [] irq_exit+0x4e/0x60
> > [] do_IRQ+0x135/0x140
> > [] rt_run_flush+0x8e/0xd0
> > [] ret_from_intr+0x0/0xf
> > [] local_bh_enable_ip+0xe7/0x110
> > [] _spin_unlock_bh+0x39/0x40
> > [] rt_run_flush+0x8e/0xd0
> > [] rt_cache_flush+0xab/0x100
> > [] fib_netdev_event+0xa9/0xc0
> > [] notifier_call_chain+0x2f/0x50
> > [] raw_notifier_call_chain+0x9/0x10
> > [] netdev_state_change+0x29/0x40
> > [] linkwatch_run_queue+0x162/0x190
> > [] linkwatch_event+0x2a/0x40
> > [] run_workqueue+0xc2/0x120
> > [] linkwatch_event+0x0/0x40
> > [] worker_thread+0x121/0x160
> > [] default_wake_function+0x0/0x10
> > [] worker_thread+0x0/0x160
> > [] kthread+0xd9/0x110
> > [] trace_hardirqs_on+0x11d/0x150
> > [] child_rip+0x8/0x12
> > [] _spin_unlock_irq+0x2b/0x60
> > [] restore_args+0x0/0x30
> > [] kthread+0x0/0x110
> > [] child_rip+0x0/0x12
> > Code: 44 8b 28 c7 45 d0 00 00 00 00 45 85 ed 0f 89 29 fb ff ff e9
> > RIP [] :skge:skge_poll+0x547/0x570
> > RSP
> > <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
>
> ksymoops says:
>
> Code; ffffffff88107287 <_end+7ac9287/7efc2000>
> 00000000 <_EIP>:
> Code; ffffffff88107287 <_end+7ac9287/7efc2000> <=====
> 0: 44 inc %esp <=====
> Code; ffffffff88107288 <_end+7ac9288/7efc2000>
> 1: 8b 28 mov (%eax),%ebp
> Code; ffffffff8810728a <_end+7ac928a/7efc2000>
> 3: c7 45 d0 00 00 00 00 movl $0x0,0xffffffd0(%ebp)
> Code; ffffffff88107291 <_end+7ac9291/7efc2000>
> a: 45 inc %ebp
> Code; ffffffff88107292 <_end+7ac9292/7efc2000>
> b: 85 ed test %ebp,%ebp
> Code; ffffffff88107294 <_end+7ac9294/7efc2000>
> d: 0f 89 29 fb ff ff jns fffffb3c <_EIP+0xfffffb3c>
> Code; ffffffff8810729a <_end+7ac929a/7efc2000>
> 13: e9 00 00 00 00 jmp 18 <_EIP+0x18>
>
> So even if we didn't deref a kfree'd pointer, we're about to.

Hm, but the code should be 64-bit?

> It would be good if you could poke around in gdb, work out exactly which
> statement it's oopsing at, please.

(gdb) l *skge_poll+0x547
0x5287 is in skge_poll (skge.c:2719).
2714			struct skge_rx_desc *rd = e->desc;
2715			struct sk_buff *skb;
2716			u32 control;
2717
2718			rmb();
2719			control = rd->control;
2720			if (control & BMU_OWN)
2721				break;
2722
2723			skb = skge_rx_get(skge, e, control, rd->status, rd->csum2);
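
On the 64-bit question, here is a minimal sketch. It assumes the Code: bytes are x86-64 instructions and that the ksymoops listing decoded them in 32-bit mode. In 64-bit mode the bytes 0x40..0x4f are REX prefixes rather than inc/dec, and their R and B bits extend the register fields of the following ModRM byte. The struct and function names are hypothetical, and SIB bytes and displacement-only forms are not handled.

```c
#include <stdint.h>

/* ModRM fields after the REX extension bits have been applied. */
struct modrm_regs {
	unsigned mod;	/* addressing mode, 0..3 (3 = register direct) */
	unsigned reg;	/* register operand index, 0..15 */
	unsigned rm;	/* register/memory operand index, 0..15 */
};

/* Nonzero if the byte is a REX prefix (0x40..0x4f) in 64-bit mode. */
static int is_rex(uint8_t b)
{
	return (b & 0xf0) == 0x40;
}

/*
 * Combine REX.R (bit 2) and REX.B (bit 0) with the 3-bit ModRM fields.
 * Hypothetical helper for illustration only; no SIB handling.
 */
static struct modrm_regs decode_modrm(uint8_t rex, uint8_t modrm)
{
	struct modrm_regs r;

	r.mod = modrm >> 6;
	r.reg = ((modrm >> 3) & 7) | ((rex & 0x4) ? 8 : 0);
	r.rm = (modrm & 7) | ((rex & 0x1) ? 8 : 0);
	return r;
}
```

For the first instruction in the Code: line, decode_modrm(0x44, 0x28) gives mod 0, reg 13, rm 0. With opcode 8b that reads as a 32-bit load from (%rax) into %r13d, not the "inc %esp; mov (%eax),%ebp" pair shown above. Likewise 45 85 ed would be a test of %r13d against itself. That seems consistent with the load of rd->control at skge.c:2719 going through RAX, which holds 6b6b6b6b6b6b6b6b in the register dump.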