Message-ID: <19f34abd0807231620q6d870bc0k74d176c9e5253ff3@mail.gmail.com>
Date: Thu, 24 Jul 2008 01:20:32 +0200
From: "Vegard Nossum"
To: "Dmitry Adamushko", "Jeff Garzik"
Subject: Re: recent -git: BUG in free_thread_xstate
Cc: "Suresh Siddha", LKML, "the arch/x86 maintainers", "Paul E. McKenney",
    "Ingo Molnar", "Peter Zijlstra", netdev@vger.kernel.org,
    "Arnaldo Carvalho de Melo", "Matt Mackall"
In-Reply-To: <19f34abd0807231550h4ba88a9qa27b1c9e5afc80cb@mail.gmail.com>
References: <19f34abd0807231307y191c0ad7tfab4cda57ee88eb@mail.gmail.com>
    <20080723203109.GH14380@linux-os.sc.intel.com>
    <19f34abd0807231352j1ba1414am84ee9683df9b5657@mail.gmail.com>
    <19f34abd0807231445h79fac5cbwecd0563b74bc18ad@mail.gmail.com>
    <19f34abd0807231505w1a25c2bak329a622f3a287e97@mail.gmail.com>
    <19f34abd0807231545u5bc8b55fm768527a02268f111@mail.gmail.com>
    <19f34abd0807231550h4ba88a9qa27b1c9e5afc80cb@mail.gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 24, 2008 at 12:50 AM, Vegard Nossum wrote:
> On Thu, Jul 24, 2008 at 12:45 AM, Vegard Nossum wrote:
>> Hey, with this patch applied:
>>
>> diff --git a/include/asm-x86/string_32.h b/include/asm-x86/string_32.h
>> index b49369a..7bef7ea 100644
>> --- a/include/asm-x86/string_32.h
>> +++ b/include/asm-x86/string_32.h
>> @@ -29,9 +29,14 @@ extern char *strchr(const char *s, int c);
>>  #define __HAVE_ARCH_STRLEN
>>  extern size_t strlen(const char *s);
>>
>> +extern void warn_on_slowpath(const char *file, int line);
>> +
>>  static __always_inline void * __memcpy(void * to, const void * from, size_t n)
>>  {
>>         int d0, d1, d2;
>> +       if (n == 0x6b)
>> +               warn_on_slowpath(__FILE__, __LINE__);
>> +
>>         __asm__ __volatile__(
>>                 "rep ; movsl\n\t"
>>                 "movl %4,%%ecx\n\t"
>>
>> I have found an important clue; it seems to be my network driver's fault:
>>
>> ------------[ cut here ]------------
>> WARNING: at include2/asm/string_32.h:38 skb_copy_and_csum_dev+0xee/0x100()
>> Pid: 3989, comm: bash Tainted: G W 2.6.26-dirty #3
>>  [] warn_on_slowpath+0x4f/0x70
>>  [] ? check_bytes_and_report+0x21/0xc0
>>  [] ? __kfree_skb+0x34/0x80
>>  [] ? check_bytes_and_report+0x21/0xc0
>>  [] ? check_object+0xdf/0x1f0
>>  [] ? check_bytes_and_report+0x21/0xc0
>>  [] ? __kfree_skb+0x34/0x80
>>  [] ? check_object+0xdf/0x1f0
>>  [] ? find_skb+0x3c/0x80
>>  [] skb_copy_and_csum_dev+0xee/0x100
>>  [] rtl8139_start_xmit+0x57/0x130
>>  [] ? __kmalloc_track_caller+0x8b/0x120
>>  [] netpoll_send_skb+0x14e/0x1a0
>>  [] netpoll_send_udp+0x1e4/0x210
>>  [] write_msg+0x8c/0xc0
>>  [] __call_console_drivers+0x53/0x60
>>  [] _call_console_drivers+0x4b/0x90
>>  [] release_console_sem+0xc5/0x1f0
>>  [] vprintk+0x2ce/0x420
>>  [] ? do_IRQ+0x4d/0xa0
>>  [] ? restore_nocheck+0x12/0x15
>>  [] ? delay_tsc+0x61/0xb8
>>  [] ? delay_tsc+0x86/0xb8
>>  [] printk+0x1b/0x20
>>  [] native_cpu_up+0x7cd/0x880
>>  [] ? internal_create_group+0xd1/0x180
>>  [] ? do_fork_idle+0x0/0x20
>>  [] ? __raw_notifier_call_chain+0x19/0x20
>>  [] _cpu_up+0x83/0x100
>>  [] cpu_up+0x49/0x70
>>  [] store_online+0x58/0x80
>>  [] ? store_online+0x0/0x80
>>  [] sysdev_store+0x2b/0x40
>>  [] sysfs_write_file+0xa2/0x100
>>  [] vfs_write+0x96/0x130
>>  [] ? sysfs_write_file+0x0/0x100
>>  [] sys_write+0x3d/0x70
>>  [] sysenter_past_esp+0x78/0xd1
>>  =======================
>> ---[ end trace a7919e7f17c0a725 ]---
>>
>> In particular, these are interesting:
>>
>>  [] skb_copy_and_csum_dev+0xee/0x100
>>
>> This is net/core/skbuff.c:1731:
>>         skb_copy_from_linear_data(skb, to, csstart);
>>
>>  [] rtl8139_start_xmit+0x57/0x130
>>
>> This is drivers/net/8139too.c:1711:
>>         dev_kfree_skb(skb);
>>
>
> Oops, this should of course be the line just above (because the
> address on the stack is the return address...), which is:
>
>         skb_copy_and_csum_dev(skb, tp->tx_buf[entry]);
>
> (Big surprise there ;-))
>
>> (The line numbers are still from v2.6.26, but this reproduces on
>> current -git as well.)
>>
>> Is this enough information to fix it? :-)
>
> I've also added Jeff Garzik to Cc since he seems to be the maintainer
> of this driver.

Hm. I'm not sure it's the driver's fault after all.
Look at the skb_copy_and_csum_dev() line again:

        skb_copy_from_linear_data(skb, to, csstart);

And csstart was probably loaded in this line:

        csstart = skb_headlen(skb);

Which makes sense if "skb" was freed (that's the case where "csstart"
would be 0x6b). Hm, looking at skb_headlen():

static inline unsigned int skb_headlen(const struct sk_buff *skb)
{
        return skb->len - skb->data_len;
}

It seems difficult for this to return 0x6b unless skb->data_len has
been set to 0 after it was freed.

In either case, rtl8139_start_xmit() is only passing on the skb it got
from netpoll_send_skb(). The call is from net/core/netpoll.c:290:

        status = dev->hard_start_xmit(skb, dev);

Looks like the skb is passed into this as well... netpoll_send_udp(),
line 370:

        netpoll_send_skb(np, skb);

So finally, this function is doing lots of things with skbs that I
don't understand. It seems to be getting an already freed skbuff.
Somehow. Or maybe it's freed while it's being handled.

Hm, there seem to be no recent changes in this area. Maybe I'm on the
completely wrong track. I'll add a couple of Cc in either case.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
        -- E. W. Dijkstra, EWD1036
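
To make the skb_headlen() arithmetic above concrete, here is a minimal
userspace sketch. It is not kernel code: struct fake_skb and the helper
names are hypothetical stand-ins that keep only the two fields
skb_headlen() reads, and it assumes a 32-bit unsigned int and that a
freed object is filled with the slab poison byte 0x6b (POISON_FREE).

```c
#include <string.h>

#define POISON_FREE 0x6b	/* byte written over freed slab objects when poisoning is enabled */

/* Hypothetical stand-in: only the two fields skb_headlen() reads. */
struct fake_skb {
	unsigned int len;
	unsigned int data_len;
};

/* Same arithmetic as the skb_headlen() quoted in the mail. */
unsigned int fake_headlen(const struct fake_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Both fields still hold poison: the subtraction cancels out. */
unsigned int headlen_fully_poisoned(void)
{
	struct fake_skb skb;

	memset(&skb, POISON_FREE, sizeof(skb));
	return fake_headlen(&skb);
}

/* data_len cleared after the free: the result is the poisoned len itself. */
unsigned int headlen_data_len_zeroed(void)
{
	struct fake_skb skb;

	memset(&skb, POISON_FREE, sizeof(skb));
	skb.data_len = 0;
	return fake_headlen(&skb);
}
```

Under these assumptions headlen_fully_poisoned() returns 0 and
headlen_data_len_zeroed() returns 0x6b6b6b6b. This only illustrates
what the subtraction does with poisoned fields; it does not say which
values the real skb held when the warning fired.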