Return-path:
Received: from mail-ew0-f20.google.com ([209.85.219.20]:52788 "EHLO
    mail-ew0-f20.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1753046AbZARXct (ORCPT );
    Sun, 18 Jan 2009 18:32:49 -0500
Received: by ewy13 with SMTP id 13so103373ewy.13
    for ; Sun, 18 Jan 2009 15:32:47 -0800 (PST)
Message-ID: <4973BAC6.5020502@gmail.com> (sfid-20090119_003254_882567_F3D1B2CE)
Date: Mon, 19 Jan 2009 00:27:02 +0100
From: Artur Skawina
MIME-Version: 1.0
To: Artur Skawina
CC: Christian Lamparter, Johannes Berg, Larry Finger, linux-wireless@vger.kernel.org
Subject: Re: wireless-testing, p54 and sinus 154 data no longer works
References: <494698AF.4020204@gmail.com> <496FFC9F.1000907@lwfinger.net>
    <1232097187.3854.9.camel@johannes> <200901162138.51855.chunkeey@web.de>
    <497105D1.5040906@gmail.com>
In-Reply-To: <497105D1.5040906@gmail.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-wireless-owner@vger.kernel.org
List-ID:

Artur Skawina wrote:
> Christian Lamparter wrote:
>> On Friday 16 January 2009 10:13:07 Johannes Berg wrote:
>>> On Thu, 2009-01-15 at 21:18 -0600, Larry Finger wrote:
>>>
>>>>>> Object 0xddec18d0: >69< 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ikkkkkkkkkkkkkkk
>>>> I too have seen real single bit changes - in my case 6b went to 6a,
>>>> and my memory is fine. I wouldn't necessarily blame your hardware.
>>> 6b to 6a is often the result of a refcounting bug that happens to unref
>>> a value _after_ it has been freed. But that doesn't explain 6b to 69,
>>> unless you happen to have _two_ refcounting bugs. Not that I necessarily
>>> think that memory is bad
>> Well, this idiotic debug patch (kref-kernel-debug-patch) could shed some light into
>> the problem who's using a freed skb.
>
> didn't trigger anything here, just the usual:
>
> BUG kmalloc-4096: Poison overwritten

I've run memtest, even swapped both ram and cpu, and symptoms stay the same,
so it's very likely not bad hw.
Still haven't found the corruptor, but at least i've narrowed it down a bit;
what i'm seeing is:

1) an skb "S" gets allocated in p54u_rx_cb and is submitted together w/ the urb.
2) "S" later comes back to p54u_rx_cb, where it is given to p54_rx (eventually
   ieee80211_rx_irqsafe) and a new one is allocated.
3) a few (~15) rx/tx packets pass.
4) SLUB detects modified poison in what used to be S->head in (1) and (2) above;
   usually 0x6b turns into 0x6a, but i have also seen 0x69, just a few times.
   (the offset from skb->head to the decremented byte seems to stay the same, at
   least during the few times i tried w/ the same kernel, last one was eg 684 bytes)

This is almost 100% reproducible; sometimes the machine freezes instead.

artur
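
To illustrate the pattern in (4): a minimal userspace sketch, not driver or kernel code, of why a stray decrement of freed memory shows up as 0x6b turning into 0x6a, and why a second one gives 0x69. The names poison, stale_put and check_poison are made up for this example. It assumes the freed object is filled entirely with 0x6b (the real SLUB layout also has an end marker, ignored here) and models only the low byte of the counter being decremented.

```python
POISON_FREE = 0x6B  # byte SLUB writes over a freed object when poisoning is on
OBJ_SIZE = 4096     # kmalloc-4096, the cache named in the report


def poison(size=OBJ_SIZE):
    """Return a buffer that looks like a freshly freed, poisoned object."""
    return bytearray([POISON_FREE] * size)


def stale_put(buf, offset):
    """Model a late refcount drop: decrement one byte of already-freed memory."""
    buf[offset] = (buf[offset] - 1) & 0xFF


def check_poison(buf):
    """Return (offset, value) of the first non-poison byte, or None if intact."""
    for i, b in enumerate(buf):
        if b != POISON_FREE:
            return i, b
    return None


obj = poison()
stale_put(obj, 684)       # 684 is the offset from one run, used only as an example
print(check_poison(obj))  # (684, 106), i.e. 0x6a
stale_put(obj, 684)
print(check_poison(obj))  # (684, 105), i.e. 0x69
```

A single late decrement is enough for 0x6a; getting 0x69 at the same offset takes two of them, which fits the "two refcounting bugs" remark quoted above but does not show where they come from.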